Browsing the Lizard Genome
Posted: August 12, 2011
The latest version of the Anolis genome has been made publicly available, and can be accessed and explored with the genome browsers hosted by Ensembl, NCBI, and the University of California, Santa Cruz (UCSC). I mostly use the UCSC Lizard Genome Browser Gateway for my research, although I often switch between browsers depending on what I want to do.
Anolis carolinensis is one of the few genomic model species that are widespread and abundant in the USA. My research in anole population genetics benefits from this in two ways. First, I can obtain a large and geographically varied sample without a passport. Second, unlike most species, for which very little is known about the genome, Anolis has a genome database that provides an excellent resource for information about genetic variation.
If you want to study the function of a particular gene, or determine the frequency of a gene variant in a population, you have to know the gene’s DNA sequence. The lizard genome is still largely unexplored and the exact DNA sequence of every gene is not yet known. Fortunately, the UCSC browser contains gene prediction tracks built from searches of large sequence databases and from comparative protein, cDNA and mRNA alignments. I try to weigh the results of different gene predictions to make informed decisions about the Anolis genome.
Using computers to study biology is a type of research called in silico, or “by computer/simulation”, and it is just one part of my research. There is a lot of work that happens before (and after) the in silico part, including:
1) Catching lizards (fun!!!).
2) Getting tissue from the lizard (not for the squeamish).
3) Extracting DNA from that tissue (with an enzyme called proteinase K that digests the tissue).
And here is where in silico comes in:
4) Designing PCR experiments with the goal of amplifying specific regions of the genome I wish to study.
If you aren’t familiar with PCR, here is an interactive PCR tutorial from the University of Utah that nicely illustrates the laboratory work involved in this procedure. I use PCR to amplify the same genetic locus across an entire population (this is called a marker). I want to study mutations in DNA sequence markers that might differ across subpopulations.
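As a rough illustration of the kind of check that goes into the in silico design step, here is a minimal Python sketch that computes the GC content of a candidate primer and estimates its melting temperature with the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C). The rule is a common approximation for short primers and is my assumption here, not necessarily how the primers in this post were designed. The primer sequence and the function names are made up for the example.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (deg C) using the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). This is only a rule of thumb for
    short oligos; dedicated primer design software uses better models.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Hypothetical 20-base primer, not a real Anolis sequence.
forward_primer = "ATGCATGCATGCATGCATGC"
print(gc_fraction(forward_primer), wallace_tm(forward_primer))
```

In practice the forward and reverse primers of a pair are usually chosen so that their estimated melting temperatures are close to each other.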
Here are some of the markers I design using the UCSC Genome Browser:
Exon-Primed Intron Crossing (EPIC)
Lizards are eukaryotes, and so their genes contain both exons and introns. Exons contain the information necessary for proteins, so they are highly conserved between individuals of the same species and probably not suitable for population genetic analysis. Introns, however, are spliced out of the transcript before the protein is made, so they evolve more freely and accumulate mutations at a roughly neutral rate. Designing PCR primers in the conserved exons with the goal of sequencing the intron between them is a good way to collect DNA sequence variation in a population.
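The EPIC idea can be sketched in a few lines of Python: given the exon coordinates of a predicted gene, list the introns that sit between two exons long enough to hold a primer and that are short enough to amplify and sequence. The coordinates below are invented, and the size thresholds (200 to 1000 bp introns, exons of at least 60 bp) are illustrative assumptions, not values taken from my actual marker design.

```python
from typing import List, NamedTuple, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based and half-open


class EpicCandidate(NamedTuple):
    upstream_exon: Interval
    intron: Interval
    downstream_exon: Interval


def find_epic_candidates(
    exons: List[Interval],
    min_intron: int = 200,   # assumed lower bound on intron length
    max_intron: int = 1000,  # assumed upper bound for easy amplification
    min_exon: int = 60,      # assumed room needed to place a primer
) -> List[EpicCandidate]:
    """Return introns flanked by two exons that can each hold a primer."""
    ordered = sorted(exons)
    candidates = []
    for left, right in zip(ordered, ordered[1:]):
        intron = (left[1], right[0])
        intron_len = intron[1] - intron[0]
        left_len = left[1] - left[0]
        right_len = right[1] - right[0]
        if (min_intron <= intron_len <= max_intron
                and left_len >= min_exon and right_len >= min_exon):
            candidates.append(EpicCandidate(left, intron, right))
    return candidates


# Made-up exon coordinates for a hypothetical gene prediction.
exons = [(100, 220), (700, 760), (3000, 3150), (3400, 3420)]
for candidate in find_epic_candidates(exons):
    print(candidate)
```

With these made-up coordinates only the first intron passes: the second is too long, and the third is followed by an exon too short to hold a primer.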
Natural selection acts very strongly on the protein-coding regions of genes, where mutations might have a deleterious effect on organismal function. The loss of variation in regions linked to, or in close proximity to, genes under selection is called a selective sweep. If I want to study regions that are free of selective pressure and can accumulate DNA sequence variation, I might want to look far away from genes. The vast majority of the Anolis genome consists of DNA outside of genes, and generally speaking, the farther you look from a gene, the weaker selection will be. The genome browser provides a convenient way to find these gene-poor regions.
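Finding gene-poor regions amounts to taking the gaps between annotated genes and trimming a safety buffer off each side. Here is a minimal Python sketch of that logic, assuming the gene coordinates have already been exported from a browser track as (start, end) pairs on one chromosome or scaffold. The buffer and minimum length are placeholder values I picked for the example, and the gene coordinates are invented.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based and half-open


def gene_poor_regions(
    genes: List[Interval],
    buffer: int = 10_000,     # assumed distance to keep away from any gene
    min_length: int = 5_000,  # assumed minimum usable region size
) -> List[Interval]:
    """Return stretches between genes that lie at least `buffer` bp
    from the nearest gene and are at least `min_length` bp long.
    Only gaps between consecutive genes are considered."""
    # Merge overlapping gene predictions so each gap is counted once.
    merged: List[List[int]] = []
    for start, end in sorted(genes):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    regions = []
    for left, right in zip(merged, merged[1:]):
        region_start = left[1] + buffer
        region_end = right[0] - buffer
        if region_end - region_start >= min_length:
            regions.append((region_start, region_end))
    return regions


# Made-up gene coordinates on a hypothetical scaffold.
genes = [(1_000, 5_000), (4_000, 8_000), (60_000, 65_000), (70_000, 72_000)]
print(gene_poor_regions(genes))
```

A larger buffer gives more confidence that a region is far from any gene, at the cost of fewer candidate regions.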
What exactly are these short, intergenic DNA sequences? I’m not quite sure. They have no predicted gene function, and if you do a BLAST (Basic Local Alignment Search Tool) search in GenBank there are no significant matches. They really don’t appear to actually be anything. They might be parts of highly mutated and unrecognizable remnants of ancient gene duplications or transposable element insertions. The genomes of most eukaryotes (and at 2.2 Gb in size, Anolis especially) are vast oceans of DNA, in which genes are rare oases of recognizable functionality.
It’s important to note that these bioinformatic techniques are useful for designing population genetic markers in the nuclear genome. A lot of population genetics focuses on DNA variation in the mitochondria, and I study this as well. The UCSC and other genome browsers provide such a good method of exploring the larger and more complicated nuclear genome that they greatly increase the number of genetic regions accessible for study. Full genome sequencing projects thus have enormous potential for biologists interested in many evolutionary questions, such as the genetic basis of adaptation, the historical processes underlying the geographic distribution of individuals, and the origins and maintenance of genome size and structure.