Two Snake Genomes Equals a Good Reading Day

It’s a good week for snake genomics, because PNAS has published both the Burmese python genome (Castoe et al. 2013) and the king cobra genome (Vonk et al. 2013). The two papers come from separate research teams (the python people mostly from Colorado and the cobra cabal mostly from the Netherlands), albeit with significant overlap between them. The world of snake molecular biology is a small one, after all.

The python group planted its snake genome flag in the ground more than two years ago with a paper describing its first draft assembly, and I have been eagerly awaiting the results of the full-blown analysis. It was well worth the wait. Because the python is known for its feast-or-famine metabolism (its small intestine can grow up to three times in size after the snake gulps down a meal weighing half its body mass), the researchers provide a very elegant analysis of differential gene expression in digestive organs before and after a meal, showing, basically, that different genes are turned on or off depending on whether the snake is pre- or post-feeding. Very cool.
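
At its simplest, a before-versus-after comparison like this boils down to a fold change for each gene. The sketch below is not the authors' pipeline (real RNA-seq analyses normalize for library size, use biological replicates, and test for statistical significance); it only shows the core arithmetic. The function names, the pseudocount, the two-fold threshold, and the gene counts are all hypothetical, chosen for illustration.

```python
import math


def log2_fold_change(pre: float, post: float, pseudocount: float = 1.0) -> float:
    """Return log2((post + pseudocount) / (pre + pseudocount)).

    Positive values mean higher expression after feeding, negative values
    mean higher expression before feeding. The pseudocount avoids division
    by zero for genes with no reads in one condition.
    """
    return math.log2((post + pseudocount) / (pre + pseudocount))


def classify(counts, threshold=1.0):
    """Label each gene 'up', 'down', or 'unchanged' after feeding.

    counts maps gene name -> (pre_feeding_count, post_feeding_count),
    assumed to be already normalized. threshold is in log2 units, so the
    default of 1.0 corresponds to a two-fold change.
    """
    labels = {}
    for gene, (pre, post) in counts.items():
        lfc = log2_fold_change(pre, post)
        if lfc >= threshold:
            labels[gene] = "up"
        elif lfc <= -threshold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


if __name__ == "__main__":
    # Hypothetical normalized counts, for illustration only.
    example = {"geneA": (10, 90), "geneB": (50, 12), "geneC": (30, 33)}
    print(classify(example))
    # prints {'geneA': 'up', 'geneB': 'down', 'geneC': 'unchanged'}
```

Using a log2 scale makes up- and down-regulation symmetric: a doubling is +1 and a halving is -1.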

Figure 3B from Castoe et al. (2013), showing genes that have undergone positive selection on the vertebrate lineage leading to snakes.

The authors of the python paper also investigated how snakes evolved their iconic and constrained morphology. After all, snakes lack legs and have a feeding apparatus that lets them do the equivalent of a human swallowing a 16-pound Thanksgiving turkey whole, and the molecular bases of these adaptations have remained unresolved. The team analyzed a large number of genes shared across vertebrates (called orthologs) and found that along the vertebrate lineage leading to snakes, many genes associated with skull and spinal development, metabolism, and other functions evolved faster than in other lineages. Also very cool.

Next up: king cobra. Why do we need both a python AND a cobra genome? The answer lies in the difference between these two snakes. One of them – and I hope you guessed cobra – is venomous. So it is no surprise that much of the justification for sequencing the king cobra genome was the need to understand how the genes that control venom production originated and are maintained. The cobra genome contains a multitude of protein families that expanded significantly during cobra evolution, producing what we see today: a highly potent mixture of toxins designed to ensure certain death for a chosen prey item.

The two genomes differ vastly in assembly quality. Using a mixture of sequencing methods, the python team achieved an N50 of 207 kb, meaning that half of the assembled sequence lies in contiguous chunks at least 207,000 base pairs long. That makes it likely that the team could recover most genes intact, exons and introns and all. The cobra genome, by contrast, has an N50 of only 3,982 base pairs, so some of its genes may be fragmentary and the lengths of many introns will remain unresolved. However, I think treating N50 as the gold standard of genome assembly “quality” is misleading. Sequencing strategies that substantially raise N50 cost more money. In modern biology, where small labs or groups of researchers conjure up whatever resources they can for an in-house genome sequencing project, the most affordable strategy that still addresses your biological questions will probably suffice. Both snake genome papers make the cut in that regard, and both are significant contributions to reptilian genomics.
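
N50 is easy to misread as "the length that half of the contigs exceed", but it is defined over assembled bases: sort the contigs (or scaffolds) from longest to shortest, add up their lengths, and N50 is the length of the contig at which the running total first reaches half of the total assembly size. Here is a minimal sketch of that calculation; the function name and the toy contig lengths are hypothetical.

```python
from statistics import median


def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are sorted from longest to shortest and summed in that order;
    N50 is the length at which the running total first reaches at least
    half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # 2 * running >= total avoids floating-point halves
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy contig lengths (hypothetical), in base pairs.
    contigs = [100, 80, 50, 30, 20, 10]
    print(n50(contigs))     # prints 80
    print(median(contigs))  # prints 40.0
```

In this toy example the N50 is 80 even though only two of the six contigs are that long: what those two contigs share is that together they hold more than half of the 290 assembled bases. The median contig length (40) tells a different story, which is one reason a single summary number can mislead.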

Lizard’s Junk DNA is Human Genome’s Treasure

Alerted by a recent post on Anole Annals that a creationist website had taken on the Anolis genome, I wondered how the Bible-as-absolute-truth crowd might interpret and distort the meaningful evolutionary insights provided by the lizard genome. The “News to Note” blog, written by Dr. Elizabeth Mitchell, rounds up notable science news featured mostly in secondary media outlets (not the primary literature) and critiques conclusions by researchers that either draw on or provide evidence for evolutionary principles (the entries primarily combat any reference to an ancient earth or to descent from a common ancestor). Following the link, I was pleased to find that this particular attack on evolutionary research featured a discussion of transposable elements (TEs) in the Anolis genome, the precise focus of my doctoral research!

The blog entry cited a Science Daily article about the release of the Anolis genome in which co-author J. Alföldi was quoted:

“Anoles have a living library of transposable elements,” said Alföldi. The researchers aligned these mobile elements to the human genome, and found that close to 100 of the human genome’s non-coding elements are derived from these jumping genes. “In anoles, these transposons are still hopping around, but evolution has used them for its own purposes, turning them into something functional in humans.”

Referring to the value of comparative studies for determining the origins of many aspects of the human genome, Alföldi concludes:

“Sometimes you need to be at a certain distance in order to learn about how the human genome evolved. You have to look out further than you were looking previously.”

Hey now! Transposable elements active in the green anole have been co-opted by mammalian genomes to provide important functions. Image via Wikipedia

This is certainly true of TEs, which have been evolving for billions of years and are found in the genomes of nearly all eukaryotes. Vertebrates in particular show a wide range of TE diversity and abundance. In the human genome, as in most mammals, hundreds of thousands of copies of one kind of TE, called L1, dominate the genomic landscape (see the first human genome paper, Lander et al. 2001). Like the human genome, the Anolis genome contains L1, but unlike the human genome it also harbors a much wider array of other kinds of TEs, each generally present in relatively low numbers (reviewed in Tollis and Boissinot 2011).

The interesting aspect of human and lizard TE evolution that Alföldi refers to is that some regions in the human genome contain fragments of non-L1 TE sequences that are highly conserved: if you were to compare these sequences to their active counterparts in other genomes (like Anolis), they would be almost exactly the same. The last common ancestor of humans and reptiles lived more than 300 million years ago, so one would expect these sequences to have diverged extensively in their respective genomes.
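
"Almost exactly the same" is usually quantified as percent identity over an alignment. The sketch below assumes the two sequences have already been aligned by some other tool, with "-" marking gaps, and it skips gapped columns; that is only one of several conventions for defining percent identity, and the function name is hypothetical. Judging real conservation also requires comparing against how much neutral sequence would be expected to diverge over the same time, which this does not do.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are skipped, so the value
    reflects only positions where both sequences have a base. The
    comparison is case-insensitive.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

For two hypothetical aligned fragments that differ at one of ten positions, such as "ACGTACGTAC" and "ACGTACGTTC", this returns 90.0.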


The tuatara is a lepidosaurian reptile that is actually quite different from lizards. If something is found in the genome of the tuatara, green anole, and human, chances are it was present in the common ancestor of all of these species, which lived more than 300 million years ago. Image by SidPix via Flickr

The fact that they are conserved in mammals suggests that purifying selection has been acting on them: their sequence integrity is so important to the function of the host that any mutation would be deleterious and therefore removed by selection. In fact, a paper published in the Journal of Heredity (Lowe et al. 2010) showed evidence that one of these elements (also found in the tuatara) regulates the expression of a gene important to embryonic development. The results from the Anolis genome confirm that certain TEs present in the amniote ancestor faced starkly different fates as amniotes diversified: they kept their ability to jump throughout reptile evolution but lost it in mammals, where the TE motifs were “domesticated” and used by the host to increase its fitness.

This is a fascinating story that highlights the interplay between an organism, its genome, and the selfish genomic parasites that have, in turn, been taken advantage of. It all becomes apparent when placed in a comparative (i.e., evolutionary) context. However, Dr. Mitchell prefers to prune this tree, because she essentially sees all evolutionary explanations as ad hoc at best:

“Any sort of similarity between the lizard and birds or mammals was interpreted as evidence of common ancestry. So was any difference. By assuming that evolution occurred, the researchers simply decorated the evolutionary tree with their data… The interpretation of the data in the shadow of the evolutionary tree of life is unjustified and unproven.”

Finally, she offers her explanation.

“Knowing that God designed all organisms to live in the same world…The fact that some things are similar and others are different does not show that reptiles, mammals, and birds share a common ancestor.”

So, according to creationists, scientists are guilty of making unwarranted assumptions when well-tested hypotheses explain reproducible results. In the meantime, creation-based explanations are completely assumption-free because she just knows that “God designed all organisms”.

Well, if you know, you know.

Can’t really argue with that, I guess.

The Age of Reptiles

On the eve of the publication of the Anolis genome paper in the journal Nature, it is an exciting time to study reptile evolution. Before 2005, no reptile genomes were available. Anolis was chosen to be the first genomic representative of an extremely diverse class of vertebrates; Harvard University professor Jonathan Losos gives an account of this process on his Anolis-themed blog, Anole Annals.

Even though the paper will be published in 2011, the Anolis genome has been available since 2007 (it must be exceedingly difficult to write a paper with a few score authors), and a variety of researchers pounced on the chance to study the first reptile genome. For instance, it has been observed that the Anolis genome lacks the isochores (long stretches of DNA with relatively uniform GC content) that are common in other vertebrates (Fujita et al. 2011). We published a paper reviewing the impact of transposable elements (TEs) in the Anolis genome (Tollis and Boissinot 2011), which I summarized in a recent post on Anole Annals.
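
An isochore map is, at heart, a profile of GC content along the chromosomes. The sketch below computes the GC fraction in fixed, non-overlapping windows. It is an illustrative assumption, not the method used by Fujita et al. (2011); real isochore studies work with far longer windows and formal segmentation methods. The function name and the toy window size are hypothetical.

```python
def gc_windows(seq: str, window: int):
    """Return the GC fraction of each non-overlapping window of seq.

    Ambiguous bases such as N are excluded from the denominator; a window
    containing no A/C/G/T bases yields None. A final partial window is
    included.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    result = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        called = gc + at
        result.append(gc / called if called else None)
    return result
```

For example, gc_windows("GGCCATATNN", 4) returns [1.0, 0.0, None]. Strongly uneven values across long windows would hint at isochore-like structure, while a genome that lacks isochores, as reported for Anolis, would be expected to show comparatively little variation.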

In our paper, we discuss Castoe et al. (2011), which describes TEs found in two snake genomes, the Burmese python and the copperhead. There is currently a python genome project as well, and the first draft of this genome is already available. The python belongs to an ancient group of snakes and is well studied because of its ability to withstand the extreme metabolic shifts that come with the serpentine feast-and-famine lifestyle.

The Ed Green lab and the Genome Technology Center at UC Santa Cruz are working on the alligator genome. Alligators and crocodiles are the closest living relatives of birds, and together they form a group called archosaurs. This was a successful and diverse group during the Mesozoic Era, and extinct members include all non-avian dinosaurs and pterosaurs. The alligator genome will teach us more about this important branch of the vertebrate family tree. In addition, an alligator genome will have value for human health, because alligators have incredibly complex and robust immune systems (they live in rank swamps yet rarely get infections). More crocodilians are being sequenced as well, and their genome dynamics are an active area of research (see David Ray’s lab page at Mississippi State).

Python versus Alligator! Python already has his genome sequence available on NCBI so he wins the current battle... but who will win the war? (photo from Wikipedia)

There are several more reptiles that are either being sequenced currently or are being considered for sequencing. The painted turtle will be the first large genome (3Gb) to be sequenced fully with next-generation 454 technology. There is a proposal in place to sequence the garter snake genome as well. It won’t be long before we find ourselves in a new, genomic Age of Reptiles.

Annotating a Genome at Queens College

This year, I taught Genomics Research II at Queens College. This was the second semester of an exciting two-part course sponsored by the Howard Hughes Medical Institute’s Science Education Alliance (SEA). In the first semester, students collected soil samples and learned the laboratory protocols for isolating mycobacteriophages: tiny viruses that infect only bacteria of the genus Mycobacterium (which includes Mycobacterium tuberculosis, the cause of tuberculosis). The students then practiced isolating the DNA of each mycobacteriophage. One of these phages, MeeZee, collected by Barakah Nausrudeen, was selected for sequencing.

Computational analyses and student annotations were visualized using the SEA workflow

In the second semester, I introduced the students to the basics of the Central Dogma of molecular biology (DNA makes RNA makes proteins) and to using bioinformatic tools to annotate the freshly sequenced MeeZee genome. We received a FASTA file of the genome, which was basically a text file of ~54,000 As, Ts, Cs, and Gs in sequence. We had to figure out where each gene was in the genome, as well as what each gene does. One thing that made this easier than it sounds is that the SEA has set up a terrific bioinformatics workflow that consolidates various computational analyses: Glimmer and GeneMark for gene calling; Aragorn and tRNAscan for, well, tRNAs; SD Finder for Shine-Dalgarno sites (the ribosome-binding sequence in the mRNA just upstream of a gene’s start codon); and the coding potential of each reading frame.
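
Glimmer and GeneMark rely on trained statistical models of coding sequence, so the following is not how they work. It is a deliberately naive open-reading-frame (ORF) scan, included only to show what "each reading frame" means once you have the FASTA text in hand: three frames on the given strand plus three on its reverse complement. The function names, the 30-codon minimum, and the use of ATG, GTG, and TTG as start codons are assumptions for illustration (bacterial and phage genes can begin with GTG or TTG as well as ATG).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Complement table; any non-ACGT character (e.g. N) is left unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_fasta(lines):
    """Return the sequence of the first record in an iterable of FASTA lines."""
    seq_parts = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # only the first record is read
            seen_header = True
            continue
        seq_parts.append(line.upper())
    return "".join(seq_parts)


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30, starts=("ATG", "GTG", "TTG")):
    """Naive six-frame ORF scan.

    Yields (strand, start, end, length_nt) using 0-based, half-open
    coordinates on that strand's sequence (for "-", coordinates refer to
    the reverse complement). Each ORF runs from the first start codon after
    the previous in-frame stop through the next stop codon, inclusive;
    min_codons counts the stop codon.
    """
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon in starts:
                    orf_start = i
                elif orf_start is not None and codon in STOP_CODONS:
                    length = i + 3 - orf_start
                    if length // 3 >= min_codons:
                        yield strand, orf_start, i + 3, length
                    orf_start = None
```

In use, you would pass the lines of the genome’s FASTA file (an open file handle works) to parse_fasta and hand the resulting string to find_orfs. A scan like this produces many spurious short ORFs, which is exactly why gene callers weigh additional evidence such as coding potential and Shine-Dalgarno sites.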

With the Apollo genome annotation software, students could weigh the evidence collected with the workflow to call genes. Their gene calls were BLASTed against GenBank to confirm homology to other known mycobacteriophage genes. I had the students work in groups and present their findings to the class, which often led to lively and informed debate over any discrepancies between the conclusions of different groups. Information about MeeZee can be found on the SEA-maintained database.

The bio-computing lab at Queens College, where students annotated the MeeZee genome

Although the actual genome annotation is fun and teaches the students a lot about biology, perhaps the best parts of this experience for all involved are that (1) the completely annotated MeeZee genome has been submitted to GenBank, with all the students as coauthors; (2) the Graham Hatfull lab at the University of Pittsburgh will include us as coauthors on a large-scale comparative phage genomics paper to be submitted soon; and (3) one of the students came with me to the 3rd Annual SEA Symposium, where she presented our research to an audience of other phage biologists. As most of these students were college freshmen who had never taken a higher-level science course, it was truly a first-class opportunity to engage in all levels of scientific work: Collection, Observation, Discovery, and Communication.