Annotating a Genome at Queens College
Posted: July 13, 2011
This year, I taught Genomics Research II at Queens College. This was the second semester of an exciting two-part course sponsored by the Howard Hughes Medical Institute’s Science Education Alliance (SEA). In the first semester, students collected soil samples and learned the laboratory protocols for isolating mycobacteriophages, tiny viruses that infect only bacteria of the genus Mycobacterium (Mycobacterium tuberculosis, which causes tuberculosis, is one species in this genus). The students then practiced isolating the DNA of each mycobacteriophage. One of these phages, MeeZee, collected by Barakah Nausrudeen, was selected to be sequenced.
In the second semester, I introduced the students to the basics of the Central Dogma of molecular biology (DNA makes RNA makes protein) and showed them how to use bioinformatic tools to annotate the freshly sequenced MeeZee genome. We received a FASTA file of the genome, which was basically a text file made of ~54,000 As, Ts, Cs, and Gs in sequence. We had to figure out where each gene was in the genome, as well as the function of each gene. One thing that made this easier than it sounds is that the SEA has set up a terrific bioinformatics workflow that consolidates various computational analyses: Glimmer and GeneMark for gene calling; Aragorn and tRNAscan for, well, tRNAs; SD Finder for Shine-Dalgarno sites (the stretch of mRNA just upstream of a gene's start codon where the ribosome binds); and the coding potential in each reading frame.
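To give a feel for what "finding genes in six reading frames" means, here is a minimal Python sketch. It is not how Glimmer or GeneMark work (those rely on statistical models of coding sequence), and it is not part of the SEA workflow. It simply reads a FASTA record and lists every stretch that begins with a start codon and runs to the next in-frame stop codon, on both strands. The function names, the `min_codons` cutoff, and the toy sequence are all made up for illustration, and the start codons ATG, GTG, and TTG are an assumption based on the starts commonly used by bacterial and phage genes.

```python
from io import StringIO

# Assumed codon sets: starts commonly seen in bacterial and phage genes,
# and the three standard stop codons.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(handle):
    """Return (header, sequence) for the first record in a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # only the first record is read
            header = line[1:]
        else:
            chunks.append(line.upper())
    return header, "".join(chunks)


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Naive open reading frame scan over all six reading frames.

    Returns (strand, frame, start, end) tuples. Coordinates are 0-based,
    end-exclusive, and relative to the strand that was scanned, so "-"
    hits are positions on the reverse complement.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first start codon since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # include the stop codon
                    if (end - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, end))
                    start = None
    return orfs


# Toy example; a real genome would be read with open(...) instead of StringIO.
toy_header, toy_seq = read_fasta(StringIO(">toy phage\nCCATGAAA\nTTTTAGCC\n"))
for strand, frame, start, end in find_orfs(toy_seq):
    print(strand, frame, start, end, toy_seq[start:end])
```

On a real ~54,000-base genome, a scan like this returns far more candidates than there are genes, which is exactly why the students needed coding potential, Shine-Dalgarno scores, and homology evidence to decide which calls to keep.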
With the Apollo genome annotation software, students could weigh the evidence collected by the workflow to call genes. Their gene calls were BLASTed against GenBank to confirm homology to other known mycobacteriophage genes. I had the students work in groups and present their findings to the class, which often led to lively and informed debate over discrepancies between the conclusions of different groups. Information about MeeZee can be found on the SEA-maintained database phagesdb.org.
Although the actual genome annotation is fun and teaches the students a lot about biology, perhaps the best parts of this experience for all involved are that (1) the complete annotated MeeZee genome has been submitted to GenBank with all the students as coauthors; (2) the Graham Hatfull lab at the University of Pittsburgh will include us as coauthors on a large-scale comparative phage genomics paper to be submitted soon; and (3) one of the students came with me to the 3rd Annual SEA Symposium, where she presented our research to an audience of other phage biologists. As most of these students were college freshmen who had never taken a higher-level science course, it was truly a first-class opportunity to engage in all levels of scientific work: Collection, Observation, Discovery, and Communication.