20 Genome Annotation Past, Present, and Future: How to Define an Open Reading Frame at Each Locus

Michael R. Brent


The 10 years since Genome Research began publication bracket a complete era of genome research—an era of stunning successes and nagging loose ends, promise exceeded and promise as yet unfulfilled. The years 1996–2005 were characterized by tremendous optimism and productivity. In 1996, the sequencing of the human genome was scheduled to be completed in 2005 (Collins and Galas 1993). Driven by competition, automation, and technology, the genomics community far exceeded its own sequencing ambitions. But there was another goal that we have not yet reached—the genome was to provide a “parts list” for the human and other major model organisms. The parts turned out to be more varied than anticipated, and we have learned wonderful things about the biology and history encoded in genome sequences (Waterston et al. 2002; Gibbs et al. 2004). But the most fundamental parts on anyone’s list, then and now, must be the complete set of translated open reading frames (ORFs) and the exon–intron structures from which they are assembled. (I will use the term ORF to denote the complete exon–intron structure of the protein coding region of any mature mRNA. Thus, a primary transcript that is alternatively spliced may represent more than one ORF.) After sequencing (Lander et al. 2001; Venter et al. 2001), completing (Collins et al. 2003; The International Human Genome Sequencing Consortium 2003), and finishing (International Human Genome Sequencing Consortium 2004) the human genome, we do not have even one complete, correct ORF for each human gene...
