ABSTRACT
Increasingly powerful sequencing technologies are ushering in an era of personal genome sequences and raising the possibility of using such information to guide medical decisions. Genome resequencing also promises to accelerate the identification of disease-associated mutations. Roughly 98% of the human genome is composed of repeats and intergenic or non-protein-coding sequences. Thus, it is crucial to focus resequencing on high-value genomic regions. Protein-coding exons represent one such type of high-value target. We have developed a method of using flexible, high-density microarrays to capture any desired fraction of the human genome, in this case corresponding to more than 200,000 protein-coding exons. Depending on the precise protocol, up to 55-85% of the captured fragments are associated with targeted regions and up to 98% of intended exons can be recovered. This methodology provides an adaptable route toward rapid and efficient resequencing of any sizeable, non-repeat portion of the human genome.
Subject(s)
Exons , Human Genome , DNA Sequence Analysis/methods , Humans , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotides/genetics
ABSTRACT
Since the completion of the Arabidopsis thaliana genome sequence, there has been an ongoing effort to annotate the genome as accurately as possible. Comparing genome sequences of related species complements the current annotation strategies by identifying genes and improving gene structure. A total of 595,321 Brassica oleracea shotgun reads were sequenced by TIGR (The Institute for Genomic Research) and the collaboration of Washington University and Cold Spring Harbor. Vicogenta (a genome viewer based on GMOD and GBrowse) was created to view the current annotation and sequence alignments for Arabidopsis. Brassica reads were compared with the Arabidopsis genome and proteome databases using BLAST. Hypothetical genes and conserved unannotated regions on the short arm of chromosome 4 from Arabidopsis were experimentally verified using RT-PCR. We were able to improve the Arabidopsis annotation by identifying 25 genes that were missed, and confirming expression of 43 hypothetical genes in Arabidopsis. We were also able to detect conservation in genes whose transcription is normally suppressed due to methylation. We also examined how useful the O. sativa genome and ESTs from other species are, compared with Brassica, in improving the Arabidopsis annotation.
Subject(s)
Arabidopsis/genetics , Brassica/genetics , Plant Genome , Genomics/methods , Oryza/genetics , Amino Acid Sequence , Plant Chromosomes , Conserved Sequence , Databases as Topic , Expressed Sequence Tags , Plant Genes , Reverse Transcriptase Polymerase Chain Reaction , Amino Acid Sequence Homology , Species Specificity
ABSTRACT
The completion of the mouse and other mammalian genome sequences will provide necessary, but not sufficient, knowledge for an understanding of much of mouse biology at the molecular level. As a requisite next step in this process, the genes in mouse and their structure must be elucidated. In particular, knowledge of the transcriptional start site of these genes will be necessary for further study of their regulatory regions. To assess the current state of mouse genome annotation to support this activity, we identified several hundred gene predictions in mouse with varying levels of supporting evidence and tested them using RACE-PCR. Modifications were made to the procedure allowing pooling of RNA samples, resulting in a scalable procedure. The results illustrate potential errors or omissions in the current 5' end annotations in 58% of the genes detected. In testing experimentally unsupported gene predictions, we were able to identify 58 that are not usually annotated as genes but produced spliced transcripts (approximately 25% success rate). In addition, in many genes we were able to detect novel exons not predicted by any gene prediction algorithm. In 19.8% of the genes detected in this study, multiple transcript species were observed. These data show an urgent need to provide direct experimental validation of gene annotations. Moreover, these results show that direct validation using RACE-PCR can be an important component of genome-wide validation. This approach can be a useful tool in the ongoing efforts to increase the quality of gene annotations, especially transcriptional start sites, in complex genomes.
Subject(s)
Genes/genetics , Genome , Mice/genetics , Open Reading Frames/genetics , Transcription Initiation Site , Animals , Base Sequence , CpG Islands/genetics , DNA Primers , Complementary DNA/genetics , Exons/genetics , Molecular Sequence Data , Polymerase Chain Reaction/methods , DNA Sequence Analysis
ABSTRACT
Gene silencing by RNA interference (RNAi) in mammalian cells using small interfering RNAs (siRNAs) and short hairpin RNAs (shRNAs) has become a valuable genetic tool. Here, we report the construction and application of an shRNA expression library targeting 9,610 human and 5,563 mouse genes. This library is presently composed of about 28,000 sequence-verified shRNA expression cassettes contained within multi-functional vectors, which permit shRNA cassettes to be packaged in retroviruses, tracked in mixed cell populations by means of DNA 'bar codes', and shuttled to customized vectors by bacterial mating. To validate the library, we used a genetic screen designed to report defects in human proteasome function. Our results suggest that our large-scale RNAi library can be used in specific, genetic applications in mammals, and will become a valuable resource for gene analysis and discovery.
Subject(s)
Gene Library , Genetic Engineering/methods , RNA Interference , Small Interfering RNA/genetics , Animals , Molecular Cloning , Cysteine Endopeptidases/genetics , Cysteine Endopeptidases/metabolism , Genes/genetics , Genetic Vectors , Humans , Mice , Multienzyme Complexes/genetics , Multienzyme Complexes/metabolism , Oligonucleotide Array Sequence Analysis , Proteasome Endopeptidase Complex , Small Interfering RNA/metabolism , Reproducibility of Results , Substrate Specificity
ABSTRACT
Gene enrichment strategies offer an alternative to sequencing large and repetitive genomes such as that of maize. We report the generation and analysis of nearly 100,000 undermethylated (or methylation filtration) maize sequences. Comparison with the rice genome reveals that methylation filtration results in a more comprehensive representation of maize genes than that obtained from expressed sequence tags or transposon insertion site sequences. About 7% of the repetitive DNA is unmethylated and thus selected in our libraries, but potentially active transposons and unmethylated organelle genomes can be identified. Reverse transcription polymerase chain reaction can be used to finish the maize transcriptome.