Deep Splicer: A CNN Model for Splice Site Prediction in Genetic Sequences.
Fernandez-Castillo, Elisa; Barbosa-Santillán, Liliana Ibeth; Falcon-Morales, Luis; Sánchez-Escobar, Juan Jaime.
Affiliations
  • Fernandez-Castillo E; School of Engineering and Sciences, Monterrey Institute of Technology and Higher Education, Guadalajara 45201, Mexico.
  • Barbosa-Santillán LI; School of Engineering and Sciences, Monterrey Institute of Technology and Higher Education, Guadalajara 45201, Mexico.
  • Falcon-Morales L; School of Engineering and Sciences, Monterrey Institute of Technology and Higher Education, Guadalajara 45201, Mexico.
  • Sánchez-Escobar JJ; Science Research Department, Center for Industrial Technical Teaching, Guadalajara 44638, Mexico.
Genes (Basel); 13(5), 2022 May 19.
Article in English | MEDLINE | ID: mdl-35627292
Many living organisms have DNA in their cells that is responsible for their biological features. DNA is an organic molecule made of two complementary strands of four different nucleotides wound into a double helix. These nucleotides are adenine (A), thymine (T), guanine (G), and cytosine (C). Genes are DNA sequences that contain the information needed to synthesize proteins. The genes of higher eukaryotic organisms contain coding sequences, known as exons, and non-coding sequences, known as introns; the introns are removed at splice sites after the DNA is transcribed into RNA. Genome annotation is the process of identifying the location of coding regions and determining their function. This process is fundamental for understanding gene structure; however, it is time-consuming and expensive when done by biochemical methods. With technological advances, splice site detection can now be done computationally. Although various software tools have been developed to predict splice sites, their accuracy still needs to improve and their false-positive rates need to come down. The main goal of this research was to develop Deep Splicer, a deep learning model that identifies splice sites in the genomes of humans and other species. This model has good performance metrics and a lower false-positive rate than currently existing tools. Deep Splicer achieved an accuracy between 93.55% and 99.66% on the genetic sequences of different organisms, while Splice2Deep, another splice site detection tool, had an accuracy between 90.52% and 98.08%. Splice2Deep outperformed Deep Splicer in accuracy on C. elegans genomic sequences (97.88% vs. 93.62%) and on A. thaliana sequences (95.40% vs. 94.93%); however, Deep Splicer was more accurate on H. sapiens (98.94% vs. 97.15%) and D. melanogaster (97.14% vs. 92.30%). Deep Splicer's false-positive rate was 0.11% for human genetic sequences and 0.25% for other species' sequences. By comparison, another splice site prediction tool, Splice Finder, had false-positive rates between 1% and 3% for human sequences and between roughly 4% and 10% for other species' sequences.
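The abstract does not describe Deep Splicer's input pipeline or exactly how its metrics were computed. As a rough illustration only, the Python sketch below assumes three things that are not stated in this record: that candidate donor sites are taken at canonical GT dinucleotides (most introns begin with GT and end with AG), that each candidate window is one-hot encoded over A, C, G and T before being passed to a CNN, and that "false-positive rate" means FP / (FP + TN). The function names and the window size (`flank`) are hypothetical.

```python
"""Hypothetical sketch: candidate donor windows, one-hot encoding, and evaluation metrics.

This is NOT Deep Splicer's actual preprocessing; the GT filter, the window size,
and the false-positive-rate definition are illustrative assumptions.
"""
from typing import Dict, List, Tuple

# Column order of the one-hot encoding (assumed, not taken from the paper).
NUCLEOTIDES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode DNA as a len(seq) x 4 matrix; unknown bases such as N become all zeros."""
    return [[1 if base == nt else 0 for nt in NUCLEOTIDES] for base in seq.upper()]


def donor_candidates(seq: str, flank: int = 5) -> List[Tuple[int, str]]:
    """Return (position, window) for each canonical 'GT' with `flank` bases on both sides.

    `position` is the 0-based index of the G; each window has length 2 * flank + 2.
    GT dinucleotides too close to either end of the sequence are skipped.
    """
    seq = seq.upper()
    candidates = []
    for i in range(flank, len(seq) - flank - 1):
        if seq[i:i + 2] == "GT":
            candidates.append((i, seq[i - flank:i + 2 + flank]))
    return candidates


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Dict[str, int]:
    """Count true/false positives/negatives for binary labels (1 = splice site)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            counts["tp"] += 1
        elif not truth and not pred:
            counts["tn"] += 1
        elif pred:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(c: Dict[str, int]) -> float:
    """(TP + TN) / all predictions; 0.0 if there are no predictions."""
    total = sum(c.values())
    return (c["tp"] + c["tn"]) / total if total else 0.0


def false_positive_rate(c: Dict[str, int]) -> float:
    """FP / (FP + TN), i.e. the share of true non-sites wrongly called sites (assumed definition)."""
    negatives = c["fp"] + c["tn"]
    return c["fp"] / negatives if negatives else 0.0


if __name__ == "__main__":
    seq = "AAACAGGTAAGTCC"
    for pos, window in donor_candidates(seq, flank=3):
        print(pos, window, one_hot(window)[:2])
    c = confusion_counts([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
    print(accuracy(c), false_positive_rate(c))
```

In a real model the one-hot windows would be stacked into a tensor and fed to a convolutional network; the metric helpers show why a low false-positive rate matters when true splice sites are rare compared with the many GT dinucleotides that are not splice sites.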

Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Main subject: Caenorhabditis elegans / Drosophila melanogaster | Study type: Prognostic_studies / Risk_factors_studies | Limits: Animals / Humans | Language: English | Journal: Genes (Basel) | Publication year: 2022 | Document type: Article | Country of affiliation: Mexico | Country of publication: Switzerland

Texto completo: 1 Coleções: 01-internacional Base de dados: MEDLINE Assunto principal: Caenorhabditis elegans / Drosophila melanogaster Tipo de estudo: Prognostic_studies / Risk_factors_studies Limite: Animals / Humans Idioma: En Revista: Genes (Basel) Ano de publicação: 2022 Tipo de documento: Article País de afiliação: México País de publicação: Suíça