1.
Entropy (Basel) ; 24(7)2022 Jul 15.
Article in English | MEDLINE | ID: mdl-35885202

ABSTRACT

Digital signal processing approaches were investigated as a preliminary indicator for discriminating between protein-coding and non-coding regions of DNA. The motivation is that protein-coding regions exhibit a three-base periodicity (TBP) arising from the length of codons (three nucleotides), which appears as a prominent peak in the energy spectrum of a DNA coding sequence at a frequency of 1/3 cycle per sample (2π/3 rad/sample). However, because DNA sequences are symbolic, they must first be mapped into one or more numerical signals in a way that highlights the hidden information. We therefore propose two new algorithms for computing adaptive mappings and using them to find periodicities. Both algorithms are based on the spectral envelope approach. An adaptive approach is essential because a single fixed mapping applied to every DNA sequence may ignore the sequence's intrinsic properties. Finally, the improved performance of the new methods is verified on synthetic and real DNA sequences against classical methods, in particular the minimum entropy mapping (MEM) spectrum, which is also an adaptive method. We demonstrate that our method is both more accurate and more responsive than its counterparts, which matters in this application because it reduces the risk of a coding sequence being missed.
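The three-base periodicity mentioned in this abstract can be illustrated with a minimal sketch. It uses the classical fixed Voss (binary indicator) mapping and a direct DFT, not the adaptive spectral envelope algorithms the authors propose, whose details the abstract does not give. The function names and the toy sequence are assumptions made for illustration only.

```python
import cmath

def indicator_sequences(seq):
    """Voss mapping: one 0/1 signal per base (a fixed, non-adaptive mapping)."""
    return {b: [1.0 if s == b else 0.0 for s in seq] for b in "ACGT"}

def power_at(signal, k):
    """Squared magnitude of DFT bin k, computed directly from the definition."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
              for t, x in enumerate(signal))
    return abs(acc) ** 2

def power_spectrum(seq):
    """Total power over the four indicator signals for bins 1..N/2 (DC skipped)."""
    n = len(seq)
    signals = indicator_sequences(seq)
    return {k: sum(power_at(sig, k) for sig in signals.values())
            for k in range(1, n // 2 + 1)}

def peak_bin(seq):
    """DFT bin with the largest total power."""
    spec = power_spectrum(seq)
    return max(spec, key=spec.get)

# Idealized toy input: a perfectly period-3 sequence, not real coding DNA.
toy = "ATG" * 30
k = peak_bin(toy)
print(k, k / len(toy))  # peak bin and its frequency in cycles per sample
```

For a sequence of length N, a period-3 component shows up at bin N/3, that is, 1/3 cycle per sample. Real coding regions give a much noisier peak than this toy input, which is one reason the choice of mapping matters.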

2.
Genes Genomics ; 42(10): 1215-1226, 2020 10.
Article in English | MEDLINE | ID: mdl-32865759

ABSTRACT

BACKGROUND: Noncoding sequences have been demonstrated to possess regulatory functions. Their classification is challenging because they do not show well-defined nucleotide patterns that correlate with their biological functions. Genomic signal processing techniques such as the Fourier transform have been employed to characterize coding and noncoding sequences. Applying this transform to a systematic whole-genome noncoding library, such as the ENCODE database, can provide evidence of periodic behaviour in noncoding sequences that correlates with their regulatory functions. OBJECTIVE: The objective of this study was to classify different noncoding regulatory regions through their frequency spectra. METHODS: We applied machine learning algorithms to classify the frequency spectra of noncoding regulatory sequences. RESULTS: Sequences from different regulatory regions, cell lines, and chromosomes possessed distinct frequency spectra, and machine learning classifiers (such as support vector machines) could successfully discriminate among regulatory regions, thus correlating the frequency spectra with their biological functions. CONCLUSION: Our work supports the idea that there are patterns in the noncoding sequences of the genome.


Subjects
Genome, Human/genetics; Genomics; Machine Learning; Regulatory Sequences, Nucleic Acid/genetics; Algorithms; Humans; Nucleotides/genetics
3.
PeerJ ; 6: e4264, 2018.
Article in English | MEDLINE | ID: mdl-29379686

ABSTRACT

Genomic signal processing (GSP) methods, which convert DNA data to numerical values, have recently been proposed and make it possible to apply existing digital signal processing methods to genomic data. One of the most widely used methods for exploring data is cluster analysis, the unsupervised classification of patterns in data. In this paper, we propose a novel approach to cluster analysis of DNA sequences based on GSP methods and the K-means algorithm. We also propose a visualization method that makes it easy to inspect and analyze the results and to spot possible hidden behaviors. Our results support the feasibility of using the proposed method to find and visualize interesting features in sets of DNA data.
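The general idea of combining a numerical DNA representation with K-means can be sketched as follows. The abstract does not say which GSP representation or features the authors use, so this sketch assumes a simple stand-in: the mean of each Voss indicator signal (the base frequencies). The farthest-point initialization, the function names, and the toy sequences are likewise assumptions, chosen to keep the example deterministic.

```python
def features(seq):
    """Mean of each Voss indicator signal, i.e. the fraction of A, C, G, T.
    Assumed stand-in feature; the paper's actual features are not stated."""
    n = len(seq)
    return [sum(1 for s in seq if s == b) / n for b in "ACGT"]

def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(max(
            points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

# Toy input: two AT-rich and two GC-rich sequences (not real data).
seqs = ["ATATATTAAT", "AATTATATAA", "GCGCGGCCGC", "CCGGCGCGGC"]
labels, _ = kmeans([features(s) for s in seqs], k=2)
print(labels)
```

Spectral features, such as the power at selected DFT bins, could replace the composition features without changing the clustering code.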

4.
Rev. mex. ing. bioméd ; 38(3): 637-645, Sep.-Dec. 2017. tables, graphs
Article in Spanish | LILACS | ID: biblio-902377

ABSTRACT

New genomic databases (DNA sequences) are now publicly available for analysis. Bioinformatics has developed algorithms to extract information and features from these sequences, but these algorithms have limitations. An alternative is to use digital signal processing (DSP) tools adapted to genomic sequences (genomic signal processing, GSP). This work analyzes the first four statistical moments (mean, standard deviation, skewness, and kurtosis) and two further statistics (median and variance) of the frequency spectra of the 15 regulatory regions (RRs) in the ENCODE database, with the aim of studying statistical differences and characteristic frequencies. The selected database is first mapped to numerical signals. The FFT of these genomic signals is then computed, and finally the statistics are calculated on the resulting spectra. The results show three groups of RRs when the mean, median, and kurtosis are used. The standard deviation and the variance do not appear to highlight important information. Finally, the skewness shows homogeneous behavior in the presence of outliers in some RRs. These observations suggest that the periodicity within a sequence is related to, or may determine, the biological function that the sequence performs.
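The pipeline described in this abstract (map the sequence, take its Fourier transform, summarize the spectrum with statistics) can be sketched as follows. The abstract does not state which numerical mapping was used, so the integer mapping below is an arbitrary placeholder, and the function names are hypothetical. A direct DFT stands in for the FFT to keep the example dependency-free; both give the same spectrum.

```python
import cmath
import statistics

# Arbitrary placeholder mapping (an assumption, not taken from the article).
MAPPING = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}

def magnitude_spectrum(seq):
    """DFT magnitudes for bins 1..N/2 of the mapped sequence (DC skipped)."""
    x = [MAPPING[s] for s in seq]
    n = len(x)
    return [abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(x)))
            for k in range(1, n // 2 + 1)]

def spectrum_statistics(spec):
    """Mean, variance, standard deviation, skewness, kurtosis, and median.
    Population (biased) estimators; kurtosis is not excess kurtosis."""
    mean = statistics.fmean(spec)
    var = statistics.pvariance(spec, mean)
    std = var ** 0.5
    m3 = statistics.fmean((v - mean) ** 3 for v in spec)
    m4 = statistics.fmean((v - mean) ** 4 for v in spec)
    return {
        "mean": mean,
        "variance": var,
        "std": std,
        "skewness": m3 / std ** 3 if std else 0.0,
        "kurtosis": m4 / std ** 4 if std else 0.0,
        "median": statistics.median(spec),
    }

# Toy input, not an ENCODE regulatory region.
stats = spectrum_statistics(magnitude_spectrum("ATGCGTACGTTAGCATGCAA"))
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Comparing these summaries across groups of sequences is the kind of analysis the abstract reports for the 15 regulatory regions.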
