Search | BVS Regional Portal

A Computational Study of Potential miRNA-Disease Association Inference Based on Ensemble Learning and Kernel Ridge Regression.

Peng, Li-Hong; Zhou, Li-Qian; Chen, Xing; Piao, Xue.

Front Bioeng Biotechnol ; 8: 40, 2020.

Article in English | MEDLINE | ID: mdl-32117922

ABSTRACT

As a growing number of experimental studies have shown that microRNAs (miRNAs) are closely related to multiple biological processes and to the prevention, diagnosis and treatment of human diseases, more researchers are focusing on identifying associations between miRNAs and diseases. Identifying such associations purely through experiments is costly and demanding, which has prompted researchers to develop computational methods that complement the experiments. In this paper, a novel prediction model named Ensemble of Kernel Ridge Regression based MiRNA-Disease Association prediction (EKRRMDA) was developed. EKRRMDA obtained features of miRNAs and diseases by integrating the disease semantic similarity, the miRNA functional similarity and the Gaussian interaction profile kernel similarity for diseases and miRNAs. Under a computational framework that used ensemble learning and feature dimensionality reduction, multiple base classifiers were obtained based on random selection of features, each combining two Kernel Ridge Regression classifiers, one from the miRNA side and one from the disease side. An averaging strategy over these base classifiers was then adopted to obtain the final association scores of miRNA-disease pairs. In global and local leave-one-out cross validation, EKRRMDA attained AUCs of 0.9314 and 0.8618, respectively. Moreover, the model's average AUC with standard deviation in 5-fold cross validation was 0.9275 ± 0.0008. In addition, we implemented three different types of case studies on predicting miRNAs associated with five important diseases. As a result, 90% (Esophageal Neoplasms), 86% (Kidney Neoplasms), 86% (Lymphoma), 98% (Lung Neoplasms), and 96% (Breast Neoplasms) of the top 50 predicted miRNAs were verified to have associations with these diseases.
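The abstract names the Gaussian interaction profile kernel similarity but does not give its formula. The sketch below assumes the commonly used definition, in which the similarity of two entities is exp(-gamma * squared distance between their association profiles) and gamma is a bandwidth divided by the mean squared profile norm. The function name `gip_kernel`, the default `bandwidth=1.0` and the toy matrix are illustrative assumptions, not details taken from the paper.

```python
import math


def gip_kernel(adjacency, bandwidth=1.0):
    """Gaussian interaction profile kernel over the rows of a 0/1 matrix.

    Each row is the association profile of one entity (for example one
    miRNA across all diseases). The bandwidth default is an assumption.
    """
    n = len(adjacency)
    squared_norms = [sum(v * v for v in row) for row in adjacency]
    mean_squared_norm = sum(squared_norms) / n
    if mean_squared_norm == 0:
        raise ValueError("at least one known association is required")
    gamma = bandwidth / mean_squared_norm

    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            squared_distance = sum(
                (a - b) ** 2 for a, b in zip(adjacency[i], adjacency[j])
            )
            kernel[i][j] = math.exp(-gamma * squared_distance)
    return kernel


# Hypothetical toy data: 3 miRNAs (rows) x 4 diseases (columns).
toy_associations = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]
K = gip_kernel(toy_associations)
```

Applying the same function to the transposed matrix would give the disease-side similarity. Rows with identical profiles get similarity 1, and the value decays as the profiles diverge.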

Whole-proteome phylogeny of large dsDNA viruses and parvoviruses through a composition vector method related to dynamical language model.

Yu, Zu-Guo; Chu, Ka Hou; Li, Chi Pang; Anh, Vo; Zhou, Li-Qian; Wang, Roger Wei.

BMC Evol Biol ; 10: 192, 2010 Jun 22.

Article in English | MEDLINE | ID: mdl-20565983

ABSTRACT

BACKGROUND: The vast sequence divergence among different virus groups has presented a great challenge to alignment-based analysis of virus phylogeny. Due to the problems caused by the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment could not be directly applied to the whole-genome comparison and phylogenomic studies of viruses. There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has been applied successfully to the phylogenetic analysis of bacteria and chloroplast genomes. RESULTS: In this paper, the DL method is used to analyze the whole-proteome phylogeny of 124 large dsDNA viruses and 30 parvoviruses, two data sets with a large difference in genome size. The trees from our analyses are in good agreement with the latest classification of large dsDNA viruses and parvoviruses by the International Committee on Taxonomy of Viruses (ICTV). CONCLUSIONS: The present method provides a new way of recovering the phylogeny of large dsDNA viruses and parvoviruses, and also some insights into the affiliation of a number of unclassified viruses. In comparison, some alignment-free methods such as the CV Tree method can be used for recovering the phylogeny of large dsDNA viruses, but they are not suitable for resolving the phylogeny of parvoviruses, which have a much smaller genome size.
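To make the idea of an alignment-free comparison concrete, the sketch below builds k-mer frequency vectors from two protein sequences and turns their cosine correlation into a distance. It is a simplified illustration only: the DL method described in the abstract involves further steps that are not spelled out there, and none of them are reproduced here. The function names, the choice `k=3`, the distance `(1 - cosine) / 2` and the example sequences are all assumptions.

```python
import math
from collections import Counter


def kmer_frequencies(protein, k=3):
    """Relative frequency of every overlapping k-mer in one sequence."""
    if len(protein) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(protein[i:i + k] for i in range(len(protein) - k + 1))
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}


def composition_distance(seq_a, seq_b, k=3):
    """Distance in [0, 0.5] from the cosine of two k-mer frequency vectors."""
    freq_a = kmer_frequencies(seq_a, k)
    freq_b = kmer_frequencies(seq_b, k)
    dot = sum(value * freq_b.get(kmer, 0.0) for kmer, value in freq_a.items())
    norm_a = math.sqrt(sum(value * value for value in freq_a.values()))
    norm_b = math.sqrt(sum(value * value for value in freq_b.values()))
    cosine = dot / (norm_a * norm_b)
    return (1.0 - cosine) / 2.0


# Hypothetical sequences, used only to exercise the functions.
example_distance = composition_distance("MKVLAAGIVGLLLAQ", "MKVLSAGIVGALLAQ")
```

A matrix of such pairwise distances could then be passed to a standard tree-building method such as neighbor joining. No alignment is computed at any point, which is what lets this family of methods compare highly divergent genomes.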

Subject(s)

DNA Viruses/genetics , Parvovirus/genetics , Phylogeny , Proteome/genetics , Sequence Analysis, DNA/methods , DNA Viruses/classification , DNA, Viral/genetics , Genome, Viral , Parvovirus/classification

Human Pol II promoter recognition based on primary sequences and free energy of dinucleotides.

Yang, Jian-Yi; Zhou, Yu; Yu, Zu-Guo; Anh, Vo; Zhou, Li-Qian.

BMC Bioinformatics ; 9: 113, 2008 Feb 24.

Article in English | MEDLINE | ID: mdl-18294399

ABSTRACT

BACKGROUND: The promoter region plays an important role in determining where the transcription of a particular gene should be initiated. Computational prediction of eukaryotic Pol II promoter sequences is one of the most significant problems in sequence analysis. Existing promoter prediction methods are still far from satisfactory. RESULTS: We attempt to distinguish human Pol II promoter sequences from non-promoter sequences, which are made up of exon and intron sequences. Four methods are used: two kinds of multifractal analysis performed on the numeric sequences obtained from the dinucleotide free energy, Z curve analysis and a global descriptor of the promoter/non-promoter primary sequences. A total of 141 parameters are extracted from these methods and categorized into seven groups (methods). They are used to generate certain spaces, and each promoter/non-promoter sequence is then represented by a point in the corresponding space. All the 120 possible combinations of the seven methods are tested. Based on Fisher's linear discriminant algorithm, with a relatively small number of parameters (96 and 117), we get satisfactory discriminant accuracies. In particular, in the case of 117 parameters, the accuracies for the training and test sets reach 90.43% and 89.79%, respectively. A comparison with five other existing methods indicates that our methods perform better. Using the global descriptor method (36 parameters), 17 of the 18 experimentally verified promoter sequences of human chromosome 22 are correctly identified. CONCLUSION: The high accuracies achieved suggest that the methods of this paper are useful for understanding the difficult problem of promoter prediction.
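The abstract lists Z curve analysis among its four methods without defining it. The sketch below computes the standard Z curve coordinates of a DNA sequence: x tracks purines (A, G) against pyrimidines (C, T), y tracks amino bases (A, C) against keto bases (G, T), and z tracks weak hydrogen bonding (A, T) against strong (C, G). How the paper derives its parameters from these coordinates is not stated in the abstract, so the sketch stops at the curve itself. The function name `z_curve` is an assumption.

```python
# Step taken along (x, y, z) for each base, following the standard
# Z curve definition.
Z_CURVE_STEPS = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(dna):
    """Return the cumulative Z curve points, one per base of the sequence."""
    x = y = z = 0
    points = []
    for base in dna.upper():
        if base not in Z_CURVE_STEPS:
            raise ValueError(f"unsupported base: {base!r}")
        dx, dy, dz = Z_CURVE_STEPS[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


# Short made-up sequence, used only to exercise the function.
curve = z_curve("ACGT")
```

Because the mapping is one-to-one, the original sequence can be recovered from the curve, so summary statistics of the coordinates can serve as numeric features for a classifier.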

Subject(s)

Algorithms , DNA Polymerase II/genetics , Pattern Recognition, Automated/methods , Promoter Regions, Genetic/genetics , Sequence Analysis, DNA/methods , Base Sequence , Entropy , Humans , Molecular Sequence Data

A fractal method to distinguish coding and non-coding sequences in a complete genome based on a number sequence representation.

Zhou, Li-Qian; Yu, Zu-Guo; Deng, Ji-Qing; Anh, Vo; Long, Shun-Chao.

J Theor Biol ; 232(4): 559-67, 2005 Feb 21.

Article in English | MEDLINE | ID: mdl-15588636

ABSTRACT

A fractal method to distinguish coding and non-coding sequences in a complete genome is proposed, based on the different statistical behaviors of these two kinds of sequences. We first propose a number sequence representation of DNA sequences. Multifractal analysis is then performed on the measure representation of the obtained number sequence. The three exponents C(-1), C1 and C2 are selected from the result of the multifractal analysis. Each DNA sequence may be represented by a point in the three-dimensional space generated by these three-component vectors. It is shown that points corresponding to coding and non-coding sequences in the complete genome of many prokaryotes are roughly distributed in different regions. Fisher's discriminant algorithm can be used to separate these two regions in the spanned space. If the point (C(-1), C1, C2) for a DNA sequence is situated in the region corresponding to coding sequences, the sequence is classified as a coding sequence; otherwise, it is classified as a non-coding one. For all 51 prokaryotes we considered, the average discriminant accuracies p_c, p_nc, q_c and q_nc reach 72.28%, 84.65%, 72.53% and 84.18%, respectively.
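The classification step described above can be sketched as a two-class Fisher linear discriminant on three-dimensional points. The abstract does not say how the decision threshold is set, so the midpoint of the projected class means is used here as an assumption. The function names and the toy points are made up for illustration and do not come from the paper.

```python
def mean_vector(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def scatter_matrix(points, mean):
    """Sum of outer products of deviations from the class mean."""
    dim = len(mean)
    scatter = [[0.0] * dim for _ in range(dim)]
    for p in points:
        diff = [p[d] - mean[d] for d in range(dim)]
        for i in range(dim):
            for j in range(dim):
                scatter[i][j] += diff[i] * diff[j]
    return scatter


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("within-class scatter matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_fisher(coding, noncoding):
    """Return the projection vector w and a midpoint threshold (assumed)."""
    m_c, m_n = mean_vector(coding), mean_vector(noncoding)
    s_c, s_n = scatter_matrix(coding, m_c), scatter_matrix(noncoding, m_n)
    within = [[a + b for a, b in zip(rc, rn)] for rc, rn in zip(s_c, s_n)]
    w = solve(within, [a - b for a, b in zip(m_c, m_n)])
    threshold = sum(wi * (a + b) / 2 for wi, a, b in zip(w, m_c, m_n))
    return w, threshold


def is_coding(point, w, threshold):
    """Classify a point by the side of the threshold its projection falls on."""
    return sum(wi * xi for wi, xi in zip(w, point)) > threshold


# Made-up (C(-1), C1, C2)-style points, used only to exercise the functions.
toy_coding = [(1.0, 1.0, 1.0), (1.2, 0.9, 1.1), (0.9, 1.2, 1.0), (1.1, 1.0, 0.8)]
toy_noncoding = [(0.0, 0.1, 0.0), (0.2, -0.1, 0.1), (-0.1, 0.0, 0.2), (0.1, 0.2, -0.1)]
w, threshold = fit_fisher(toy_coding, toy_noncoding)
```

The vector w solves S_w w = m_coding - m_noncoding, where S_w is the pooled within-class scatter. This is the direction that maximizes the separation between the projected class means relative to the projected within-class spread.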

Subject(s)

Computer Simulation , Fractals , Models, Genetic , Animals , Genetic Code , Genome
