Results 1 - 20 of 188
1.
J Comput Biol ; 29(1): 23-26, 2022 01.
Article in English | MEDLINE | ID: mdl-35020490

ABSTRACT

scDesign2 is a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured. This article shows how to download and install the scDesign2 R package, how to fit probabilistic models (one per cell type) to real data and simulate synthetic data from the fitted models, and how to use scDesign2 to guide experimental design and benchmark computational methods. Finally, a note is given about cell clustering as a preprocessing step before model fitting and data simulation.


Subject(s)
Gene Expression Profiling/statistics & numerical data; Single-Cell Analysis/statistics & numerical data; Software; Algorithms; Animals; Cluster Analysis; Computational Biology; Computer Simulation; Databases, Nucleic Acid/statistics & numerical data; Gene Expression; Mice; Models, Statistical; RNA-Seq/statistics & numerical data
2.
J Comput Biol ; 29(2): 121-139, 2022 02.
Article in English | MEDLINE | ID: mdl-35041494

ABSTRACT

Current expression quantification methods suffer from a fundamental but undercharacterized type of error: the most likely estimates for transcript abundances are not unique. This means multiple estimates of transcript abundances generate the observed RNA-seq reads with equal likelihood, and the underlying true expression cannot be determined. This is called nonidentifiability in probabilistic modeling. It is further exacerbated by incomplete reference transcriptomes where reads may be sequenced from unannotated transcripts. Graph quantification is a generalization of transcript quantification, accounting for the reference incompleteness by allowing exponentially many unannotated transcripts to express reads. We propose methods to calculate a "confidence range of expression" for each transcript, representing its possible abundance across equally optimal estimates for both quantification models. This range informs both whether a transcript has potential estimation error due to nonidentifiability and the extent of the error. Applying our methods to the Human Body Map data, we observe that 35%-50% of transcripts potentially suffer from inaccurate quantification caused by nonidentifiability. When comparing the expression between isoforms in one sample, we find that the degree of inaccuracy of 20%-47% of transcripts can be so large that the ranking of expression between the transcript and other isoforms from the same gene cannot be determined. When comparing the expression of a transcript between two groups of RNA-seq samples in differential expression analysis, we observe that the majority of detected differentially expressed transcripts are reliable, with a few exceptions, after considering the ranges of the optimal expression estimates.


Subject(s)
Algorithms; Gene Expression Profiling/statistics & numerical data; Transcriptome; Alternative Splicing; Computational Biology; Confidence Intervals; Databases, Nucleic Acid/statistics & numerical data; Humans; Models, Statistical; RNA-Seq/statistics & numerical data
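
The nonidentifiability described in the abstract above can be illustrated with a toy model. The sketch below is not the authors' method: the three transcripts, the two regions, the read counts, and the simplified multinomial likelihood are all assumptions chosen for illustration. It shows two different abundance estimates that yield the same likelihood, and the range each transcript can take across such equally optimal estimates.

```python
import math

# Hypothetical gene with two regions (A, B) and three transcripts:
# T1 covers A only, T2 covers B only, T3 covers both A and B.
COVERS = {"T1": ("A",), "T2": ("B",), "T3": ("A", "B")}


def region_probs(abundance):
    """Probability that a read originates from each region (equal region lengths assumed)."""
    mass = {"A": 0.0, "B": 0.0}
    for transcript, regions in COVERS.items():
        for region in regions:
            mass[region] += abundance[transcript]
    total = sum(mass.values())
    return {region: m / total for region, m in mass.items()}


def log_likelihood(abundance, read_counts):
    """Multinomial log-likelihood of region-level read counts (constant term dropped)."""
    probs = region_probs(abundance)
    return sum(n * math.log(probs[region]) for region, n in read_counts.items())


reads = {"A": 60, "B": 40}  # made-up read counts per region

# Two different (unnormalized) abundance estimates that explain the reads equally well.
estimate_1 = {"T1": 6, "T2": 4, "T3": 0}
estimate_2 = {"T1": 2, "T2": 0, "T3": 4}

# Every estimate of the form (6 - s, 4 - s, s) with 0 <= s <= 4 gives the same
# region masses, so each transcript has a range of equally optimal abundances.
optimal = [{"T1": 6 - s, "T2": 4 - s, "T3": s} for s in range(5)]
ranges = {t: (min(e[t] for e in optimal), max(e[t] for e in optimal)) for t in COVERS}

print(log_likelihood(estimate_1, reads), log_likelihood(estimate_2, reads))
print(ranges)
```

In this toy setting the reads alone cannot distinguish an estimate in which T3 is unexpressed from one in which T3 dominates, which is the kind of ambiguity a per-transcript range is meant to expose.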
3.
J Comput Biol ; 29(1): 27-44, 2022 01.
Article in English | MEDLINE | ID: mdl-35050715

ABSTRACT

We propose GRNUlar, a novel deep learning framework for supervised learning of gene regulatory networks (GRNs) from single-cell RNA-Sequencing (scRNA-Seq) data. Our framework incorporates two intertwined models. First, we leverage the expressive ability of neural networks to capture complex dependencies between transcription factors and the corresponding genes they regulate, by developing a multitask learning framework. Second, to capture sparsity of GRNs observed in the real world, we design an unrolled algorithm technique for our framework. Our deep architecture requires supervision for training, for which we repurpose existing synthetic data simulators that generate scRNA-Seq data guided by an underlying GRN. Experimental results demonstrate that GRNUlar outperforms state-of-the-art methods on both synthetic and real data sets. Our study also demonstrates the novel and successful use of expression data simulators for supervised learning of GRN inference.


Subject(s)
Deep Learning; Gene Regulatory Networks; Single-Cell Analysis/statistics & numerical data; Algorithms; Animals; Bias; Computational Biology; Computer Simulation; Databases, Nucleic Acid/statistics & numerical data; Escherichia coli/genetics; Humans; Mice; Neural Networks, Computer; RNA-Seq/statistics & numerical data; Saccharomyces cerevisiae/genetics; Supervised Machine Learning
4.
Comput Math Methods Med ; 2021: 7029130, 2021.
Article in English | MEDLINE | ID: mdl-34737790

ABSTRACT

Tumor recurrence and metastasis often occur in HCC patients after surgery, and the prognosis is poor. Hence, searching for effective biomarkers for HCC prognosis is of great importance. First, HCC-related data were acquired from the TCGA and GEO databases. Based on the GEO data, 256 differentially expressed genes (DEGs) were obtained. Subsequently, to clarify the function of the DEGs, the clusterProfiler package was used to conduct functional enrichment analyses. Protein-protein interaction (PPI) network analysis screened 20 key genes. The key genes were filtered via the GEPIA database, by which 11 hub genes (F9, CYP3A4, ASPM, AURKA, CDC20, CDCA5, NCAP, PRC1, PTTG1, TOP2A, and KIFC1) were screened out. Then, univariate Cox analysis was applied to construct a prognostic model, followed by validation of its prediction performance. With the risk score calculated by the model and common clinical features, univariate and multivariate analyses were carried out to assess whether the prognostic model could be used independently for prognostic prediction. In conclusion, the current study screened an HCC prognostic gene signature based on public databases.


Subject(s)
Biomarkers, Tumor/genetics; Carcinoma, Hepatocellular/genetics; Gene Regulatory Networks; Liver Neoplasms/genetics; Computational Biology; Databases, Nucleic Acid/statistics & numerical data; Gene Expression Regulation, Neoplastic; Gene Ontology; Humans; Multivariate Analysis; Prognosis; Proportional Hazards Models; Protein Interaction Maps/genetics
5.
Int J Mol Sci ; 22(21)2021 Oct 26.
Article in English | MEDLINE | ID: mdl-34768960

ABSTRACT

Deep learning has proven advantageous in solving cancer diagnostic or classification problems. However, it cannot explain the rationale behind human decisions. Biological pathway databases provide well-studied relationships between genes and their pathways. As pathways comprise knowledge frameworks widely used by human researchers, representing gene-to-pathway relationships in deep learning structures may aid in their comprehension. Here, we propose a deep neural network (PathDeep), which implements gene-to-pathway relationships in its structure. We also provide an application framework measuring the contribution of pathways and genes in deep neural networks in a classification problem. We applied PathDeep to classify cancer and normal tissues based on the publicly available, large gene expression dataset. PathDeep showed higher accuracy than fully connected neural networks in distinguishing cancer from normal tissues (accuracy = 0.994) in 32 tissue samples. We identified 42 pathways related to 32 cancer tissues and 57 associated genes contributing highly to the biological functions of cancer. The most significant pathway was G-protein-coupled receptor signaling, and the most enriched function was the G1/S transition of the mitotic cell cycle, suggesting that these biological functions were the most common cancer characteristics in the 32 tissues.


Subject(s)
Deep Learning; Neoplasms/classification; Neoplasms/genetics; RNA-Seq/statistics & numerical data; Databases, Nucleic Acid/statistics & numerical data; Diagnosis, Computer-Assisted; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Humans; Neoplasms/diagnosis; Neural Networks, Computer
6.
Comput Math Methods Med ; 2021: 6015473, 2021.
Article in English | MEDLINE | ID: mdl-34603484

ABSTRACT

Hypoxic ischemic encephalopathy (HIE) is a serious nervous system syndrome that occurs in early life. Noncoding RNAs have been confirmed to play crucial roles in human diseases. So far, there have been few systematic and comprehensive studies of the expression profile of RNAs in the brain after hypoxia ischemia. In this study, 31 upregulated differentially expressed microRNAs (miRNAs) were identified. In addition, 5512 differentially expressed mRNAs, long noncoding RNAs (lncRNAs), and circular RNAs (circRNAs) were identified in HIE groups. Bioinformatics analysis showed that these circRNAs and mRNAs were significantly enriched in regulation of leukocyte activation, response to virus, and neutrophil degranulation. Pathway and related gene network analysis indicated that HLA-DPA1, HLA-DQA2, HLA-DQB1, and HLA-DRB4 have a more crucial role in HIE. Finally, miRNA-circRNA-mRNA interaction network analysis was performed to identify hub miRNAs and circRNAs. We found that miR-592 potentially targets 5 circRNAs, thereby affecting the expression of 15 mRNAs in HIE. hsa_circ_0068397 and hsa_circ_0045698 were identified as hub circRNAs in HIE. Collectively, using RNA-seq, bioinformatics analysis, and circRNA/miRNA interaction prediction, we systematically investigated the differentially expressed RNAs in HIE, which could offer new insight into the pathogenesis of HIE.


Subject(s)
Gene Regulatory Networks; Hypoxia-Ischemia, Brain/genetics; MicroRNAs/genetics; RNA, Circular/genetics; Computational Biology; Databases, Nucleic Acid/statistics & numerical data; Down-Regulation; Gene Expression Profiling/statistics & numerical data; HLA-D Antigens/genetics; Humans; Hypoxia-Ischemia, Brain/immunology; Immunogenetic Phenomena; RNA, Messenger/genetics; RNA-Seq; Up-Regulation
7.
Comput Math Methods Med ; 2021: 8020879, 2021.
Article in English | MEDLINE | ID: mdl-34603485

ABSTRACT

BACKGROUND: The competitive endogenous RNA (ceRNA) mechanism was discovered recently and regulates cancer-related gene expression. The ceRNA network participates in multiple processes, such as cell proliferation and metastasis, and potentially drives the progression of cancer. In this study, we focused on the ceRNA networks of esophageal squamous cell carcinoma and discovered a novel biomarker panel for cancer prognosis. METHODS: RNA expression data of esophageal carcinoma were obtained from the TCGA database, and a ceRNA network in esophageal carcinoma was constructed using R packages. RESULTS: Four miRNAs were discovered as the core of the ceRNA model: miR-93, miR-191, miR-99b, and miR-3615. Moreover, we constructed a ceRNA network in esophageal carcinoma, which included 4 miRNAs and 6 lncRNAs. After ceRNA network modeling, we investigated six lncRNAs that could be taken together as a panel for prognosis prediction of esophageal cancer: LINC02575, LINC01087, LINC01816, AL136162.1, AC012073.1, and AC117402.1. Finally, we tested the predictive power of the panel in all TCGA samples. CONCLUSIONS: Our study discovered a new biomarker panel that may have potential value in predicting the prognosis of esophageal carcinoma.


Subject(s)
Esophageal Neoplasms/genetics; Esophageal Squamous Cell Carcinoma/genetics; RNA, Long Noncoding/genetics; Biomarkers, Tumor/genetics; Computational Biology; Databases, Nucleic Acid/statistics & numerical data; Gene Expression Profiling/statistics & numerical data; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Humans; MicroRNAs/genetics; Models, Genetic; Prognosis; RNA, Messenger/genetics; RNA-Seq
8.
Comput Math Methods Med ; 2021: 7764764, 2021.
Article in English | MEDLINE | ID: mdl-34484416

ABSTRACT

As one of the most prevalent posttranscriptional modifications of RNA, N7-methylguanosine (m7G) plays an essential role in the regulation of gene expression. Accurate identification of m7G sites in the transcriptome is invaluable for revealing their potential functional mechanisms. Although high-throughput experimental methods can locate m7G sites precisely, they are expensive and time-consuming. Hence, it is imperative to design an efficient computational method that can accurately identify m7G sites. In this study, we propose a novel method that incorporates a BERT-based multilingual model in bioinformatics to represent the information of RNA sequences. First, we treat RNA sequences as natural sentences and then employ the bidirectional encoder representations from transformers (BERT) model to transform them into fixed-length numerical matrices. Second, a feature selection scheme based on the elastic net method is constructed to eliminate redundant features and retain important features. Finally, the selected feature subset is input into a stacking ensemble classifier to predict m7G sites, and the hyperparameters of the classifier are tuned with the tree-structured Parzen estimator (TPE) approach. By 10-fold cross-validation, the performance of BERT-m7G is measured with an ACC of 95.48% and an MCC of 0.9100. The experimental results indicate that the proposed method significantly outperforms state-of-the-art prediction methods in the identification of m7G modifications.


Subject(s)
Algorithms; Guanosine/analogs & derivatives; RNA Processing, Post-Transcriptional/genetics; Base Sequence; Binding Sites/genetics; Computational Biology; Databases, Nucleic Acid/statistics & numerical data; Deep Learning; Guanosine/genetics; Guanosine/metabolism; Humans; Linear Models
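
The two metrics reported in the abstract above, accuracy (ACC) and the Matthews correlation coefficient (MCC), are both functions of a binary confusion matrix. A minimal sketch of the standard formulas follows; the counts are made up for illustration and are not results from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts (not taken from the paper).
tp, tn, fp, fn = 90, 85, 15, 10
print(f"ACC = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC = {mcc(tp, tn, fp, fn):.4f}")
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when classes may be imbalanced.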
9.
PLoS Comput Biol ; 17(8): e1008904, 2021 08.
Article in English | MEDLINE | ID: mdl-34339413

ABSTRACT

The killer-cell immunoglobulin-like receptor (KIR) complex on chromosome 19 encodes receptors that modulate the activity of natural killer cells, and variation in these genes has been linked to infectious and autoimmune disease, as well as having bearing on pregnancy and transplant outcomes. The medical relevance and high variability of KIR genes makes short-read sequencing an attractive technology for interrogating the region, providing a high-throughput, high-fidelity sequencing method that is cost-effective. However, because this gene complex is characterized by extensive nucleotide polymorphism, structural variation including gene fusions and deletions, and a high level of homology between genes, its interrogation at high resolution has been thwarted by bioinformatic challenges, with most studies limited to examining presence or absence of specific genes. Here, we present the PING (Pushing Immunogenetics to the Next Generation) pipeline, which incorporates empirical data, novel alignment strategies and a custom alignment processing workflow to enable high-throughput KIR sequence analysis from short-read data. PING provides KIR gene copy number classification functionality for all KIR genes through use of a comprehensive alignment reference. The gene copy number determined per individual enables an innovative genotype determination workflow using genotype-matched references. Together, these methods address the challenges imposed by the structural complexity and overall homology of the KIR complex. To determine copy number and genotype determination accuracy, we applied PING to European and African validation cohorts and a synthetic dataset. PING demonstrated exceptional copy number determination performance across all datasets and robust genotype determination performance. 
Finally, an investigation into discordant genotypes for the synthetic dataset provides insight into misaligned reads, advancing our understanding in interpretation of short-read sequencing data in complex genomic regions. PING promises to support a new era of studies of KIR polymorphism, delivering high-resolution KIR genotypes that are highly accurate, enabling high-quality, high-throughput KIR genotyping for disease and population studies.


Subject(s)
Immunogenetics/statistics & numerical data; Receptors, KIR/genetics; Africa, Southern; Alleles; Computational Biology; Computer Simulation; Databases, Nucleic Acid/statistics & numerical data; Europe; Gene Dosage; Genetics, Population/statistics & numerical data; Genotype; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; Polymorphism, Genetic; Receptors, KIR/classification; Sequence Alignment/statistics & numerical data; Software Design
10.
Comput Math Methods Med ; 2021: 1835056, 2021.
Article in English | MEDLINE | ID: mdl-34306171

ABSTRACT

In a general computational context for biomedical data analysis, DNA sequence classification is a crucial challenge. Several machine learning techniques have been used successfully to complete this task in recent years. Identification and classification of viruses are essential to avoid an outbreak like COVID-19; they also help in detecting the effects of viruses and in drug design. Nevertheless, feature selection remains the most challenging aspect of the problem: the most commonly used representations worsen the problem of high dimensionality, and sequences lack explicit features. Recently, deep learning (DL) models have made it possible to extract features from the input automatically. In this work, we employed CNN, CNN-LSTM, and CNN-Bidirectional LSTM architectures using Label and K-mer encoding for DNA sequence classification. The models are evaluated on different classification metrics. In the experimental results, the CNN and CNN-Bidirectional LSTM with K-mer encoding offer high accuracy of 93.16% and 93.13%, respectively, on testing data.


Subject(s)
COVID-19/virology; High-Throughput Nucleotide Sequencing/statistics & numerical data; Neural Networks, Computer; SARS-CoV-2/genetics; Sequence Analysis, DNA/statistics & numerical data; Base Sequence; Computational Biology; DNA, Viral/classification; DNA, Viral/genetics; Databases, Nucleic Acid/statistics & numerical data; Deep Learning; Humans; Pandemics; SARS-CoV-2/classification
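
The two sequence encodings named in the abstract above can be sketched in a few lines. This is an illustrative assumption of how label and k-mer encoding are commonly done, not the authors' code: the integer assigned to each base, the use of 0 for unknown symbols, and the choice of k = 3 are all hypothetical.

```python
def label_encode(seq, alphabet="ACGT"):
    """Map each base to an integer; symbols outside the alphabet map to 0."""
    index = {base: i + 1 for i, base in enumerate(alphabet)}
    return [index.get(base, 0) for base in seq.upper()]


def kmer_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_encode(seq, k=3):
    """Replace each k-mer with an integer id from a vocabulary built on the fly."""
    vocab = {}
    ids = []
    for token in kmer_tokens(seq, k):
        ids.append(vocab.setdefault(token, len(vocab) + 1))
    return ids, vocab


example = "ATGCATG"  # made-up sequence
print(label_encode(example))
print(kmer_tokens(example))
print(kmer_encode(example)[0])
```

Label encoding keeps one integer per base, whereas k-mer encoding turns the sequence into a series of short overlapping "words" whose ids can then be fed to an embedding layer.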
11.
PLoS Comput Biol ; 17(6): e1009118, 2021 06.
Article in English | MEDLINE | ID: mdl-34138847

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technologies measure gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. Because of the low amount of extracted mRNA copies per cell, scRNA-seq data exhibit a large number of dropouts, which hinder downstream analysis of the data. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), to implement block imputation for dropout events in scRNA-seq data. SDImpute automatically identifies the dropout events based on the gene expression levels and the variations of gene expression across similar cells and similar genes, and it implements block imputation for dropouts by utilizing gene expression unaffected by dropouts from similar cells. In the experiments, the results on simulated and real datasets suggest that SDImpute is an effective tool to recover the data and preserve the heterogeneity of gene expression across cells. Compared with state-of-the-art imputation methods, SDImpute improves the accuracy of downstream analysis, including clustering, visualization, and differential expression analysis.


Subject(s)
RNA-Seq/statistics & numerical data; Single-Cell Analysis/statistics & numerical data; Software; Animals; Cluster Analysis; Computational Biology; Computer Simulation; Data Interpretation, Statistical; Data Visualization; Databases, Nucleic Acid/statistics & numerical data; Gene Expression Profiling/statistics & numerical data; Genetic Techniques/statistics & numerical data; Humans; RNA, Messenger/genetics; RNA, Messenger/isolation & purification
12.
PLoS Comput Biol ; 17(6): e1009078, 2021 06.
Article in English | MEDLINE | ID: mdl-34153026

ABSTRACT

It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for SV discovery, particularly for variants above 1 kb and when discovering variation from ONT reads, with runtimes that are comparable (1.05-3.76×) to those of current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in Bioconda (https://anaconda.org/bioconda/lra) and on GitHub (https://github.com/ChaissonLab/LRA).


Subject(s)
Contig Mapping/statistics & numerical data; Sequence Alignment/statistics & numerical data; Software; Cluster Analysis; Computational Biology; Computer Simulation; Databases, Nucleic Acid/statistics & numerical data; Genetic Variation; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Programming, Linear; Sequence Analysis, DNA
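
The contrast between linear and concave gap costs drawn in the abstract above can be made concrete with a small sketch. The functional forms and constants below are assumptions picked for illustration (a logarithmic penalty is one common concave choice); they are not the penalty actually implemented in lra.

```python
import math


def linear_gap_cost(length, per_base=1.0):
    """Cost grows in direct proportion to gap length."""
    return per_base * length


def concave_gap_cost(length, open_cost=4.0, scale=2.0):
    """Cost grows with the logarithm of gap length, so long gaps cost less per base."""
    if length <= 0:
        return 0.0
    return open_cost + scale * math.log(length)


for gap in (1, 10, 100, 1000):
    print(gap, linear_gap_cost(gap), round(concave_gap_cost(gap), 2))
```

Under this concave cost, one 1000-base gap is cheaper than two 500-base gaps, which favors chaining across a single large insertion or deletion instead of breaking the alignment; under the linear cost the two cases cost the same.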
13.
Commun Biol ; 4(1): 660, 2021 06 02.
Article in English | MEDLINE | ID: mdl-34079055

ABSTRACT

The female mammary epithelium undergoes reorganization during development, pregnancy, and menopause, linking higher risk with breast cancer development. To characterize these periods of complex remodeling, here we report integrated 50 K mouse and 24 K human mammary epithelial cell atlases obtained by single-cell RNA sequencing, which covers most lifetime stages. Our results indicate a putative trajectory that originates from embryonic mammary stem cells which differentiates into three epithelial lineages (basal, luminal hormone-sensing, and luminal alveolar), presumably arising from unipotent progenitors in postnatal glands. The lineage-specific genes infer cells of origin of breast cancer using The Cancer Genome Atlas data and single-cell RNA sequencing of human breast cancer, as well as the association of gland reorganization to different breast cancer subtypes. This comprehensive mammary cell gene expression atlas ( https://mouse-mammary-epithelium-integrated.cells.ucsc.edu ) presents insights into the impact of the internal and external stimuli on the mammary epithelium at an advanced resolution.


Subject(s)
Breast Neoplasms/etiology; Breast/cytology; Breast/metabolism; Mammary Glands, Animal/cytology; Mammary Glands, Animal/metabolism; Mammary Neoplasms, Experimental/etiology; Animals; Breast Neoplasms/genetics; Breast Neoplasms/pathology; Carcinogenesis/genetics; Cell Lineage/genetics; Cell Transformation, Neoplastic/genetics; Databases, Nucleic Acid/statistics & numerical data; Epithelial Cells/cytology; Epithelial Cells/metabolism; Female; Gene Expression Regulation, Neoplastic; Humans; Mammary Neoplasms, Experimental/genetics; Mammary Neoplasms, Experimental/pathology; Mice; Mice, Inbred BALB C; Pregnancy; RNA-Seq/statistics & numerical data
14.
PLoS Comput Biol ; 17(6): e1009064, 2021 06.
Article in English | MEDLINE | ID: mdl-34077420

ABSTRACT

Technological advances have enabled us to profile multiple molecular layers at unprecedented single-cell resolution and the available datasets from multiple samples or domains are growing. These datasets, including scRNA-seq data, scATAC-seq data and sc-methylation data, usually have different powers in identifying the unknown cell types through clustering. So, methods that integrate multiple datasets can potentially lead to a better clustering performance. Here we propose coupleCoC+ for the integrative analysis of single-cell genomic data. coupleCoC+ is a transfer learning method based on the information-theoretic co-clustering framework. In coupleCoC+, we utilize the information in one dataset, the source data, to facilitate the analysis of another dataset, the target data. coupleCoC+ uses the linked features in the two datasets for effective knowledge transfer, and it also uses the information of the features in the target data that are unlinked with the source data. In addition, coupleCoC+ matches similar cell types across the source data and the target data. By applying coupleCoC+ to the integrative clustering of mouse cortex scATAC-seq data and scRNA-seq data, mouse and human scRNA-seq data, mouse cortex sc-methylation and scRNA-seq data, and human blood dendritic cells scRNA-seq data from two batches, we demonstrate that coupleCoC+ improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. coupleCoC+ has fast convergence and it is computationally efficient. The software is available at https://github.com/cuhklinlab/coupleCoC_plus.


Subject(s)
Genomics/statistics & numerical data; Machine Learning; Software; Animals; Cerebral Cortex/metabolism; Cluster Analysis; Computational Biology; Databases, Nucleic Acid/statistics & numerical data; Dendritic Cells/metabolism; Humans; Information Theory; Mice; RNA, Small Cytoplasmic/genetics; RNA-Seq; Single-Cell Analysis/statistics & numerical data
15.
Comput Math Methods Med ; 2021: 6680211, 2021.
Article in English | MEDLINE | ID: mdl-33747117

ABSTRACT

Atrial fibrillation (AF) is one of the most common supraventricular arrhythmias worldwide. However, the specific molecular mechanism underlying AF remains unclear. Our study is aimed at identifying pivotal microRNAs (miRNAs) and targeting genes associated with persistent AF (pAF) using bioinformatics analysis. Three gene expression array datasets (GSE31821, GSE41177, and GSE79768) and an miRNA expression array dataset (GSE68475) associated with pAF were downloaded. Differentially expressed genes (DEGs) were identified using the LIMMA package, and differentially expressed miRNAs (DEMs) were screened from GSE68475. Target genes for DEMs were predicted using the miRTarBase database, and intersections between these target genes and DEGs were selected for further analysis, including the generation of protein-protein interaction (PPI) network, miRNA-transcription factor-target regulatory network, and drug-gene network. A total of 264 DEGs and 40 DEMs were identified between the pAF and control groups. Functional and pathway enrichment analyses of up- and downregulated DEGs were performed. The common genes (CGs) were primarily enriched in the phosphoinositide 3-kinase- (PI3K-) protein kinase B (Akt) signaling pathway, negative regulation of cell division, and response to hypoxia. The PPI network, miRNA-transcription factor-target regulatory network, and drug-gene network were constructed using Cytoscape. The present study revealed several novel miRNAs and genes involved in pAF. We speculated that miR-4298, miR-3125, miR-4306, and miR-671-5p could represent significant miRNAs that act on the target gene superoxide dismutase 2 (SOD2) during the development of pAF and may serve as essential biomarkers for pAF diagnosis and treatment. Moreover, MYC might function in pAF pathogenesis through the PI3K-Akt signaling pathway.


Subject(s)
Atrial Fibrillation/genetics; MicroRNAs/genetics; Atrial Fibrillation/metabolism; Computational Biology; Databases, Nucleic Acid/statistics & numerical data; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Genes, myc; Humans; MicroRNAs/metabolism; Myocardium/metabolism; Phosphatidylinositol 3-Kinases/metabolism; Protein Interaction Maps/genetics; Proto-Oncogene Proteins c-akt/metabolism; Signal Transduction/genetics; Superoxide Dismutase/genetics; Transcription Factors/genetics; Transcription Factors/metabolism
16.
Comput Math Methods Med ; 2021: 6691096, 2021.
Article in English | MEDLINE | ID: mdl-33680070

ABSTRACT

Preeclampsia (PE) is a maternal disease that causes maternal and child death. Current treatment and preventive measures are inadequate, and the problem of PE screening has attracted much attention. The purpose of this study is to screen placental mRNA to obtain the best PE biomarkers for identifying patients with PE. We used Limma in the R language to screen out the 48 differentially expressed genes with the largest differences and used correlation-based feature selection algorithms to reduce the dimensionality and avoid attribute redundancy arising from too many mRNA samples participating in the classification. After reducing the mRNA attributes, the mRNA samples were sorted in descending order of information gain. A classifier model was designed to identify whether samples had PE from mRNA in the placenta. To improve the accuracy of classification and avoid overfitting, three classifiers (C4.5, AdaBoost, and multilayer perceptron) were used. We used the majority voting strategy integrated with the differentially expressed genes and the genes filtered by the best subset method as comparison methods to train the classifier. The results show that the classification accuracy rate increased from 79% to 82.2%, and the number of mRNA features decreased from 48 to 13. This study provides clues for the main PE biomarkers of mRNA in the placenta and provides ideas for the treatment and screening of PE.


Subject(s)
Machine Learning; Placenta/metabolism; Pre-Eclampsia/diagnosis; Pre-Eclampsia/genetics; RNA, Messenger/genetics; Algorithms; Biomarkers/metabolism; Computational Biology; Databases, Nucleic Acid/statistics & numerical data; Decision Trees; Diagnosis, Computer-Assisted; Female; Genetic Markers; Genetic Testing; Humans; Neural Networks, Computer; Pregnancy; RNA, Messenger/metabolism; Transcriptome
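
The majority voting strategy mentioned in the abstract above combines the labels predicted by several classifiers into one label per sample. A minimal sketch follows; the prediction lists are invented placeholders standing in for the outputs of the three classifiers, not data from the study.

```python
from collections import Counter


def majority_vote(*prediction_lists):
    """Return, for each sample, the label predicted by most classifiers."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*prediction_lists)]


# Hypothetical per-sample predictions from three classifiers.
c45_pred = ["PE", "control", "PE", "control"]
adaboost_pred = ["PE", "PE", "control", "control"]
mlp_pred = ["control", "PE", "PE", "control"]

print(majority_vote(c45_pred, adaboost_pred, mlp_pred))
```

With an odd number of classifiers and two classes, a tie cannot occur, so every sample receives a well-defined label.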
17.
Comput Math Methods Med ; 2021: 6636350, 2021.
Article in English | MEDLINE | ID: mdl-33488763

ABSTRACT

A promoter is a short DNA sequence near the start codon, responsible for initiating transcription of a specific gene in the genome. The accurate recognition of promoters has great significance for a better understanding of transcriptional regulation. Because of their importance in the process of biological transcriptional regulation, there is an urgent need to develop in silico tools to identify promoters and their types quickly and accurately. A number of prediction methods have been developed in this regard; however, almost all of them are used merely for identifying promoters and their strength or sigma types. Because the TATA box region in TATA promoters influences posttranscriptional processes, in the current study we developed a two-layer predictor called iPTT(2L)-CNN, which uses a convolutional neural network (CNN) to identify TATA and TATA-less promoters. The first layer identifies a given DNA sequence as a promoter or nonpromoter. The second layer identifies whether the recognized promoter is a TATA promoter or not. The 5-fold cross-validation and independent testing results demonstrate that the constructed predictor is promising for identifying promoters and classifying TATA and TATA-less promoters. Furthermore, to make it easier for experimental scientists to get the results they need, a user-friendly web server has been established at http://www.jci-bioinfo.cn/iPPT(2L)-CNN.


Subject(s)
Genome, Plant , Neural Networks, Computer , Promoter Regions, Genetic , Computational Biology , DNA, Plant/genetics , Databases, Nucleic Acid/statistics & numerical data , Sequence Analysis, DNA , Species Specificity , TATA Box , Zea mays/genetics
18.
Nucleic Acids Res ; 49(D1): D82-D85, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33175160

ABSTRACT

The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena), provided by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), has for almost forty years continued in its mission to freely archive and present the world's public sequencing data for the benefit of the entire scientific community and for the acceleration of the global research effort. Here we highlight the major developments to ENA services and content in 2020, focussing in particular on the recently released updated ENA browser, modernisation of our release process and our data coordination collaborations with specific research communities.


Subject(s)
Computational Biology/methods , Databases, Nucleic Acid/trends , Nucleic Acids/genetics , Nucleotides/genetics , Databases, Nucleic Acid/statistics & numerical data , Europe , High-Throughput Nucleotide Sequencing , Humans , Internet , Molecular Sequence Annotation , Nucleic Acids/chemistry , Nucleotides/chemistry , Sequence Analysis, DNA , Sequence Analysis, RNA
19.
Nucleic Acids Res ; 49(D1): D29-D37, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33245775

ABSTRACT

The European Bioinformatics Institute (EMBL-EBI; https://www.ebi.ac.uk/) provides freely available data and bioinformatics services to the scientific community, alongside its research activity and training provision. The 2020 COVID-19 pandemic has brought to the forefront a need for the scientific community to work even more cooperatively to effectively tackle a global health crisis. EMBL-EBI has been able to build on its position to contribute to the fight against COVID-19 in a number of ways. Firstly, EMBL-EBI has used its infrastructure, expertise and network of international collaborations to help build the European COVID-19 Data Platform (https://www.covid19dataportal.org/), which brings together COVID-19 biomolecular data and connects it to researchers, clinicians and public health professionals. By September 2020, the COVID-19 Data Platform had integrated in excess of 170 000 COVID-19 biomolecular data and literature records, collected through a number of EMBL-EBI resources. Secondly, EMBL-EBI has strived to continue its support of the life science communities through the crisis, with updated training provision and improved service provision throughout its resources. The COVID-19 pandemic has highlighted the importance of EMBL-EBI's core principles, including international cooperation, resource sharing and central data brokering, and has further empowered scientific cooperation.


Subject(s)
COVID-19/prevention & control , Computational Biology/statistics & numerical data , Databases, Nucleic Acid/statistics & numerical data , Information Storage and Retrieval/methods , SARS-CoV-2/genetics , Viral Proteins/genetics , COVID-19/epidemiology , COVID-19/virology , Computational Biology/methods , Computational Biology/organization & administration , Databases, Nucleic Acid/organization & administration , Global Health , Humans , Information Storage and Retrieval/statistics & numerical data , Internet , Pandemics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Viral Proteins/metabolism
20.
PLoS Comput Biol ; 16(11): e1008405, 2020 11.
Article in English | MEDLINE | ID: mdl-33166290

ABSTRACT

Given the complexity and diversity of cancer genomics profiles, it is challenging to identify distinct clusters from different cancer types. Numerous analyses have been conducted for this purpose. Still, the methods they used generally do not directly support high-dimensional omics data across the whole genome (such as ATAC-seq profiles). In this study, based on deep adversarial learning, we present an end-to-end approach, ClusterATAC, to leverage high-dimensional features and explore the classification results. On the ATAC-seq dataset and RNA-seq dataset, ClusterATAC has achieved excellent performance. Since ATAC-seq data plays a crucial role in the study of the effects of non-coding regions on the molecular classification of cancers, we explore the clustering solution obtained by ClusterATAC on the pan-cancer ATAC dataset. In this solution, more than 70% of the clusters are single-tumor-type-dominant, and the vast majority of the remaining clusters are associated with similar tumor types. We explore the representative non-coding loci and their linked genes of each cluster and verify some results by literature search. These results suggest that a large number of non-coding loci affect the development and progression of cancer through their linked genes, which can potentially advance cancer diagnosis and therapy.


Subject(s)
Chromatin Immunoprecipitation Sequencing/statistics & numerical data , Deep Learning , Neoplasms/classification , Neoplasms/genetics , Chromatin/genetics , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Genomics/methods , Genomics/statistics & numerical data , Humans , Multigene Family , Normal Distribution , Oncogenes , RNA-Seq/statistics & numerical data