Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 6 de 6
Filtrar
Más filtros











Base de datos
Intervalo de año de publicación
1.
Brief Bioinform ; 25(4)2024 May 23.
Artículo en Inglés | MEDLINE | ID: mdl-38935069

RESUMEN

MOTIVATION: In the past decade, single-cell RNA sequencing (scRNA-seq) has emerged as a pivotal method for transcriptomic profiling in biomedical research. Precise cell-type identification is crucial for subsequent analysis of single-cell data. And the integration and refinement of annotated data are essential for building comprehensive databases. However, prevailing annotation techniques often overlook the hierarchical organization of cell types, resulting in inconsistent annotations. Meanwhile, most existing integration approaches fail to integrate datasets with different annotation depths and none of them can enhance the labels of outdated data with lower annotation resolutions using more intricately annotated datasets or novel biological findings. RESULTS: Here, we introduce scPLAN, a hierarchical computational framework designed for scRNA-seq data analysis. scPLAN excels in annotating unlabeled scRNA-seq data using a reference dataset structured along a hierarchical cell-type tree. It identifies potential novel cell types in a systematic, layer-by-layer manner. Additionally, scPLAN effectively integrates annotated scRNA-seq datasets with varying levels of annotation depth, ensuring consistent refinement of cell-type labels across datasets with lower resolutions. Through extensive annotation and novel cell detection experiments, scPLAN has demonstrated its efficacy. Two case studies have been conducted to showcase how scPLAN integrates datasets with diverse cell-type label resolutions and refine their cell-type labels. AVAILABILITY: https://github.com/michaelGuo1204/scPLAN.


Asunto(s)
Biología Computacional , Perfilación de la Expresión Génica , Análisis de la Célula Individual , Análisis de la Célula Individual/métodos , Perfilación de la Expresión Génica/métodos , Biología Computacional/métodos , Humanos , Programas Informáticos , Transcriptoma , Análisis de Secuencia de ARN/métodos , RNA-Seq/métodos , Anotación de Secuencia Molecular/métodos
2.
Brief Bioinform ; 25(2)2024 Jan 22.
Artículo en Inglés | MEDLINE | ID: mdl-38388681

RESUMEN

MOTIVATION: Cell-type annotation of single-cell RNA-sequencing (scRNA-seq) data is a hallmark of biomedical research and clinical application. Current annotation tools usually assume the simultaneous acquisition of well-annotated data, but without the ability to expand knowledge from new data. Yet, such tools are inconsistent with the continuous emergence of scRNA-seq data, calling for a continuous cell-type annotation model. In addition, by their powerful ability of information integration and model interpretability, transformer-based pre-trained language models have led to breakthroughs in single-cell biology research. Therefore, the systematic combining of continual learning and pre-trained language models for cell-type annotation tasks is inevitable. RESULTS: We herein propose a universal cell-type annotation tool, called CANAL, that continuously fine-tunes a pre-trained language model trained on a large amount of unlabeled scRNA-seq data, as new well-labeled data emerges. CANAL essentially alleviates the dilemma of catastrophic forgetting, both in terms of model inputs and outputs. For model inputs, we introduce an experience replay schema that repeatedly reviews previous vital examples in current training stages. This is achieved through a dynamic example bank with a fixed buffer size. The example bank is class-balanced and proficient in retaining cell-type-specific information, particularly facilitating the consolidation of patterns associated with rare cell types. For model outputs, we utilize representation knowledge distillation to regularize the divergence between previous and current models, resulting in the preservation of knowledge learned from past training stages. Moreover, our universal annotation framework considers the inclusion of new cell types throughout the fine-tuning and testing stages. We can continuously expand the cell-type annotation library by absorbing new cell types from newly arrived, well-annotated training datasets, as well as automatically identify novel cells in unlabeled datasets. Comprehensive experiments with data streams under various biological scenarios demonstrate the versatility and high model interpretability of CANAL. AVAILABILITY: An implementation of CANAL is available from https://github.com/aster-ww/CANAL-torch. CONTACT: dengmh@pku.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Journal Name online.


Asunto(s)
Perfilación de la Expresión Génica , Programas Informáticos , Perfilación de la Expresión Génica/métodos , Análisis de Expresión Génica de una Sola Célula , Análisis de la Célula Individual/métodos , Lenguaje , Análisis de Secuencia de ARN/métodos
3.
Brief Bioinform ; 25(2)2024 Jan 22.
Artículo en Inglés | MEDLINE | ID: mdl-38279647

RESUMEN

MOTIVATION: The rapid development of spatial transcriptome technologies has enabled researchers to acquire single-cell-level spatial data at an affordable price. However, computational analysis tools, such as annotation tools, tailored for these data are still lacking. Recently, many computational frameworks have emerged to integrate single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics datasets. While some frameworks can utilize well-annotated scRNA-seq data to annotate spatial expression patterns, they overlook critical aspects. First, existing tools do not explicitly consider cell type mapping when aligning the two modalities. Second, current frameworks lack the capability to detect novel cells, which remains a key interest for biologists. RESULTS: To address these problems, we propose an annotation method for spatial transcriptome data called SPANN. The main tasks of SPANN are to transfer cell-type labels from well-annotated scRNA-seq data to newly generated single-cell resolution spatial transcriptome data and discover novel cells from spatial data. The major innovations of SPANN come from two aspects: SPANN automatically detects novel cells from unseen cell types while maintaining high annotation accuracy over known cell types. SPANN finds a mapping between spatial transcriptome samples and RNA data prototypes and thus conducts cell-type-level alignment. Comprehensive experiments using datasets from various spatial platforms demonstrate SPANN's capabilities in annotating known cell types and discovering novel cell states within complex tissue contexts. AVAILABILITY: The source code of SPANN can be accessed at https://github.com/ddb-qiwang/SPANN-torch. CONTACT: dengmh@math.pku.edu.cn.


Asunto(s)
Análisis de Expresión Génica de una Sola Célula , Transcriptoma , Análisis de Secuencia de ARN/métodos , Análisis de la Célula Individual/métodos , Perfilación de la Expresión Génica/métodos , Programas Informáticos
4.
Bioinformatics ; 39(1)2023 01 01.
Artículo en Inglés | MEDLINE | ID: mdl-36383167

RESUMEN

MOTIVATION: Single-cell multi-omics sequencing techniques have rapidly developed in the past few years. Clustering analysis with single-cell multi-omics data may give us novel perspectives to dissect cellular heterogeneity. However, multi-omics data have the properties of inherited large dimension, high sparsity and existence of doublets. Moreover, representations of different omics from even the same cell follow diverse distributions. Without proper distribution alignment techniques, clustering methods will encounter less separable clusters easily affected by less informative omics data. RESULTS: We developed MoClust, a novel joint clustering framework that can be applied to several types of single-cell multi-omics data. A selective automatic doublet detection module that can identify and filter out doublets is introduced in the pretraining stage to improve data quality. Omics-specific autoencoders are introduced to characterize the multi-omics data. A contrastive learning way of distribution alignment is adopted to adaptively fuse omics representations into an omics-invariant representation. This novel way of alignment boosts the compactness and separableness of clusters, while accurately weighting the contribution of each omics to the clustering object. Extensive experiments, over both simulated and real multi-omics datasets, demonstrated the powerful alignment, doublet detection and clustering ability features of MoClust. AVAILABILITY AND IMPLEMENTATION: An implementation of MoClust is available from https://doi.org/10.5281/zenodo.7306504. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Exactitud de los Datos , Multiómica , Análisis por Conglomerados
5.
Front Genet ; 13: 977968, 2022.
Artículo en Inglés | MEDLINE | ID: mdl-36072672

RESUMEN

Single-cell multiomics sequencing techniques have rapidly developed in the past few years. Among these techniques, single-cell cellular indexing of transcriptomes and epitopes (CITE-seq) allows simultaneous quantification of gene expression and surface proteins. Clustering CITE-seq data have the great potential of providing us with a more comprehensive and in-depth view of cell states and interactions. However, CITE-seq data inherit the properties of scRNA-seq data, being noisy, large-dimensional, and highly sparse. Moreover, representations of RNA and surface protein are sometimes with low correlation and contribute divergently to the clustering object. To overcome these obstacles and find a combined representation well suited for clustering, we proposed scCTClust for multiomics data, especially CITE-seq data, and clustering analysis. Two omics-specific neural networks are introduced to extract cluster information from omics data. A deep canonical correlation method is adopted to find the maximumly correlated representations of two omics. A novel decentralized clustering method is utilized over the linear combination of latent representations of two omics. The fusion weights which can account for contributions of omics to clustering are adaptively updated during training. Extensive experiments over both simulated and real CITE-seq data sets demonstrated the power of scCTClust. We also applied scCTClust on transcriptome-epigenome data to illustrate its potential for generalizing.

6.
Bioinformatics ; 38(3): 738-745, 2022 01 12.
Artículo en Inglés | MEDLINE | ID: mdl-34623390

RESUMEN

MOTIVATION: Single-cell RNA-seq (scRNA-seq) has been widely used to resolve cellular heterogeneity. After collecting scRNA-seq data, the natural next step is to integrate the accumulated data to achieve a common ontology of cell types and states. Thus, an effective and efficient cell-type identification method is urgently needed. Meanwhile, high-quality reference data remain a necessity for precise annotation. However, such tailored reference data are always lacking in practice. To address this, we aggregated multiple datasets into a meta-dataset on which annotation is conducted. Existing supervised or semi-supervised annotation methods suffer from batch effects caused by different sequencing platforms, the effect of which increases in severity with multiple reference datasets. RESULTS: Herein, a robust deep learning-based single-cell Multiple Reference Annotator (scMRA) is introduced. In scMRA, a knowledge graph is constructed to represent the characteristics of cell types in different datasets, and a graphic convolutional network serves as a discriminator based on this graph. scMRA keeps intra-cell-type closeness and the relative position of cell types across datasets. scMRA is remarkably powerful at transferring knowledge from multiple reference datasets, to the unlabeled target domain, thereby gaining an advantage over other state-of-the-art annotation methods in multi-reference data experiments. Furthermore, scMRA can remove batch effects. To the best of our knowledge, this is the first attempt to use multiple insufficient reference datasets to annotate target data, and it is, comparatively, the best annotation method for multiple scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION: An implementation of scMRA is available from https://github.com/ddb-qiwang/scMRA-torch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Aprendizaje Profundo , Perfilación de la Expresión Génica , Programas Informáticos , Análisis de Secuencia de ARN/métodos , Análisis de Expresión Génica de una Sola Célula , Análisis de la Célula Individual/métodos
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA