Data normalization for addressing the challenges in the analysis of single-cell transcriptomic datasets.

Cuevas-Diaz Duran, Raquel; Wei, Haichao; Wu, Jiaqian

Affiliation

Cuevas-Diaz Duran R; Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Monterrey, Nuevo Leon, 64710, Mexico. raquel.cuevas.dd@tec.mx.
Wei H; The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
Wu J; Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of Molecular Medicine, Houston, TX, 77030, USA.

BMC Genomics; 25(1): 444, 2024 May 06.

Article in En | MEDLINE | ID: mdl-38711017

ABSTRACT

BACKGROUND:

Normalization is a critical step in the analysis of single-cell RNA-sequencing (scRNA-seq) datasets. Its main goal is to make gene counts comparable within and between cells. To do so, normalization methods must account for both technical and biological variability. Numerous normalization methods have been developed, each addressing different sources of dispersion and making specific assumptions about the count data.

MAIN BODY:

The choice of normalization method has a direct impact on downstream analyses such as differential gene expression and cluster identification. The objective of this review is therefore to guide the reader in making an informed decision about the most appropriate normalization method to use. To this end, we first give an overview of the commonly used single-cell sequencing platforms and methods, including isolation and library preparation protocols. Next, we discuss the inherent sources of variability in scRNA-seq datasets. We describe the categories of normalization methods and include examples of each. We also outline imputation and batch-effect correction methods. Furthermore, we describe data-driven metrics commonly used to evaluate the performance of normalization methods. Finally, we discuss common scRNA-seq methods and toolkits used for integrated data analysis.
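As an illustration of what "making gene counts comparable between cells" can mean in practice, the following is a minimal sketch of library-size scaling followed by a log transform. It is not taken from the review; the function name `library_size_normalize`, the `target_sum` of 10,000, and the toy count matrix are assumptions chosen for the example.

```python
import numpy as np


def library_size_normalize(counts, target_sum=10_000.0):
    """Scale each cell (row) to the same total count, then apply log1p.

    `counts` is assumed to be a cells x genes matrix of raw counts.
    `target_sum` is an illustrative choice, not a value from the review.
    """
    counts = np.asarray(counts, dtype=float)
    size = counts.sum(axis=1, keepdims=True)
    size[size == 0] = 1.0  # leave empty cells as all zeros instead of dividing by zero
    scaled = counts / size * target_sum
    return np.log1p(scaled)


# Toy data: cells 0 and 1 have the same composition but a 10x depth difference.
raw = np.array([[10, 0, 30],
                [1, 0, 3],
                [0, 5, 5]])
norm = library_size_normalize(raw)
```

In this toy matrix the first two cells differ only in sequencing depth, so after scaling their normalized profiles coincide. Real datasets usually violate the assumption that depth is the only technical factor, which is one reason other families of methods exist.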

CONCLUSIONS:

Depending on the correction performed, normalization methods can be broadly classified as within-sample and between-sample algorithms. With respect to the mathematical model used, they can be further classified into global scaling methods, generalized linear models, mixed methods, and machine learning-based methods. Each of these approaches has advantages and drawbacks and makes different statistical assumptions. However, no single normalization method performs best in every case. Instead, metrics such as silhouette width, the K-nearest neighbor batch-effect test, or highly variable genes are recommended to assess the performance of normalization methods.
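Of the metrics named above, silhouette width is the simplest to sketch. The following is a minimal NumPy implementation of the mean silhouette width over Euclidean distances, given a matrix of cell embeddings and one label per cell. The function name `mean_silhouette_width`, the toy coordinates, and the use of cluster labels (rather than batch labels) are assumptions for illustration; the abstract does not specify how the review applies the metric.

```python
import numpy as np


def mean_silhouette_width(X, labels):
    """Mean silhouette width for points X (n x d) with at least two distinct labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        if denom > 0:
            scores[i] = (b - a) / denom
    return scores.mean()


# Toy embedding: two tight, well-separated groups of two cells each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
good = mean_silhouette_width(X, [0, 0, 1, 1])   # labels match the groups
mixed = mean_silhouette_width(X, [0, 1, 0, 1])  # labels cut across the groups
```

Values near 1 indicate compact, well-separated groups and values near or below 0 indicate overlapping ones, so comparing the score across differently normalized versions of the same dataset gives one data-driven way to compare methods.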

Subjects

Single-Cell Analysis; Animals; Humans; Algorithms; Gene Expression Profiling/methods; Gene Expression Profiling/standards; RNA-Seq/methods; RNA-Seq/standards; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Transcriptome; Datasets as Topic

Keywords

Biological variability; Normalization; Single-cell sequencing; Technical variability; scRNA-seq

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Single-Cell Analysis Limits: Animals / Humans Language: En Journal: BMC Genomics Journal subject: GENETICS Year of publication: 2024 Document type: Article Country of affiliation: Mexico Country of publication: United Kingdom
