ABSTRACT
BACKGROUND: Real-time quantitative PCR (RT-qPCR) is one of the most widely used gene expression methods for validating RNA-seq data. This technique requires reference genes that are stable and highly expressed, at least across the biological conditions represented in the transcriptome. The selection of reference and variable candidate genes is often neglected, which can lead to misinterpretation of the results. RESULTS: We developed software named "Gene Selector for Validation" (GSV), which identifies the best reference and variable candidate genes for validation within a quantitative transcriptome. The tool also filters candidate genes according to the detection limit of the RT-qPCR assay. GSV was compared with other software on synthetic datasets and performed better: it removed stable, low-expression genes from the reference candidate list and generated the list of variable-expression genes for validation. GSV was then applied to a real case, an Aedes aegypti transcriptome. The top GSV reference candidate genes were selected for RT-qPCR analysis, which confirmed that eIF1A and eIF3j were the most stable of the genes tested. The tool also confirmed that traditional mosquito reference genes were less stable in the analyzed samples, highlighting the risk of inappropriate choices. A meta-transcriptome dataset with more than ninety thousand genes was also processed successfully. CONCLUSION: GSV is a time- and cost-effective tool for selecting reference and validation candidate genes from the biological conditions present in transcriptomic data.
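The kind of selection described in this abstract can be illustrated with a minimal sketch. It does not reproduce GSV's actual algorithm or thresholds, which the abstract does not state. It assumes an expression table with one value per sample (e.g. TPM), uses a hypothetical minimum-expression cutoff as a stand-in for the RT-qPCR detection limit, and uses the coefficient of variation (CV) as the stability measure. All names, thresholds, and data below are illustrative assumptions.

```python
import statistics

# Hypothetical thresholds chosen for illustration; not GSV's parameters.
MIN_EXPRESSION = 5.0  # assumed stand-in for the RT-qPCR detection limit
MAX_REF_CV = 0.2      # assumed upper CV bound for reference candidates
MIN_VAR_CV = 0.5      # assumed lower CV bound for variable candidates


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


def select_candidates(expression):
    """Split genes into reference and variable candidate lists.

    expression maps a gene name to its expression values, one per sample.
    Genes that fall below MIN_EXPRESSION in any sample are dropped first,
    so stable but weakly expressed genes never become reference candidates.
    """
    reference, variable = [], []
    for gene, values in expression.items():
        if min(values) < MIN_EXPRESSION:
            continue  # too low to quantify reliably by RT-qPCR
        cv = coefficient_of_variation(values)
        if cv <= MAX_REF_CV:
            reference.append((cv, gene))
        elif cv >= MIN_VAR_CV:
            variable.append((cv, gene))
    reference.sort()             # most stable first
    variable.sort(reverse=True)  # most variable first
    return [g for _, g in reference], [g for _, g in variable]


# Toy data (invented for illustration): four samples per gene.
example = {
    "geneA": [100, 105, 98, 102],   # stable and highly expressed
    "geneB": [1.0, 1.0, 1.0, 1.0],  # stable but below the expression cutoff
    "geneC": [10, 80, 15, 120],     # strongly variable
    "geneD": [50, 70, 40, 80],      # intermediate variability
}
reference_genes, variable_genes = select_candidates(example)
print(reference_genes, variable_genes)  # ['geneA'] ['geneC']
```

Note that geneB, although perfectly stable, is excluded by the expression cutoff, which mirrors the abstract's point about removing stable low-expression genes from the reference list; geneD falls between the two CV bounds and lands in neither list.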
Subject(s)
Real-Time Polymerase Chain Reaction, Reference Standards, Software, Real-Time Polymerase Chain Reaction/methods, Real-Time Polymerase Chain Reaction/standards, Animals, RNA-Seq/methods, RNA-Seq/standards, Gene Expression Profiling/methods, Transcriptome
ABSTRACT
BACKGROUND: Normalization is a critical step in the analysis of single-cell RNA-sequencing (scRNA-seq) datasets. Its main goal is to make gene counts comparable within and between cells. To do so, normalization methods must account for technical and biological variability. Numerous normalization methods have been developed that address different sources of dispersion and make specific assumptions about the count data. MAIN BODY: The choice of normalization method directly affects downstream analyses such as differential gene expression and cluster identification. The objective of this review is therefore to help the reader make an informed decision about the most appropriate normalization method. To this end, we first give an overview of the single-cell sequencing platforms and methods in common use, including cell isolation and library preparation protocols. Next, we discuss the inherent sources of variability in scRNA-seq datasets. We describe the categories of normalization methods and give examples of each. We also outline imputation and batch-effect correction methods. Furthermore, we describe data-driven metrics commonly used to evaluate the performance of normalization methods. We also discuss common scRNA-seq methods and toolkits used for integrated data analysis. CONCLUSIONS: According to the correction performed, normalization methods can be broadly classified as within-sample and between-sample algorithms. With respect to the mathematical model used, they can be further classified into global scaling methods, generalized linear models, mixed methods, and machine learning-based methods. Each of these methods has pros and cons and makes different statistical assumptions. No single normalization method performs best in all cases; instead, metrics such as silhouette width, the k-nearest neighbor batch-effect test (kBET), or highly variable genes are recommended to assess the performance of normalization methods.
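To make the global scaling category concrete, the sketch below applies a simple library-size normalization: each cell's counts are divided by that cell's total count, multiplied by a scale factor, and log-transformed with log(1 + x). The scale factor of 10,000, the function name, and the toy counts are assumptions for illustration; this is one common instance of global scaling, not a method proposed by the review.

```python
import math


def global_scale_normalize(counts, scale_factor=10_000):
    """Library-size (global scaling) normalization followed by log1p.

    counts is a list of cells, each a list of raw gene counts in the same
    gene order. Every count is divided by its cell's total (library size),
    multiplied by scale_factor, and transformed with log(1 + x).
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            raise ValueError("a cell with zero total counts cannot be scaled")
        normalized.append([math.log1p(c / total * scale_factor) for c in cell])
    return normalized


# Toy counts (invented): cell 2 has the same composition as cell 1
# but was sequenced at twice the depth.
cells = [
    [10, 0, 30],
    [20, 0, 60],
]
norm = global_scale_normalize(cells)
print(norm[0] == norm[1])  # True
```

Because both toy cells have the same composition, they give identical normalized values despite the twofold difference in sequencing depth. A single per-cell factor like this corrects depth differences but implicitly assumes that total RNA content is comparable across cells, which is one of the statistical assumptions that distinguishes global scaling from model-based approaches.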