Search | BVS Regional Portal

DTDHM: detection of tandem duplications based on hybrid methods using next-generation sequencing data.

Yuan, Tianting; Dong, Jinxin; Jia, Baoxian; Jiang, Hua; Zhao, Zuyao; Zhou, Mengjiao.

PeerJ ; 12: e17748, 2024.

Article in English | MEDLINE | ID: mdl-39076774

ABSTRACT

Background: Tandem duplication (TD) is a common and important type of structural variation in the human genome. TDs have been shown to play an essential role in many diseases, including cancer. However, TDs are difficult to detect accurately because of the uneven distribution of reads and the inherent complexity of next-generation sequencing (NGS) data.

Methods: This article proposes a method called DTDHM (detection of tandem duplications based on hybrid methods), which uses NGS data to detect TDs in a single sample. DTDHM builds a pipeline that integrates read depth (RD), split read (SR), and paired-end mapping (PEM) signals. To address the imbalance between normal and abnormal samples, DTDHM uses the K-nearest neighbor (KNN) algorithm for multi-feature classification and prediction. The qualified split reads and discordant reads are then extracted and analyzed to localize the variation sites accurately. The article compares DTDHM with three other methods on 450 simulated datasets and five real datasets.

Results: Across the 450 simulated data samples, DTDHM consistently maintained the highest F1-score. The average F1-scores of DTDHM, SVIM, TARDIS, and TIDDIT were 80.0%, 56.2%, 43.4%, and 67.1%, respectively. The F1-score of DTDHM varied within a narrow range, so its detection performance was the most stable, and it was about 1.2 times that of the second-best method. Most of the boundary biases of DTDHM fluctuated around 20 bp, and its boundary accuracy was better than that of TARDIS and TIDDIT. In the real-data experiments, five real sequencing samples (NA19238, NA19239, NA19240, HG00266, and NA12891) were used to test DTDHM. The results showed that DTDHM had the highest overlap density score (ODS) and F1-score of the four methods.

Conclusions: Compared with the other three methods, DTDHM achieved excellent results in terms of sensitivity, precision, F1-score, and boundary bias. These results indicate that DTDHM can be used as a reliable tool for detecting TDs from NGS data, especially for samples with low coverage depth and low tumor purity.
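
The KNN classification step mentioned in the abstract can be illustrated with a minimal sketch. This is not the DTDHM implementation: the two per-bin features (normalized read depth and split-read count), the training values, the labels, and the function name `knn_predict` are all hypothetical and chosen only to show how a majority vote over nearest neighbors assigns a label.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every labelled training point,
    # sorted so that the closest neighbors come first.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical per-bin features: (normalized read depth, split-read count).
train_X = [(1.0, 0), (1.1, 1), (0.9, 0), (1.6, 5), (1.5, 4), (1.7, 6)]
train_y = ["normal", "normal", "normal", "TD", "TD", "TD"]

prediction = knn_predict(train_X, train_y, (1.55, 5), k=3)
print(prediction)
```

In practice the features would usually be rescaled to comparable ranges before computing distances, since an unscaled count can otherwise dominate the vote.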

Subject(s)

Algorithms, High-Throughput Nucleotide Sequencing, High-Throughput Nucleotide Sequencing/methods, Humans, Genome, Human/genetics, Tandem Repeat Sequences/genetics

CNV-PCC: An efficient method for detecting copy number variations from next-generation sequencing data.

Zhang, Tong; Dong, Jinxin; Jiang, Hua; Zhao, Zuyao; Zhou, Mengjiao; Yuan, Tianting.

Front Bioeng Biotechnol ; 10: 1000638, 2022.

Article in English | MEDLINE | ID: mdl-36532569

ABSTRACT

Copy number variations (CNVs) significantly influence the diversity of the human genome and the occurrence of many complex diseases. Next-generation sequencing (NGS) technology provides rich data for detecting CNVs, and the read depth (RD)-based approach is widely used. However, low copy number (CN of 3-4) duplication events are challenging to identify with existing methods, especially when the CNVs are small. In addition, the RD-based approach can only obtain rough breakpoints. We propose a new method, CNV-PCC (detection of CNVs based on Principal Component Classifier), to identify CNVs in whole genome sequencing data. CNV-PCC first uses the split read signal to search for potential breakpoints. A two-stage segmentation strategy is then applied to improve the identification of low CN duplications and small CNVs. Next, an outlier score is calculated for each segment by PCC (Principal Component Classifier). Finally, Otsu's algorithm computes the threshold used to determine the CNV regions. Analysis of the simulated data indicates that CNV-PCC outperforms the other methods in sensitivity and F1-score and improves breakpoint accuracy. Furthermore, CNV-PCC shows high consistency with other methods on real sequencing samples. This study demonstrates that CNV-PCC is an effective method for detecting CNVs, even for low CN duplications and small CNVs.
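
The final thresholding step can be sketched as follows. This is a generic histogram-based version of Otsu's method applied to one-dimensional scores, not the CNV-PCC code: the function name `otsu_threshold`, the `bins` parameter, and the example outlier scores are assumptions made for illustration.

```python
def otsu_threshold(scores, bins=64):
    """Return the cut that maximizes between-class variance of the scores."""
    lo, hi = min(scores), max(scores)
    if lo == hi:
        return lo  # all scores identical: nothing to separate
    width = (hi - lo) / bins
    hist = [0] * bins
    for s in scores:
        # Clamp so that the maximum score falls into the last bin.
        hist[min(int((s - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(scores)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]                  # number of scores at or below bin i
        sum0 += centers[i] * hist[i]
        w1 = total - w0                # number of scores above bin i
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_var, best_t = between, lo + (i + 1) * width
    return best_t


# Hypothetical per-segment outlier scores: six typical segments, three outliers.
scores = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.90, 0.95, 0.92]
threshold = otsu_threshold(scores)
flagged = [s for s in scores if s > threshold]
print(flagged)
```

Segments whose score exceeds the returned threshold would be treated as candidate CNV regions; the advantage of this kind of thresholding is that the cut adapts to each sample's score distribution instead of relying on a fixed value.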
