A high-precision genome size estimator based on the k-mer histogram correction.
Front Genet
; 15: 1451730, 2024.
Article
en En
| MEDLINE
| ID: mdl-39238787
ABSTRACT
Introduction:
In the realm of next-generation sequencing datasets, various characteristics can be extracted through k-mer based analysis. Among these characteristics, genome size (GS) is one that can be estimated with relative ease, yet achieving satisfactory accuracy, especially in the context of heterozygosity, remains a challenge.Methods:
In this study, we introduce a high-precision genome size estimator, GSET (Genome Size Estimation Tool), which is based on k-mer histogram correction.Results:
We have evaluated GSET on both simulated and real datasets. The experimental results demonstrate that this tool can estimate genome size with greater precision, even surpassing the accuracy of state-of-the-art tools. Notably, GSET also performs satisfactorily on heterozygous datasets, where other tools struggle to produce useable results.Discussion:
The processing model of GSET diverges from the popular data fitting models used by similar tools. Instead, it is derived from empirical data and incorporates a correction term to mitigate the impact of sequencing errors on genome size estimation. GSET is freely available for use and can be accessed at the following URL https//github.com/Xingyu-Liao/GSET.
Texto completo:
1
Colección:
01-internacional
Base de datos:
MEDLINE
Idioma:
En
Revista:
Front Genet
Año:
2024
Tipo del documento:
Article
País de afiliación:
China
Pais de publicación:
Suiza