iDHS-RGME: Identification of DNase I hypersensitive sites by integrating information on nucleotide composition and physicochemical properties.
Biochem Biophys Res Commun; 734: 150618, 2024 Aug 29.
Article | En | MEDLINE | ID: mdl-39222575
ABSTRACT
DNase I hypersensitive sites (DHSs) are key markers of chromatin accessibility and are closely linked to fundamental biological processes, including the regulation of gene expression and disease pathogenesis. Efficient and accurate algorithms for DHS identification are therefore important for understanding genome function and elucidating disease mechanisms. This study presents iDHS-RGME, an algorithm based on Extremely Randomized Trees (Extra-Trees) that combines two feature extraction methods to improve DHS prediction: Reverse Complementary Kmer (RCKmer) and Geary Spatial Autocorrelation (GSA). Together they capture complementary sequence attributes, namely nucleotide composition and physicochemical properties. Borderline-SMOTE is applied to address class imbalance, followed by feature selection with the Maximum Information Coefficient (MIC). In comparative evaluations, the Extra-Trees classifier outperformed the other classifiers tested and was adopted for the final model. Under five-fold cross-validation, iDHS-RGME achieved accuracies of 94.71% and 95.07% on two independent datasets, outperforming previous models in both precision and effectiveness.
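The RCKmer step named in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of reverse-complement k-mer features, in which each k-mer is merged with its reverse complement and the merged counts are normalized to frequencies. The function names, the default k = 2, and the handling of ambiguous bases are assumptions made for illustration, not details taken from the paper.

```python
from itertools import product

# Translation table for complementing A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Represent a k-mer and its reverse complement by the smaller string."""
    return min(kmer, reverse_complement(kmer))


def rckmer_features(seq: str, k: int = 2) -> dict:
    """Normalized frequencies of canonical k-mers in one sequence.

    Assumption: windows containing characters other than A, C, G, T
    are skipped rather than imputed.
    """
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    counts = dict.fromkeys(keys, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
            total += 1
    return {key: (counts[key] / total if total else 0.0) for key in keys}


# Toy sequence, for illustration only.
features = rckmer_features("ACGTTGCA", k=2)
```

With k = 2 this yields 10 features per sequence instead of 16, because each dinucleotide shares a slot with its reverse complement. In the pipeline the abstract describes, such a vector would be combined with the GSA features before resampling, feature selection, and classification.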
Database: MEDLINE
Language: En
Journal: Biochem Biophys Res Commun
Year: 2024
Document type: Article
Country of affiliation: China
Country of publication: United States