Search | BVS Regional Portal

Ten quick tips for ensuring machine learning model validity.

Goh, Wilson Wen Bin; Kabir, Mohammad Neamul; Yoo, Sehwan; Wong, Limsoon.

PLoS Comput Biol ; 20(9): e1012402, 2024 Sep.

Article in English | MEDLINE | ID: mdl-39298376

ABSTRACT

Artificial Intelligence (AI) and Machine Learning (ML) models are increasingly deployed on biomedical and health data to shed light on biological mechanisms, predict disease outcomes, and support clinical decision-making. However, ensuring model validity is challenging. The 10 quick tips described here cover useful practices for checking AI/ML models from two perspectives: the user's and the developer's.

Subject(s)

Computational Biology, Machine Learning, Humans, Computational Biology/methods, Artificial Intelligence, Reproducibility of Results, Algorithms

EnsembleFam: towards more accurate protein family prediction in the twilight zone.

Kabir, Mohammad Neamul; Wong, Limsoon.

BMC Bioinformatics ; 23(1): 90, 2022 Mar 14.

Article in English | MEDLINE | ID: mdl-35287576

ABSTRACT

BACKGROUND: Current protein family modeling methods, such as profile Hidden Markov Models (pHMMs), k-mer-based methods, and deep learning-based methods, do not predict protein function accurately for proteins in the twilight zone, because these proteins have low sequence similarity to reference proteins with known functions. RESULTS: We present EnsembleFam, a novel method that aims at better function prediction for proteins in the twilight zone. EnsembleFam extracts the core characteristics of a protein family using similarity and dissimilarity features calculated from sequence homology relations. It trains three separate Support Vector Machine (SVM) classifiers for each family using these features and combines them into an ensemble prediction to classify novel proteins into these families. Extensive experiments are conducted using the Clusters of Orthologous Groups (COG) dataset and the G Protein-Coupled Receptor (GPCR) dataset. EnsembleFam not only outperforms state-of-the-art methods on the overall dataset but also provides much more accurate predictions for twilight zone proteins. CONCLUSIONS: EnsembleFam, a machine learning method for modeling protein families, can be used to better identify members with very low sequence homology. Using EnsembleFam, protein functions can be predicted from sequence information alone with better accuracy than state-of-the-art methods.

Subject(s)

Proteins, Support Vector Machine, Humans, Proteins/metabolism
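
The per-family ensemble described in the EnsembleFam abstract can be illustrated with a minimal sketch. The abstract does not say how the three classifiers are combined, so majority voting is assumed here. The names `margin_classifier` and `predict_family`, the family labels, and the feature values are all hypothetical, and simple threshold rules stand in for trained SVMs so the example runs with the standard library only.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical feature pair for one protein against one family:
# (similarity score, dissimilarity score).
Features = Tuple[float, float]
Classifier = Callable[[Features], bool]


def margin_classifier(margin: float) -> Classifier:
    """Stand-in for a trained SVM (assumption): accept the protein when
    similarity exceeds dissimilarity by at least `margin`."""
    return lambda f: (f[0] - f[1]) >= margin


def predict_family(
    models: Dict[str, List[Classifier]],
    features: Dict[str, Features],
) -> Optional[str]:
    """Return the family whose classifiers cast the most positive votes,
    provided a strict majority of them agree; otherwise return None.
    On equal vote counts the family listed first wins."""
    best_family: Optional[str] = None
    best_votes = 0
    for family, classifiers in models.items():
        votes = sum(1 for clf in classifiers if clf(features[family]))
        has_majority = votes > len(classifiers) / 2
        if has_majority and votes > best_votes:
            best_family, best_votes = family, votes
    return best_family


# Three classifiers per family, mirroring the setup in the abstract.
models = {
    "family_A": [margin_classifier(m) for m in (0.1, 0.2, 0.4)],
    "family_B": [margin_classifier(m) for m in (0.1, 0.2, 0.4)],
}

# One query protein with hypothetical per-family features.
query = {"family_A": (0.5, 0.2), "family_B": (0.2, 0.6)}
prediction = predict_family(models, query)

# A protein that no family accepts yields None.
unassigned = predict_family(
    models, {"family_A": (0.1, 0.1), "family_B": (0.1, 0.1)}
)
```

Returning `None` when no family reaches a majority keeps low-confidence proteins unassigned instead of forcing a label; a real implementation would replace the threshold rules with SVMs trained on the homology-derived features.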
