Improving the performance of machine learning algorithms for health outcomes predictions in multicentric cohorts.

Wichmann, Roberta Moreira; Fernandes, Fernando Timoteo; Chiavegatto Filho, Alexandre Dias Porto

Affiliation

Wichmann RM; School of Public Health, University of São Paulo, São Paulo, SP, Brazil. roberta.wichmann@idp.edu.br.
Fernandes FT; Brazilian Institute of Education, Development and Research-IDP, Economics Graduate Program, Brasilia, DF, Brazil. roberta.wichmann@idp.edu.br.
Chiavegatto Filho ADP; School of Public Health, University of São Paulo, São Paulo, SP, Brazil.

Sci Rep; 13(1): 1022, 2023 Jan 19.

Article in English | MEDLINE | ID: mdl-36658181

ABSTRACT

Machine learning algorithms are being increasingly used in healthcare settings but their generalizability between different regions is still unknown. This study aims to identify the strategy that maximizes the predictive performance of identifying the risk of death by COVID-19 in different regions of a large and unequal country. This is a multicenter cohort study with data collected from patients with a positive RT-PCR test for COVID-19 from March to August 2020 (n = 8477) in 18 hospitals, covering all five Brazilian regions. Of all patients with a positive RT-PCR test during the period, 2356 (28%) died. Eight different strategies were used for training and evaluating the performance of three popular machine learning algorithms (extreme gradient boosting, lightGBM, and catboost). The strategies ranged from only using training data from a single hospital, up to aggregating patients by their geographic regions. The predictive performance of the algorithms was evaluated by the area under the ROC curve (AUROC) on the test set of each hospital. We found that the best overall predictive performances were obtained when using training data from the same hospital, which was the winning strategy for 11 (61%) of the 18 participating hospitals. In this study, the use of more patient data from other regions slightly decreased predictive performance. However, models trained in other hospitals still had acceptable performances and could be a solution while data for a specific hospital is being collected.
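The evaluation described in the abstract (scoring each training strategy by AUROC on a hospital's own test set, then picking the winner) can be illustrated with a minimal sketch. This is not the authors' code: the function names `auroc` and `winning_strategy`, the strategy labels, and every number below are hypothetical and chosen only for illustration. AUROC is computed here through its Mann-Whitney interpretation, using the Python standard library only.

```python
from typing import Dict, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive case (death = 1)
    gets a higher predicted risk than a randomly chosen negative case
    (survival = 0); tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def winning_strategy(
    labels: Sequence[int], scores_by_strategy: Dict[str, Sequence[float]]
) -> str:
    """Return the name of the strategy with the highest AUROC on one test set."""
    aucs = {name: auroc(labels, s) for name, s in scores_by_strategy.items()}
    return max(aucs, key=aucs.get)


# Hypothetical test set for one hospital: 1 = died, 0 = survived.
test_labels = [1, 0, 1, 0, 0, 1]

# Hypothetical predicted risks from models trained under two strategies.
predicted_risk = {
    "same_hospital": [0.9, 0.2, 0.7, 0.4, 0.1, 0.6],
    "all_regions_pooled": [0.8, 0.5, 0.4, 0.6, 0.2, 0.7],
}

for name, scores in predicted_risk.items():
    print(f"{name}: AUROC = {auroc(test_labels, scores):.3f}")
print("winner:", winning_strategy(test_labels, predicted_risk))
```

Repeating this comparison for every hospital and counting how often each strategy wins would give a tally of the kind reported in the abstract (11 of 18 hospitals for same-hospital training). The pairwise loop is quadratic in the number of patients, so a rank-based implementation would be preferable for cohorts of realistic size.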

Subject(s)

COVID-19; Humans; COVID-19/diagnosis; COVID-19/epidemiology; Cohort Studies; Algorithms; Machine Learning; Outcome Assessment, Health Care; Retrospective Studies

Collection: 01-internacional | Database: MEDLINE | Main subject: COVID-19 | Type of study: Diagnostic_studies / Etiology_studies / Incidence_studies / Observational_studies / Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: English | Journal: Sci Rep | Year: 2023 | Document type: Article | Country of affiliation: Brazil | Country of publication: United Kingdom
