Bias Analysis on Public X-Ray Image Datasets of Pneumonia and COVID-19 Patients.

Catala, Omar Del Tejo; Igual, Ismael Salvador; Perez-Benito, Francisco Javier; Escriva, David Millan; Castello, Vicent Ortiz; Llobet, Rafael; Perez-Cortes, Juan-Carlos

Affiliation

Catala ODT; Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, 46022 Valencia, Spain.
Igual IS; Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, 46022 Valencia, Spain.
Perez-Benito FJ; Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, 46022 Valencia, Spain.
Escriva DM; Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, 46022 Valencia, Spain.
Castello VO; Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, 46022 Valencia, Spain.
Llobet R; Instituto Tecnológico de Informática (ITI), Universitat Politècnica de València, 46022 Valencia, Spain.
Perez-Cortes JC; Department of Computer Systems and Computation (DSIC), Universitat Politècnica de València, 46022 Valencia, Spain.

IEEE Access; 9: 42370-42383, 2021.

Article in English | MEDLINE | ID: mdl-34812384

ABSTRACT

Chest X-ray images are useful for early COVID-19 diagnosis, with the advantage that X-ray devices are already available in health centers and images are obtained immediately. Several datasets containing X-ray images of cases (pneumonia or COVID-19) and controls have been made available to develop machine-learning-based methods to aid in diagnosing the disease. However, these datasets are mostly assembled from different sources, combining pre-COVID-19 datasets with COVID-19 datasets. In particular, we have detected a significant bias in some of the released datasets used to train and test diagnostic systems, which may mean that the published results are optimistic and overestimate the actual predictive capacity of the proposed techniques. In this article, we analyze the existing bias in some commonly used datasets and propose a series of preliminary steps to carry out before the classic machine learning pipeline in order to detect possible biases, avoid them where possible, and report results that are more representative of the actual predictive power of the methods under analysis.

Keywords

COVID-19; Deep learning; bias; chest X-ray; convolutional neural networks; saliency map; segmentation

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: IEEE Access Year: 2021 Document type: Article Country of publication: United States
