Can I trust my fake data - A comprehensive quality assessment framework for synthetic tabular data in healthcare.
Vallevik, Vibeke Binz; Babic, Aleksandar; Marshall, Serena E; Elvatun, Severin; Brøgger, Helga M B; Alagaratnam, Sharmini; Edwin, Bjørn; Veeraragavan, Narasimha R; Befring, Anne Kjersti; Nygård, Jan F.
Affiliation
  • Vallevik VB; University of Oslo, Boks 1072 Blindern, NO-0316 Oslo, Norway; DNV AS, Veritasveien 1, 1322 Høvik, Norway. Electronic address: vibeke.binz.vallevik@dnv.com.
  • Babic A; DNV AS, Veritasveien 1, 1322 Høvik, Norway.
  • Marshall SE; DNV AS, Veritasveien 1, 1322 Høvik, Norway.
  • Elvatun S; Cancer Registry of Norway, Ullernchausseen 64, 0379 Oslo, Norway.
  • Brøgger HMB; DNV AS, Veritasveien 1, 1322 Høvik, Norway; Oslo University Hospital, Sognsvannsveien 20, 0372 Oslo, Norway.
  • Alagaratnam S; DNV AS, Veritasveien 1, 1322 Høvik, Norway.
  • Edwin B; University of Oslo, Boks 1072 Blindern, NO-0316 Oslo, Norway; The Intervention Centre and Department of HPB Surgery, Oslo University Hospital and Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway.
  • Veeraragavan NR; Cancer Registry of Norway, Ullernchausseen 64, 0379 Oslo, Norway.
  • Befring AK; University of Oslo, Boks 1072 Blindern, NO-0316 Oslo, Norway.
  • Nygård JF; Cancer Registry of Norway, Ullernchausseen 64, 0379 Oslo, Norway; UiT - The Arctic University of Norway, Tromsø, Norway.
Int J Med Inform; 185: 105413, 2024 May.
Article in English | MEDLINE | ID: mdl-38493547
ABSTRACT

BACKGROUND:

Ensuring safe adoption of AI tools in healthcare hinges on access to sufficient data for training, testing and validation. Synthetic data has been suggested in response to privacy concerns and regulatory requirements and can be created by training a generator on real data to produce a dataset with similar statistical properties. Competing metrics with differing taxonomies for quality evaluation have been proposed, resulting in a complex landscape. Optimising quality entails balancing considerations that make the data fit for use, yet relevant dimensions are left out of existing frameworks.

METHOD:

We performed a comprehensive literature review of quality evaluation metrics for synthetic tabular healthcare data generated with deep generative methods. Based on this review and the team's collective experience, we developed a conceptual framework for quality assurance. Its applicability was benchmarked against a practical case from the Dutch National Cancer Registry.

CONCLUSION:

We present a conceptual framework for quality assurance of synthetic data for AI applications in healthcare that aligns diverging taxonomies, expands on common quality dimensions to include the dimensions of Fairness and Carbon footprint, and proposes stages necessary to support real-life applications. Building trust in synthetic data by increasing transparency and reducing the safety risk will accelerate the development and uptake of trustworthy AI tools for the benefit of patients.

DISCUSSION:

Despite the growing emphasis on algorithmic fairness and carbon footprint, these metrics were scarce in the literature review. The overwhelming focus was on statistical similarity using distance metrics, while sequential logic detection was scarce. A consensus-backed framework that includes all relevant quality dimensions can provide assurance for safe and responsible real-life applications of synthetic data. As the choice of appropriate metrics is highly context-dependent, further research is needed on validation studies to guide metric choices and support the development of technical standards.
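
As an illustration of the kind of distance metric used to assess statistical similarity, the following minimal Python sketch computes the total variation distance between the category frequencies of one real column and one synthetic column. The choice of metric, the function name `total_variation_distance`, and the toy tumour-stage values are assumptions made for this example; they are not taken from the paper or from its framework.

```python
from collections import Counter


def total_variation_distance(real_values, synthetic_values):
    """Total variation distance between two empirical categorical distributions.

    Returns 0.0 when the category frequencies match exactly and 1.0 when the
    two columns share no categories at all.
    """
    if not real_values or not synthetic_values:
        raise ValueError("both columns must contain at least one value")

    real_counts = Counter(real_values)
    synthetic_counts = Counter(synthetic_values)
    n_real = len(real_values)
    n_synthetic = len(synthetic_values)

    # Compare relative frequencies over every category seen in either column.
    categories = set(real_counts) | set(synthetic_counts)
    return 0.5 * sum(
        abs(real_counts[c] / n_real - synthetic_counts[c] / n_synthetic)
        for c in categories
    )


# Hypothetical toy columns (illustrative values only, not real registry data).
real_stage = ["I", "I", "II", "III"]
synthetic_stage = ["I", "II", "II", "III"]

tvd = total_variation_distance(real_stage, synthetic_stage)
print(f"Total variation distance: {tvd:.2f}")
```

A single per-column distance like this captures only marginal similarity. It says nothing about relationships between columns, sequential logic, fairness, or privacy, which is the gap the abstract points to.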

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Delivery of Health Care / Trust Limits: Humans Language: English Journal: Int J Med Inform Journal subject: MEDICAL INFORMATICS Year: 2024 Document type: Article Country of publication: Ireland