Comparison of the Performance of Artificial Intelligence Versus Medical Professionals in the Polish Final Medical Examination.
Jaworski, Aleksander; Jasinski, Dawid; Jaworski, Wojciech; Hop, Aleksandra; Janek, Artur; Slawinska, Barbara; Konieczniak, Lena; Rzepka, Maciej; Jung, Maximilian; Syslo, Oliwia; Jarzabek, Victoria; Blecha, Zuzanna; Harazinski, Konrad; Jasinska, Natalia.
Affiliation
  • Jaworski A; Department of Medicine, Specialist Medical Centre Joint Stock Company, Polanica-Zdrój, POL.
  • Jasinski D; Department of Medicine, Prof. K. Gibinski University Clinical Center of the Medical University of Silesia in Katowice, Katowice, POL.
  • Jaworski W; Department of Children's Developmental Defects Surgery and Traumatology, Medical University of Silesia in Katowice, Katowice, POL.
  • Hop A; Department of Medicine, Fryderyk Chopin University Clinical Hospital in Rzeszów, Rzeszów, POL.
  • Janek A; Department of Medicine, Prof. K. Gibinski University Clinical Center of the Medical University of Silesia in Katowice, Katowice, POL.
  • Slawinska B; Department of Medicine, Medical University of Silesia in Katowice, Katowice, POL.
  • Konieczniak L; Department of Medicine, Regional Specialised Hospital No. 4 in Bytom, Bytom, POL.
  • Rzepka M; Department of Medicine, St. Barbara Specialised Regional Hospital No. 5, Sosnowiec, POL.
  • Jung M; Department of Medicine, University Clinical Hospital in Opole, Opole, POL.
  • Syslo O; Department of Medicine, Academy of Silesia, Katowice, POL.
  • Jarzabek V; Department of Medicine, Regional Specialised Hospital No. 4 in Bytom, Bytom, POL.
  • Blecha Z; Department of Medicine, Medical University of Silesia in Katowice, Katowice, POL.
  • Harazinski K; Department of Medicine, Medical University of Silesia in Katowice, Katowice, POL.
  • Jasinska N; Department of Cybernetics, Military University of Technology, Warsaw, POL.
Cureus; 16(8): e66011, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39221376
ABSTRACT

BACKGROUND:

Rapidly developing artificial intelligence (AI) technologies such as OpenAI's Generative Pretrained Transformer (GPT) models, particularly ChatGPT, have shown promising applications in various fields, including medicine. This study evaluates ChatGPT's performance on the Polish Final Medical Examination (LEK) and compares it with that of human test-takers.

METHODS:

The study analyzed ChatGPT's ability to answer 196 multiple-choice questions from the spring 2021 LEK. Questions were categorized as "clinical cases" or "other" (general medical knowledge) and then divided by medical field. Two versions of ChatGPT (3.5 and 4.0) were tested. Statistical analyses, including Pearson's χ² test and the Mann-Whitney U test, were conducted to compare the AI's performance and confidence levels.
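The abstract does not say which contingency tables were tested, so the following is only a minimal sketch of how Pearson's χ² statistic works on a 2x2 table (for example, correct versus incorrect answers in two question categories). The counts in the usage note are hypothetical and are not taken from the study; the function name `pearson_chi2_2x2` is likewise an illustrative choice, and no continuity correction is applied.

```python
import math


def pearson_chi2_2x2(table):
    """Pearson's chi-squared test (no continuity correction) for a 2x2 table.

    `table` is [[a, b], [c, d]] of observed counts.
    Returns (chi2 statistic, p-value) with 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (NOT from the study): rows are two question
# categories, columns are correct / incorrect answers.
chi2, p = pearson_chi2_2x2([[30, 10], [20, 20]])
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

A small p-value would suggest that accuracy differs between the two rows; with small expected counts, an exact test would usually be preferred over this approximation.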

RESULTS:

ChatGPT 3.5 correctly answered 50.51% of the questions, while ChatGPT 4.0 answered 77.55% correctly, surpassing the 56% passing threshold. Version 3.5 showed significantly higher confidence in its correct answers than in its incorrect ones, whereas version 4.0 maintained consistent confidence regardless of answer accuracy. No significant differences in performance were observed across medical fields.
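As a quick arithmetic check, the reported percentages can be converted back into whole-question counts out of 196. The sketch below assumes that the 56% threshold applies to all 196 questions and means "at least 56% correct"; the abstract states neither detail, and the helper names are illustrative.

```python
import math

TOTAL_QUESTIONS = 196      # from the abstract
PASS_THRESHOLD = 0.56      # from the abstract; assumed to mean "at least 56%"


def correct_count(percent, total=TOTAL_QUESTIONS):
    """Recover the whole number of correct answers from a reported percentage."""
    return round(percent / 100 * total)


def min_correct_to_pass(total=TOTAL_QUESTIONS, threshold=PASS_THRESHOLD):
    """Smallest whole number of correct answers that reaches the threshold."""
    return math.ceil(threshold * total)


needed = min_correct_to_pass()
for version, percent in (("3.5", 50.51), ("4.0", 77.55)):
    count = correct_count(percent)
    verdict = "pass" if count >= needed else "fail"
    print(f"ChatGPT {version}: {count}/{TOTAL_QUESTIONS} correct, "
          f"needs {needed} -> {verdict}")
```

Under these assumptions, the two percentages correspond to 99 and 152 correct answers respectively, against a minimum of 110.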

CONCLUSIONS:

ChatGPT 4.0 demonstrated the ability to pass the LEK, indicating substantial potential for AI in medical education and assessment. Future improvements in AI models, such as the anticipated ChatGPT 5.0, may further enhance performance, potentially equaling or surpassing that of human test-takers.

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Cureus Year: 2024 Document type: Article Country of publication: United States
