A Comparative Analysis of ChatGPT and Medical Faculty Graduates in Medical Specialization Exams: Uncovering the Potential of Artificial Intelligence in Medical Education.
Gencer, Gülcan; Gencer, Kerem.
Affiliation
  • Gencer G; Department of Biostatistics and Medical Informatics, Faculty of Medicine, Afyonkarahisar Health Sciences University, Afyonkarahisar, TUR.
  • Gencer K; Department of Computer Engineering, Faculty of Engineering, Afyon Kocatepe University, Afyonkarahisar, TUR.
Cureus; 16(8): e66517, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39246999
ABSTRACT
Background: This study aims to evaluate the performance of ChatGPT on the medical specialization exam (MSE) that medical graduates take when choosing their postgraduate specialization, and to show how artificial intelligence-supported education can improve the quality of medical education and academic success. The research explores the potential applications and advantages of artificial intelligence in medical education and examines how this technology can contribute to student learning and exam preparation.

Methodology: A total of 240 MSE questions were posed to ChatGPT, 120 of which were basic medical sciences questions and 120 clinical medical sciences questions. A total of 18,481 people participated in the exam. The performance of medical school graduates was compared with that of ChatGPT-3.5 in answering these questions correctly. The average score for ChatGPT-3.5 was calculated by averaging its minimum and maximum scores. Calculations were performed in the R 4.0.2 environment.

Results: Graduates' scores ranged from a minimum of 7.51 to a maximum of 81.46 in basic sciences and from a minimum of 12.51 to a maximum of 80.78 in clinical sciences. ChatGPT's scores ranged from a minimum of 60.00 to a maximum of 72.00 in basic sciences and from a minimum of 66.25 to a maximum of 77.00 in clinical sciences. In basic medical sciences, the rate of correct answers was 43.03% for graduates and 60.00% for ChatGPT; in clinical medical sciences, it was 53.29% for graduates and 64.16% for ChatGPT. ChatGPT performed best in Obstetrics and Gynecology (91.66% correct) and Medical Microbiology (86.36% correct). Its weakest area was Anatomy, a subfield of basic medical sciences, with a 28.00% correct answer rate. Graduates outperformed ChatGPT in the Anatomy and Physiology subfields. Significant differences were found in all comparisons between ChatGPT and graduates.

Conclusions: This study shows that artificial intelligence models such as ChatGPT, which scored higher than medical school graduates, can offer significant advantages to learners. Recommended applications include interactive support, tutoring, learning material production, personalized learning plans, self-assessment, motivation boosting, and 24/7 access. Artificial intelligence-supported education can therefore play an important role in improving the quality of medical education and increasing student success.
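The abstract reports correct-answer rates for graduates and ChatGPT and states that all comparisons were significant, but it does not specify which statistical test was used. The following is a minimal R sketch of one such comparison using prop.test(); treating the graduates' mean correct-answer rate as a proportion over the same 120 basic-science questions, and the choice of a two-proportion test, are illustrative assumptions rather than the authors' actual analysis.

# Minimal sketch (assumed analysis, not the authors' script): comparing
# graduates' and ChatGPT's correct-answer rates on the 120 basic medical
# sciences questions in R 4.0.2.

n_questions    <- 120       # basic medical sciences questions
rate_graduates <- 0.4303    # 43.03% correct (from the abstract)
rate_chatgpt   <- 0.6000    # 60.00% correct (from the abstract)

# Approximate counts of correctly answered questions for each group;
# for graduates this treats the mean rate as if it arose from the same
# 120 questions, which is a simplification for illustration only.
correct <- c(round(rate_graduates * n_questions),
             round(rate_chatgpt  * n_questions))

# prop.test() runs a chi-squared test of equal proportions; the abstract
# does not state which test the authors actually applied.
prop.test(x = correct, n = c(n_questions, n_questions))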
Keywords

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: Cureus Year: 2024 Document type: Article Country of publication: United States