Results 1 - 2 of 2
1.
Cureus ; 16(7): e65083, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39171020

ABSTRACT

Objectives: Large language models (LLMs) such as ChatGPT have performed exceptionally well across many fields. Notably, their success in answering postgraduate medical examination questions has been reported previously, suggesting possible utility in surgical education and training. This study evaluated the performance of four LLMs on the American Board of Thoracic Surgery's (ABTS) Self-Education and Self-Assessment in Thoracic Surgery (SESATS) XIII question bank to investigate their potential applications in the education and training of future surgeons.

Methods: The dataset comprised 400 best-of-four questions from the SESATS XIII exam: 220 on adult cardiac surgery, 140 on general thoracic surgery, 20 on congenital cardiac surgery, and 20 on cardiothoracic critical care. GPT-3.5 and GPT-4 (OpenAI, San Francisco, CA) were evaluated alongside Med-PaLM 2 (Google Inc., Mountain View, CA) and Claude 2 (Anthropic Inc., San Francisco, CA), and their performances were compared. Questions requiring visual information, such as clinical images or radiology, were excluded.

Results: GPT-4 demonstrated a significant improvement over GPT-3.5 overall (87.0% vs. 51.8% of questions answered correctly, p < 0.0001). GPT-4 also performed consistently better across all subspecialties, with accuracy rates ranging from 70.0% to 90.0%, compared with 35.0% to 60.0% for GPT-3.5. With the GPT-4 model, ChatGPT performed significantly better on the adult cardiac and general thoracic subspecialties (p < 0.0001).

Conclusions: Large language models such as ChatGPT with the GPT-4 model demonstrate impressive skill in understanding complex cardiothoracic surgical clinical information, achieving an overall accuracy of nearly 90.0% on the SESATS question bank. Our study shows significant improvement between successive GPT iterations. As LLM technology continues to evolve, its potential use in surgical education, training, and continuing medical education is anticipated to enhance patient outcomes and safety.
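To illustrate the overall comparison reported above, the sketch below runs a two-proportion z-test on correct-answer counts reconstructed from the stated rates (348/400 ≈ 87.0% for GPT-4 vs. 207/400 ≈ 51.8% for GPT-3.5). The abstract does not say which statistical test the authors used, so the counts and the test here are assumptions for illustration only.

```python
from math import sqrt, erfc

def two_proportion_z(correct_a: int, correct_b: int, n: int):
    """Two-sided two-proportion z-test (pooled standard error)."""
    p_a, p_b = correct_a / n, correct_b / n
    pooled = (correct_a + correct_b) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p_value

# Hypothetical counts reconstructed from the reported rates; the
# paper's exact statistic may differ.
z, p = two_proportion_z(348, 207, 400)
print(f"z = {z:.2f}, p = {p:.2e}")
```

With these reconstructed counts the p-value falls far below 0.0001, consistent with the significance level the abstract reports.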

2.
Am J Med ; 136(10): 979-984, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37343909

ABSTRACT

Machine learning has emerged as a significant tool to augment medical decision-making. Studies have steadily accrued detailing algorithms and models designed with machine learning to predict and anticipate pathologic states. The cardiac intensive care unit is an area where anticipation can mean the difference between life and death. In this paper, we review key studies on the utility of machine learning algorithms and outline the future of artificial intelligence in the cardiac intensive care unit, particularly with regard to the prediction of successful ventilatory weaning, acute respiratory distress syndrome, arrhythmia, and acute kidney injury.
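A minimal sketch of the kind of supervised model such studies build: a plain logistic regression trained by gradient descent on entirely synthetic "patients", predicting ventilator-weaning success from two hypothetical features (a rapid shallow breathing index and a PaO2/FiO2 ratio). The features, thresholds, and data are invented for illustration and do not come from the reviewed studies.

```python
import random
from math import exp

random.seed(0)

def make_patient():
    # Synthetic features; the decision rule generating labels is invented.
    rsbi = random.gauss(80, 30)        # rapid shallow breathing index
    pf = random.gauss(250, 60)         # PaO2/FiO2 ratio
    success = 1 if (rsbi < 105 and pf > 200) else 0
    return [rsbi / 100, pf / 300], success  # scale features near 1

data = [make_patient() for _ in range(500)]

def predict(w, b, x):
    return 1 / (1 + exp(-(w[0] * x[0] + w[1] * x[1] + b)))

# Logistic regression via batch gradient descent on the log-loss.
w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(3000):
    gw, gb = [0.0, 0.0], 0.0
    for x, y in data:
        err = predict(w, b, x) - y     # gradient of log-loss w.r.t. logit
        gw[0] += err * x[0]
        gw[1] += err * x[1]
        gb += err
    n = len(data)
    w[0] -= lr * gw[0] / n
    w[1] -= lr * gw[1] / n
    b -= lr * gb / n

acc = sum((predict(w, b, x) > 0.5) == (y == 1) for x, y in data) / len(data)
print(f"training accuracy: {acc:.2f}")
```

Real cardiac ICU models are trained on far richer clinical features and validated on held-out cohorts; this toy only shows the basic fit-and-predict loop.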


Subject(s)
Artificial Intelligence, Machine Learning, Humans, Intensive Care Units, Algorithms, Cardiac Arrhythmias