Your browser doesn't support javascript.
loading
Exploring the Role of ChatGPT-4, BingAI, and Gemini as Virtual Consultants to Educate Families about Retinopathy of Prematurity.
Durmaz Engin, Ceren; Karatas, Ezgi; Ozturk, Taylan.
Afiliación
  • Durmaz Engin C; Department of Ophthalmology, Izmir Democracy University, Buca Seyfi Demirsoy Education and Research Hospital, Izmir 35390, Turkey.
  • Karatas E; Department of Biomedical Technologies, Faculty of Engineering, Dokuz Eylul University, Izmir 35390, Turkey.
  • Ozturk T; Department of Ophthalmology, Agri Ibrahim Cecen University, Agri 04200, Turkey.
Children (Basel) ; 11(6)2024 Jun 20.
Article en En | MEDLINE | ID: mdl-38929329
ABSTRACT

BACKGROUND:

Large language models (LLMs) are becoming increasingly important as they are being used more frequently for providing medical information. Our aim is to evaluate the effectiveness of electronic artificial intelligence (AI) large language models (LLMs), such as ChatGPT-4, BingAI, and Gemini in responding to patient inquiries about retinopathy of prematurity (ROP).

METHODS:

The answers of LLMs for fifty real-life patient inquiries were assessed using a 5-point Likert scale by three ophthalmologists. The models' responses were also evaluated for reliability with the DISCERN instrument and the EQIP framework, and for readability using the Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), and Coleman-Liau Index.

RESULTS:

ChatGPT-4 outperformed BingAI and Gemini, scoring the highest with 5 points in 90% (45 out of 50) and achieving ratings of "agreed" or "strongly agreed" in 98% (49 out of 50) of responses. It led in accuracy and reliability with DISCERN and EQIP scores of 63 and 72.2, respectively. BingAI followed with scores of 53 and 61.1, while Gemini was noted for the best readability (FRE score of 39.1) but lower reliability scores. Statistically significant performance differences were observed particularly in the screening, diagnosis, and treatment categories.

CONCLUSION:

ChatGPT-4 excelled in providing detailed and reliable responses to ROP-related queries, although its texts were more complex. All models delivered generally accurate information as per DISCERN and EQIP assessments.
Palabras clave

Texto completo: 1 Colección: 01-internacional Base de datos: MEDLINE Idioma: En Revista: Children (Basel) Año: 2024 Tipo del documento: Article País de afiliación: Turquía Pais de publicación: Suiza

Texto completo: 1 Colección: 01-internacional Base de datos: MEDLINE Idioma: En Revista: Children (Basel) Año: 2024 Tipo del documento: Article País de afiliación: Turquía Pais de publicación: Suiza