[Efficacy and safety of artificial intelligence-based large language models for decision making support in herniology: evaluation by experts and general surgeons].

Nechay, T V; Sazhin, A V; Loban, K M; Bogomolova, A K; Suglob, V V; Beniia, T R

Original title (transliterated from Russian): Effektivnost' i bezopasnost' bol'shikh yazykovykh modelei na osnove iskusstvennogo intellekta v kachestve instrumenta podderzhki prinyatiya reshenii v gerniologii: otsenka ekspertami i obshchimi khirurgami.

Affiliation

Nechay TV; Pirogov Russian National Research Medical University, Moscow, Russia.
Sazhin AV; Pirogov Russian National Research Medical University, Moscow, Russia.
Loban KM; Pirogov Russian National Research Medical University, Moscow, Russia.
Bogomolova AK; Pirogov Russian National Research Medical University, Moscow, Russia.
Suglob VV; Pirogov Russian National Research Medical University, Moscow, Russia.
Beniia TR; Pirogov Russian National Research Medical University, Moscow, Russia.

Khirurgiia (Mosk); (8): 6-14, 2024.

Article in Russian | MEDLINE | ID: mdl-39140937

ABSTRACT

OBJECTIVE:

To evaluate the quality of recommendations provided by ChatGPT regarding inguinal hernia repair.

MATERIAL AND METHODS:

ChatGPT was asked 5 questions about the surgical management of inguinal hernias. The chatbot was assigned the role of an expert in herniology and instructed to search only specialized medical databases and to provide references and the supporting evidence. Herniology experts and surgeons (non-experts) rated the quality of the recommendations generated by ChatGPT on a 4-point scale (from 0 to 3 points). Statistical correlations were explored between participants' ratings and their stance on artificial intelligence.

RESULTS:

Experts scored the quality of ChatGPT responses lower than non-experts did (2 (1-2) vs. 2 (2-3), p<0.001). The chatbot failed to provide valid references and actual evidence, and it falsified half of the references. Respondents were optimistic about the future of neural networks for clinical decision-making support. Most of them were against restricting the use of neural networks in healthcare.

CONCLUSION:

We would not recommend non-specialized large language models as the sole or primary source of information for clinical decision making, or as a virtual search assistant.

Subject(s)

Artificial Intelligence; Herniorrhaphy; Humans; Herniorrhaphy/methods; Surgeons; Hernia, Inguinal/surgery; Clinical Decision-Making/methods; Decision Support Systems, Clinical

Keywords

ChatGPT; artificial intelligence; clinical decision making support tool; evidence level; guidelines; inguinal hernia; large language model

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Artificial Intelligence / Herniorrhaphy Limit: Humans Language: Ru Journal: Khirurgiia (Mosk) Year: 2024 Document type: Article Country of affiliation: Russia Country of publication: Russia