[Efficacy and safety of artificial intelligence-based large language models for decision making support in herniology: evaluation by experts and general surgeons]. / Effektivnost' i bezopasnost' bol'shikh yazykovykh modelei na osnove iskusstvennogo intellekta v kachestve instrumenta podderzhki prinyatiya reshenii v gerniologii: otsenka ekspertami i obshchimi khirurgami.
Nechay, T V; Sazhin, A V; Loban, K M; Bogomolova, A K; Suglob, V V; Beniia, T R.
Affiliation
  • Nechay TV; Pirogov Russian National Research Medical University, Moscow, Russia.
  • Sazhin AV; Pirogov Russian National Research Medical University, Moscow, Russia.
  • Loban KM; Pirogov Russian National Research Medical University, Moscow, Russia.
  • Bogomolova AK; Pirogov Russian National Research Medical University, Moscow, Russia.
  • Suglob VV; Pirogov Russian National Research Medical University, Moscow, Russia.
  • Beniia TR; Pirogov Russian National Research Medical University, Moscow, Russia.
Khirurgiia (Mosk) ; (8): 6-14, 2024.
Article in Russian | MEDLINE | ID: mdl-39140937
ABSTRACT

OBJECTIVE:

To evaluate the quality of recommendations provided by ChatGPT regarding inguinal hernia repair.

MATERIAL AND METHODS:

ChatGPT was asked 5 questions about the surgical management of inguinal hernias. The chatbot was assigned the role of an expert in herniology and instructed to search only specialized medical databases and to provide references and supporting evidence. Herniology experts and general surgeons (non-experts) rated the quality of the recommendations generated by ChatGPT on a 4-point scale (from 0 to 3 points). Statistical correlations were explored between participants' ratings and their stance on artificial intelligence.

RESULTS:

Experts scored the quality of ChatGPT responses lower than non-experts did (2 (1-2) vs. 2 (2-3), p<0.001). The chatbot failed to provide valid references and actual evidence, and it falsified half of the references. Respondents were optimistic about the future of neural networks for clinical decision-making support. Most of them were against restricting their use in healthcare.

CONCLUSION:

We would not recommend non-specialized large language models as the sole or primary source of information for clinical decision making, or as a virtual search assistant.

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Artificial Intelligence / Herniorrhaphy Limits: Humans Language: Ru Journal: Khirurgiia (Mosk) Year: 2024 Document type: Article Country of affiliation: Russia Country of publication: Russia
