Chatbot for the Return of Positive Genetic Screening Results for Hereditary Cancer Syndromes: a Prompt Engineering Study.

Coen, Emma; Del Fiol, Guilherme; Kaphingst, Kimberly A; Borsato, Emerson; Shannon, Jackie; Smith, Hadley Stevens; Masino, Aaron; Allen, Caitlin G

Affiliation

Coen E; Clemson University.
Del Fiol G; University of Utah.
Kaphingst KA; University of Utah.
Borsato E; University of Utah.
Allen CG; Medical University of South Carolina.

Res Sq; 2024 Aug 29.

Article in English | MEDLINE | ID: mdl-39257988

ABSTRACT

Background:

The growing demand for genomic testing and limited access to experts necessitate innovative service models. While chatbots have shown promise in supporting genomic services such as pre-test counseling, their use in returning positive genetic results, especially with more recent large language models (LLMs), remains unexplored.

Objective:

This study reports the prompt engineering process and intrinsic evaluation of the LLM component of a chatbot designed to support returning positive population-wide genomic screening results.

Methods:

We used a three-step prompt engineering process, incorporating Retrieval-Augmented Generation (RAG) and few-shot techniques, to develop an open-response chatbot. The chatbot was then evaluated using two hypothetical scenarios, with experts rating its performance on a 5-point Likert scale across eight criteria: tone, clarity, program accuracy, domain accuracy, robustness, efficiency, boundaries, and usability.
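The abstract names RAG and few-shot prompting without detailing the pipeline. The sketch below shows, under stated assumptions, how the two techniques commonly combine into a single prompt: retrieved passages supply grounding context, and worked question-answer pairs demonstrate the desired tone and boundaries. The `Passage`, `retrieve`, and `build_prompt` names, the word-overlap retriever (a toy substitute for embedding search), and the placeholder passages are all hypothetical and do not represent the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    """One chunk of program or condition documentation (hypothetical knowledge base)."""
    source: str
    text: str


def tokenize(text: str) -> set[str]:
    # Lowercased word set with surrounding punctuation stripped.
    words = (w.strip(".,?!:;()").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query: str, passages: list[Passage], k: int = 2) -> list[Passage]:
    """Toy retriever: rank passages by word overlap with the query.

    A production RAG pipeline would more likely use embedding similarity.
    Passages with zero overlap are dropped rather than padded into the context.
    """
    q = tokenize(query)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p.text)), reverse=True)
    return [p for p in ranked[:k] if q & tokenize(p.text)]


def build_prompt(query: str, passages: list[Passage],
                 examples: list[tuple[str, str]], k: int = 2) -> str:
    """Assemble instructions + retrieved context + few-shot examples + the user turn."""
    parts = [
        "You help participants understand a positive genetic screening result. "
        "Answer only from the context below; otherwise refer them to a genetic counselor.",
        "Context:",
    ]
    parts += [f"[{p.source}] {p.text}" for p in retrieve(query, passages, k)]
    parts.append("Examples:")
    parts += [f"User: {q}\nAssistant: {a}" for q, a in examples]
    parts.append(f"User: {query}\nAssistant:")
    return "\n\n".join(parts)


# Placeholder knowledge base and few-shot pair, for illustration only.
passages = [
    Passage("program-faq", "Genetic counseling appointments can be scheduled through the program."),
    Passage("condition-faq", "Hereditary cancer syndromes increase lifetime cancer risk."),
]
examples = [("Can you diagnose me?",
             "I can't diagnose you, but a genetic counselor can review your result with you.")]
prompt = build_prompt("How do I schedule genetic counseling?", passages, examples, k=1)
print(prompt)
```

The few-shot pair here models a boundary-keeping reply, one of the behaviors the experts rated; in practice the examples would be drawn from curated program content.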

Results:

The chatbot achieved an overall score of 3.88 out of 5 across all criteria and scenarios. The highest ratings were in Tone (4.25), Usability (4.25), and Boundary management (4.0), followed by Efficiency (3.88), Clarity and Robustness (3.81 each), and Domain Accuracy (3.63). The lowest-rated criterion was Program Accuracy (3.25).
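As an arithmetic check on the Results, the sketch below ranks the reported per-criterion means and takes their unweighted average. That average comes to about 3.86, slightly below the reported overall score of 3.88. The abstract does not say how the overall score was computed, so the gap may reflect rounding of the criterion means or averaging over individual ratings instead; the variable names are illustrative.

```python
from statistics import mean

# Per-criterion mean scores (1-5 Likert) as reported in the Results.
criterion_means = {
    "Tone": 4.25, "Usability": 4.25, "Boundaries": 4.0, "Efficiency": 3.88,
    "Clarity": 3.81, "Robustness": 3.81, "Domain accuracy": 3.63,
    "Program accuracy": 3.25,
}

# Highest to lowest; ties keep their listed order because sorted() is stable.
ranked = sorted(criterion_means.items(), key=lambda kv: kv[1], reverse=True)
unweighted = mean(criterion_means.values())

for name, score in ranked:
    print(f"{name:<17} {score:.2f}")
print(f"Unweighted mean of criterion means: {unweighted:.2f}")
```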

Discussion:

The LLM handled open-ended queries and maintained boundaries, while the lower Program Accuracy rating indicates areas for improvement. Future work will focus on refining prompts, expanding evaluations, and exploring optimal hybrid chatbot designs that integrate LLM components with rule-based chatbot components to enhance genomic service delivery.
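The abstract does not specify how LLM and rule-based components would be combined. One common pattern, shown here as a hypothetical sketch, routes messages that match scripted intents (for example, program logistics, the area where Program Accuracy was lowest) to fixed answers and sends everything else to the LLM. `RULES`, `route`, and `fake_llm` are illustrative names, and keyword regexes stand in for whatever intent classifier a real system would use.

```python
import re
from typing import Callable

# Hypothetical scripted intents for program logistics; patterns and replies are placeholders.
RULES = [
    (re.compile(r"\b(schedule|appointment|book)\b", re.IGNORECASE),
     "SCRIPTED: how to schedule a genetic counseling appointment."),
    (re.compile(r"\b(cost|pay|insurance)\b", re.IGNORECASE),
     "SCRIPTED: program cost and insurance information."),
]


def route(message: str, llm: Callable[[str], str]) -> tuple[str, str]:
    """Return (handler, reply): a scripted answer if any rule matches, else the LLM's."""
    for pattern, reply in RULES:
        if pattern.search(message):
            return "rule", reply
    return "llm", llm(message)


def fake_llm(message: str) -> str:
    # Stand-in for a real LLM call (e.g. a RAG + few-shot prompt pipeline).
    return f"LLM answer to: {message}"


print(route("How do I book an appointment?", fake_llm))
print(route("What does this result mean for my children?", fake_llm))
```

Checking rules first keeps program-specific facts under direct author control, while the LLM still covers the open-ended questions it handled well in this evaluation.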

Keywords

Few-Shot Learning; Population Screening Program; Prompt Engineering; Retrieval-Augmented Generation (RAG)

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Language: English | Journal: Res Sq | Year: 2024 | Document type: Article | Country of publication: United States