OpenMedLM: prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models.

Maharjan, Jenish; Garikipati, Anurag; Singh, Navan Preet; Cyrus, Leo; Sharma, Mayank; Ciobanu, Madalina; Barnes, Gina; Thapa, Rahul; Mao, Qingqing; Das, Ritankar


Affiliation

Maharjan J; Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA.
Garikipati A; Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA.
Singh NP; Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA.
Cyrus L; Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA.
Sharma M; Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA.
Ciobanu M; Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA.
Barnes G; Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA.
Thapa R; Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA.
Mao Q; Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA. qmao@fortahealth.com.
Das R; Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA.

Sci Rep; 14(1): 14156, 2024 Jun 19.

Article in English | MEDLINE | ID: mdl-38898116

ABSTRACT


Large language models (LLMs) can accomplish specialized medical knowledge tasks; however, equitable access is hindered by the need for extensive fine-tuning, the requirement for specialized medical data, and limited access to proprietary models. Open-source (OS) medical LLMs show performance improvements and provide the transparency and compliance required in healthcare. We present OpenMedLM, a prompting platform delivering state-of-the-art (SOTA) performance for OS LLMs on medical benchmarks. We evaluated OS foundation LLMs (7B-70B) on medical benchmarks (MedQA, MedMCQA, PubMedQA, MMLU medical-subset) and selected Yi34B for developing OpenMedLM. Prompting strategies included zero-shot, few-shot, chain-of-thought, and ensemble/self-consistency voting. OpenMedLM delivered OS SOTA results on three medical LLM benchmarks, surpassing previous best-performing OS models that leveraged costly and extensive fine-tuning. OpenMedLM presents the first results to date demonstrating that OS foundation models can optimize performance without specialized fine-tuning. The model achieved 72.6% accuracy on MedQA, outperforming the previous SOTA by 2.4%, and 81.7% accuracy on MMLU medical-subset, establishing itself as the first OS LLM to surpass 80% accuracy on this benchmark. Our results highlight medical-specific emergent properties in OS LLMs not documented elsewhere to date and validate the ability of OS models to accomplish healthcare tasks, highlighting the benefits of prompt engineering to improve performance of accessible LLMs for medical applications.
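The ensemble/self-consistency voting named among the prompting strategies can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything below is an assumption for illustration: the function names (`extract_choice`, `self_consistency_vote`, `make_stub_sampler`), the answer-letter format (A to E), and the regular expression are hypothetical, and the stub sampler stands in for a real model call.

```python
import re
from collections import Counter
from itertools import cycle
from typing import Callable, Iterable, Optional

# Hypothetical answer format: completions end with "answer is B" or "Answer: (B)".
_ANSWER_RE = re.compile(r"(?i:answer\s*(?:is|:)\s*\(?)([A-E])\b")


def extract_choice(completion: str) -> Optional[str]:
    """Return the last multiple-choice letter stated in a completion, or None."""
    matches = _ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


def self_consistency_vote(
    sample: Callable[[str], str], prompt: str, n_samples: int = 5
) -> Optional[str]:
    """Sample several chain-of-thought completions and return the majority answer.

    `sample` is any callable that maps a prompt to one model completion
    (assumed to be stochastic in real use). Completions without a parsable
    answer are ignored. Ties go to the answer that was seen first.
    """
    votes: Counter = Counter()
    for _ in range(n_samples):
        choice = extract_choice(sample(prompt))
        if choice is not None:
            votes[choice] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def make_stub_sampler(completions: Iterable[str]) -> Callable[[str], str]:
    """Build a deterministic stand-in for a model: it replays fixed completions."""
    replay = cycle(list(completions))
    return lambda prompt: next(replay)


if __name__ == "__main__":
    stub = make_stub_sampler([
        "Step by step ... so the answer is B",
        "I think the answer is C",
        "Answer: (B)",
        "I am not sure.",
        "Therefore, the answer is B.",
    ])
    print(self_consistency_vote(stub, "Question text with options A-E", n_samples=5))
```

In real use, `sample` would wrap a call to an open-source model with a nonzero sampling temperature, and the prompt would carry the few-shot, chain-of-thought exemplars; the voting step itself stays the same.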

Subject(s)

Benchmarking; Humans; Software

Keywords

Artificial intelligence; Clinical decision support; Large language models; Open-source


Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Benchmarking | Limits: Humans | Language: English | Journal: Sci Rep | Year: 2024 | Document type: Article | Country of affiliation: United States | Country of publication: United Kingdom
