Natural language processing (NLP) to facilitate abstract review in medical research: the application of BioBERT to exploring the 20-year use of NLP in medical research.

Masoumi, Safoora; Amirkhani, Hossein; Sadeghian, Najmeh; Shahraz, Saeid

Affiliation

Masoumi S; Pediatric Infectious Diseases Research Center, Mazandaran University of Medical Sciences, Sari, Iran. Safoora@unc.edu.
Amirkhani H; Computer and Information Technology Department, University of Qom, Qom, Iran.
Sadeghian N; Student Research Committee, Mazandaran University of Medical Sciences, Sari, Iran.
Shahraz S; Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, USA.

Syst Rev; 13(1): 107, 2024 Apr 15.

Article in English | MEDLINE | ID: mdl-38622611

ABSTRACT

BACKGROUND:

Abstract review is a time- and labor-intensive step in systematic and scoping literature reviews in medicine. Text mining methods, typically natural language processing (NLP), may efficiently replace manual abstract screening. This study applies NLP to a deliberately selected literature review problem, the trend of using NLP in medical research, to demonstrate the performance of this automated abstract review model.

METHODS:

Searching the PubMed, Embase, PsycINFO, and CINAHL databases, we identified 22,294 abstracts, from which we made a final selection of 12,817 English abstracts published between 2000 and 2021. We devised a manual classification of medical fields using three variables, i.e., the context of use (COU), text source (TS), and primary research field (PRF). A training dataset was developed after reviewing 485 abstracts. We used a language model called Bidirectional Encoder Representations from Transformers (BERT) to classify the abstracts. To evaluate the performance of the trained models, we report the micro F1-score and accuracy.
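The two reported metrics can be illustrated with a minimal sketch. It assumes each abstract receives exactly one label per variable, which the abstract does not state; the function names and the toy labels below are hypothetical and are not taken from the study. Under that single-label assumption, the micro F1-score pools true positives, false positives, and false negatives over all classes and coincides with accuracy.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # correctly assigned to this class
            elif pred == label and truth != label:
                fp += 1  # wrongly assigned to this class
            elif truth == label and pred != label:
                fn += 1  # missed member of this class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Share of abstracts whose predicted label matches the manual label."""
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


# Hypothetical text-source (TS) labels for five abstracts (illustration only).
manual = ["EHR", "EHR", "omics", "literature", "EHR"]
predicted = ["EHR", "omics", "omics", "literature", "literature"]

print(f"micro F1 = {micro_f1(manual, predicted):.2f}")
print(f"accuracy = {accuracy(manual, predicted):.2f}")
```

If a variable allowed several labels per abstract, the two metrics would no longer coincide and the pooling would need to run over label indicators instead.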

RESULTS:

The trained models' micro F1-scores for classifying abstracts into the three variables were 77.35% for COU, 76.24% for TS, and 85.64% for PRF. The average annual growth rate (AAGR) of the publications was 20.99% between 2000 and 2020 (a yearly increase of 72.01 articles; 95% CI 56.80-78.30), with 81.76% of the abstracts published between 2010 and 2020. Studies on neoplasms constituted 27.66% of the entire corpus with an AAGR of 42.41%, followed by studies on mental conditions (AAGR = 39.28%). While electronic health or medical records comprised the highest proportion of text sources (57.12%), omics databases had the highest growth among all text sources with an AAGR of 65.08%. The most common NLP application was clinical decision support (25.45%).
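As a rough illustration of the AAGR figures, the sketch below uses the common definition of AAGR as the arithmetic mean of year-over-year percentage changes. Whether the authors used exactly this definition is an assumption, and the yearly counts are hypothetical values chosen for the example, not data from the study.

```python
def average_annual_growth_rate(counts_by_year):
    """Mean of year-over-year growth rates, returned as a percentage."""
    years = sorted(counts_by_year)
    if len(years) < 2:
        raise ValueError("need counts for at least two years")
    rates = []
    for prev_year, curr_year in zip(years, years[1:]):
        prev_count = counts_by_year[prev_year]
        if prev_count == 0:
            raise ValueError(f"growth rate undefined: zero count in {prev_year}")
        rates.append((counts_by_year[curr_year] - prev_count) / prev_count)
    return 100 * sum(rates) / len(rates)


# Hypothetical publication counts per year (illustration only).
counts = {2000: 100, 2001: 120, 2002: 150}
print(f"AAGR = {average_annual_growth_rate(counts):.2f}%")
```

Because each yearly rate is weighted equally, a category with very few early publications can show a high AAGR even when its absolute counts remain small.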

CONCLUSIONS:

BioBERT showed acceptable performance in abstract review. If future research confirms the high performance of this language model, it can reliably replace manual abstract review.

Subject(s)

Biomedical Research; Natural Language Processing; Humans; Language; Data Mining; Electronic Health Records

Keywords

BioBERT; Machine learning; Medicine; Natural language processing (NLP); Trend analysis

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Natural Language Processing / Biomedical Research Limits: Humans Language: En Journal: Syst Rev Year: 2024 Document type: Article Country of affiliation: Iran Country of publication: United Kingdom
