Structured information extraction from scientific text with large language models.
Nat Commun; 15(1): 1418, 2024 Feb 15.
Article | English | MEDLINE | ID: mdl-38360817
ABSTRACT
Extracting structured knowledge from scientific text remains a challenging task for machine learning models. Here, we present a simple approach to joint named entity recognition and relation extraction and demonstrate how pretrained large language models (GPT-3, Llama-2) can be fine-tuned to extract useful records of complex scientific knowledge. We test three representative tasks in materials chemistry: linking dopants and host materials, cataloging metal-organic frameworks, and general composition/phase/morphology/application information extraction. Records are extracted from single sentences or entire paragraphs, and the output can be returned as simple English sentences or a more structured format such as a list of JSON objects. This approach represents a simple, accessible, and highly flexible route to obtaining large databases of structured specialized scientific knowledge extracted from research papers.
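The abstract notes that extracted records can be returned as a list of JSON objects. As a rough illustration of what consuming such output might look like for the dopant/host task, the Python sketch below parses a model completion into typed records. The field names `host` and `dopants`, the `DopingRecord` class, the `parse_completion` helper, and the example completion are all hypothetical and are not taken from the paper; the actual schema used by the authors may differ.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DopingRecord:
    """One host-dopant relation. Field names are hypothetical, not the paper's schema."""
    host: str
    dopants: tuple


def parse_completion(completion: str) -> list:
    """Parse a model completion that is expected to be a JSON list of objects.

    Malformed output and entries without a usable host are dropped instead of
    raising, because generated text is not guaranteed to be valid JSON.
    """
    try:
        payload = json.loads(completion)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, list):
        return []

    records = []
    for item in payload:
        if not isinstance(item, dict):
            continue
        host = item.get("host")
        dopants = item.get("dopants", [])
        if isinstance(host, str) and host and isinstance(dopants, list):
            records.append(DopingRecord(host, tuple(str(d) for d in dopants)))
    return records


# Hypothetical completion for the sentence "ZnO films were doped with Al and Ga."
example = '[{"host": "ZnO", "dopants": ["Al", "Ga"]}]'
records = parse_completion(example)
```

Grouping the entities inside one object is what makes the extraction joint: each JSON object carries both the named entities and the relation between them, so no separate relation-classification pass is needed downstream.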
Collection: 01-internacional
Database: MEDLINE
Language: English
Journal: Nat Commun
Journal subject: BIOLOGY / SCIENCE
Year: 2024
Document type: Article
Country of affiliation: United States
Country of publication: United Kingdom