Structured information extraction from scientific text with large language models.

Dagdelen, John; Dunn, Alexander; Lee, Sanghoon; Walker, Nicholas; Rosen, Andrew S; Ceder, Gerbrand; Persson, Kristin A; Jain, Anubhav

Affiliation

Dagdelen J; Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Dunn A; Materials Science and Engineering Department, University of California, Berkeley, CA, USA.
Lee S; Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Walker N; Materials Science and Engineering Department, University of California, Berkeley, CA, USA.
Rosen AS; Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Ceder G; Materials Science and Engineering Department, University of California, Berkeley, CA, USA.
Persson KA; Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Jain A; Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

Nat Commun ; 15(1): 1418, 2024 Feb 15.

Article in English | MEDLINE | ID: mdl-38360817

ABSTRACT

Extracting structured knowledge from scientific text remains a challenging task for machine learning models. Here, we present a simple approach to joint named entity recognition and relation extraction and demonstrate how pretrained large language models (GPT-3, Llama-2) can be fine-tuned to extract useful records of complex scientific knowledge. We test three representative tasks in materials chemistry: linking dopants and host materials, cataloging metal-organic frameworks, and general composition/phase/morphology/application information extraction. Records are extracted from single sentences or entire paragraphs, and the output can be returned as simple English sentences or a more structured format such as a list of JSON objects. This approach represents a simple, accessible, and highly flexible route to obtaining large databases of structured specialized scientific knowledge extracted from research papers.
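
As a rough illustration of the "list of JSON objects" output format mentioned in the abstract, the following minimal Python sketch parses a model completion into dopant-host pairs. The completion string, the field names "host" and "dopants", and both helper functions are hypothetical and chosen only for illustration; the record does not specify the schema the authors used.

```python
import json

# Hypothetical model completion. The field names "host" and "dopants" are
# illustrative assumptions, not the schema used in the paper.
completion = (
    '[{"host": "ZnO", "dopants": ["Al", "Ga"]}, '
    '{"host": "TiO2", "dopants": ["Nb"]}]'
)


def parse_records(text):
    """Parse a completion into a list of dict records; return [] if invalid."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Generated text is not guaranteed to be valid JSON.
        return []
    if not isinstance(data, list):
        return []
    # Keep only entries that are JSON objects.
    return [item for item in data if isinstance(item, dict)]


def dopant_host_pairs(records):
    """Flatten records into (dopant, host) tuples, skipping malformed ones."""
    pairs = []
    for rec in records:
        host = rec.get("host")
        dopants = rec.get("dopants", [])
        if host is None or not isinstance(dopants, list):
            continue
        for dopant in dopants:
            pairs.append((dopant, host))
    return pairs


records = parse_records(completion)
pairs = dopant_host_pairs(records)
print(pairs)
```

Returning an empty list on malformed output, rather than raising, is one possible design choice for batch processing many passages; a stricter pipeline might log or retry such cases instead.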

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: Nat Commun Journal subject: BIOLOGY / SCIENCE Year: 2024 Document type: Article Affiliation country: United States Publication country: United Kingdom
