Results 1 - 9 of 9
1.
Artif Intell Med ; 155: 102938, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39121544

ABSTRACT

Large Language Models (LLMs) have the potential to facilitate the development of Artificial Intelligence technology that assists medical experts with interactive decision support. This potential has been illustrated by the state-of-the-art performance obtained by LLMs in Medical Question Answering, with striking results such as passing marks in medical licensing exams. However, while impressive, the quality bar required for medical applications remains far from being achieved. Currently, LLMs remain challenged by outdated knowledge and by their tendency to generate hallucinated content. Furthermore, most benchmarks to assess medical knowledge lack reference gold explanations, which means that it is not possible to evaluate the reasoning behind LLM predictions. Finally, the situation is particularly grim for benchmarking LLMs in languages other than English, which remains, as far as we know, a totally neglected topic. In order to address these shortcomings, in this paper we present MedExpQA, the first multilingual benchmark based on medical exams to evaluate LLMs in Medical Question Answering. To the best of our knowledge, MedExpQA includes for the first time reference gold explanations, written by medical doctors, of the correct and incorrect options in the exams. Comprehensive multilingual experimentation using both the gold reference explanations and Retrieval Augmented Generation (RAG) approaches shows that the performance of LLMs, with best results around 75% accuracy for English, still has large room for improvement, especially for languages other than English, for which accuracy drops by 10 points. Therefore, despite using state-of-the-art RAG methods, our results also demonstrate the difficulty of obtaining and integrating readily available medical knowledge that may positively impact results on downstream evaluations for Medical Question Answering. Data, code, and fine-tuned models will be made publicly available.


Subject(s)
Artificial Intelligence, Benchmarking, Multilingualism, Humans, Natural Language Processing
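The RAG setup described in the abstract can be sketched minimally: retrieve the snippets most relevant to an exam question, then prepend them as context to the prompt. The retriever below is a toy word-overlap scorer and the corpus, question and helper names are illustrative assumptions, not the paper's implementation.

```python
def overlap_score(query, doc):
    """Score a document by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def build_rag_prompt(question, options, corpus, k=2):
    """Assemble a prompt with the top-k retrieved snippets as context."""
    ranked = sorted(corpus, key=lambda doc: overlap_score(question, doc), reverse=True)
    context = "\n".join(ranked[:k])
    letters = "ABCD"
    opts = "\n".join(f"{letters[i]}) {o}" for i, o in enumerate(options))
    return f"Context:\n{context}\n\nQuestion: {question}\n{opts}\nAnswer:"

# Toy two-document corpus; a real system would retrieve from medical literature.
corpus = [
    "Metformin is a first-line oral agent for type 2 diabetes.",
    "Warfarin requires INR monitoring.",
]
prompt = build_rag_prompt(
    "Which drug is first-line for type 2 diabetes?",
    ["Warfarin", "Metformin", "Insulin", "Aspirin"],
    corpus,
)
print(prompt)
```

A production retriever would use dense or TF-IDF similarity rather than raw word overlap, but the prompt-assembly step is the same shape.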
2.
Artif Intell Med ; 143: 102622, 2023 09.
Article in English | MEDLINE | ID: mdl-37673565

ABSTRACT

Civil registration and vital statistics systems capture birth and death events to compile vital statistics and to provide legal rights to citizens. Vital statistics are a key factor in promoting public health policies and the health of the population. Medical certification of the Cause of Death (CoD) is the preferred source of cause of death information. However, two thirds of all deaths worldwide are not captured in routine mortality information systems, and their cause of death is unknown. Verbal autopsy is an interim solution for estimating the cause of death distribution at the population level in the absence of medical certification. A Verbal Autopsy (VA) consists of an interview with a relative or caregiver of the deceased. The VA includes both Closed Questions (CQs) with structured answer options and an Open Response (OR) consisting of a free narrative of the events, expressed in natural language and without any pre-determined structure. A number of automated systems analyze the CQs to obtain cause-specific mortality fractions, but with limited performance. We hypothesize that incorporating the text provided by the OR might convey relevant information to discern the CoD. The experimental layout compares existing Computer Coding Verbal Autopsy methods, such as Tariff 2.0, with other approaches well suited to processing structured inputs such as the CQs. Next, alternative approaches based on language models are employed to analyze the OR. Finally, we propose a new method with a bi-modal input that combines the CQs and the OR. Empirical results corroborate that our method, by taking into account the valuable information conveyed by the OR, outperforms the CoD prediction capability of the Tariff 2.0 algorithm. As an added value, we make the software available to enable reproducibility of the results, including an R implementation that makes the comparison with Tariff 2.0 straightforward.


Subject(s)
Algorithms, Humans, Autopsy, Cause of Death, Reproducibility of Results
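The bi-modal input described above can be sketched as a simple feature concatenation: encode the structured closed-question answers as binary features and the open-response narrative as a bag-of-words vector, then join them for a downstream cause-of-death classifier. The vocabulary and encodings below are illustrative assumptions, not the paper's feature set.

```python
# Tiny illustrative symptom vocabulary for the open-response narrative.
VOCAB = ["fever", "cough", "accident", "chest", "pain"]

def encode_cq(answers):
    """Encode yes/no closed-question (CQ) answers as 0/1 features."""
    return [1 if a == "yes" else 0 for a in answers]

def encode_or(narrative):
    """Bag-of-words count vector over the toy vocabulary for the open response (OR)."""
    words = narrative.lower().split()
    return [words.count(w) for w in VOCAB]

def bimodal_features(answers, narrative):
    """Concatenate both modalities into one feature vector."""
    return encode_cq(answers) + encode_or(narrative)

x = bimodal_features(["yes", "no", "yes"], "He had fever and chest pain")
print(x)  # [1, 0, 1, 1, 0, 0, 1, 1]
```

In practice the OR side would be a learned language-model representation rather than raw counts, but the fusion step is the same idea.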
3.
J Biomed Inform ; 105: 103419, 2020 05.
Article in English | MEDLINE | ID: mdl-32298847

ABSTRACT

This work deals with negation detection in the context of clinical texts. Negation detection is key for decision support systems, since negated events (the stated absence of certain findings) help ascertain current medical conditions. For artificial intelligence, negation detection is valuable because it can invert the meaning of part of a text and, accordingly, influence other tasks such as medical dosage adjustment and the detection of adverse drug reactions or hospital-acquired diseases. We focus on negated medical events such as disorders, findings and allergies. Following Natural Language Processing (NLP) terminology, we refer to them as negated medical entities. A novelty of this work is that we approach the task as Named Entity Recognition (NER) with the restriction that only negated medical entities must be recognized (in an attempt to help distinguish them from non-negated ones). Our study is conducted on Electronic Health Records (EHRs) written in Spanish. A challenge to cope with is lexical variability (alternative medical forms, abbreviations, etc.). To this end, we employ an approach based on deep learning. Specifically, the system combines character embeddings to cope with out-of-vocabulary (OOV) words, Long Short-Term Memory (LSTM) networks to model contextual representations, and Conditional Random Fields (CRF) to classify each medical entity as either negated or not, given the contextual dense representation. Moreover, we explored both embeddings created from words and embeddings created from lemmas. The best results were obtained with the lemmatized embeddings; apparently, this approach reinforced the capability of the LSTMs to cope with the high lexical variability. The f-measure was 65.1 for exact-match and 82.4 for partial-match.


Subject(s)
Deep Learning, Electronic Health Records, Artificial Intelligence, Natural Language Processing, Neural Networks, Computer
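Framing negation detection as NER over negated entities only, as the paper does, is commonly realized with a BIO tagging scheme: tokens inside a negated entity get B-NEG/I-NEG labels and everything else gets O, which an LSTM-CRF then predicts. The helper below is an illustrative encoding utility, not the paper's code.

```python
def bio_encode(tokens, spans):
    """Label tokens with BIO tags for negated entities.

    spans: list of (start, end) half-open token-index ranges of negated entities.
    """
    labels = ["O"] * len(tokens)
    for start, end in spans:
        labels[start] = "B-NEG"
        for i in range(start + 1, end):
            labels[i] = "I-NEG"
    return labels

# "acute appendicitis" is the negated disorder in this toy sentence.
tokens = ["no", "signs", "of", "acute", "appendicitis"]
print(bio_encode(tokens, [(3, 5)]))  # ['O', 'O', 'O', 'B-NEG', 'I-NEG']
```

Restricting the label set to negated entities is what lets the sequence model learn to separate them from non-negated mentions of the same terms.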
4.
BMC Med Inform Decis Mak ; 19(Suppl 7): 274, 2019 12 23.
Article in English | MEDLINE | ID: mdl-31865900

ABSTRACT

BACKGROUND: Text mining and natural language processing of clinical text, such as notes from electronic health records, require specific consideration of the specialized characteristics of these texts. Deep learning methods could potentially mitigate domain-specific challenges such as limited access to in-domain tools and data sets. METHODS: A bi-directional Long Short-Term Memory network is applied to clinical notes in Spanish and Swedish for the task of medical named entity recognition. Several types of embeddings, generated from both in-domain and out-of-domain text corpora, and a number of embedding generation and combination strategies were evaluated in order to investigate different input representations and the influence of domain on the final results. RESULTS: For Spanish, a micro-averaged F1-score of 75.25 was obtained; for Swedish, the corresponding score was 76.04. The best results for both languages were achieved using embeddings generated from in-domain corpora extracted from electronic health records, but embeddings generated from related domains were also found to be beneficial. CONCLUSIONS: A recurrent neural network with in-domain embeddings improved medical named entity recognition compared to shallow learning methods, showing this combination to be suitable for entity recognition in clinical text for both languages.


Subject(s)
Deep Learning, Language, Natural Language Processing, Data Mining, Electronic Health Records, Humans, Neural Networks, Computer, Sweden
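The scores reported above are micro-averaged F1: true positives, false positives and false negatives are pooled across all entity types before computing precision and recall, so frequent types weigh more than rare ones. A minimal sketch with made-up counts:

```python
def micro_f1(counts):
    """Micro-averaged F1.

    counts: dict mapping entity type -> (tp, fp, fn).
    Pools the confusion counts over all types, then computes F1 once.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative per-type counts (not from the paper).
counts = {"Disorder": (80, 20, 10), "Drug": (40, 10, 20)}
print(round(micro_f1(counts), 4))
```

Macro-averaging would instead compute F1 per type and average the results, giving rare entity types equal weight.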
5.
Int J Med Inform ; 129: 49-59, 2019 09.
Article in English | MEDLINE | ID: mdl-31445289

ABSTRACT

BACKGROUND: Automatic extraction of the morbid diseases or conditions contained in death certificates is a critical process, useful for billing, epidemiological studies and comparison across countries. The fact that these clinical documents are written in regular natural language makes the automatic coding process difficult because spontaneous terms often diverge strongly from standard reference terminologies such as the International Classification of Diseases (ICD). OBJECTIVE: Our aim is to propose a general and multilingual approach to mapping Diagnostic Terms onto the standard framework provided by the ICD. We evaluated our proposal on a set of clinical texts written in French, Hungarian and Italian. METHODS: ICD-10 encoding is a multi-class classification problem with an extensive number of classes (thousands). After considering several approaches, we tackle our objective as a sequence-to-sequence task. Following current trends, we opted to use neural networks. We tested different types of neural architectures on three datasets in which Diagnostic Terms (DTs) have their associated ICD-10 codes. RESULTS AND CONCLUSIONS: Our results set a new state of the art in multilingual ICD-10 coding, outperforming several alternative approaches and showing the feasibility of automatic ICD-10 prediction, with an F-measure of 0.838, 0.963 and 0.952 for French, Hungarian and Italian, respectively. Additionally, the results are interpretable, providing experts with supporting evidence when confronted with coding decisions, as the model is able to show the alignments between the original text and each output code.


Subject(s)
Deep Learning, Electronic Health Records, International Classification of Diseases, Neural Networks, Computer
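The paper solves ICD-10 coding with neural sequence-to-sequence models; as a point of contrast, the underlying term-to-code mapping problem can be illustrated with a trivial string-similarity baseline over a toy code dictionary. The codes and diagnostic terms below are illustrative, not an official ICD-10 extract.

```python
import difflib

# Toy dictionary of diagnostic terms and their ICD-10 codes (illustrative).
ICD10 = {
    "acute myocardial infarction": "I21",
    "essential hypertension": "I10",
    "type 2 diabetes mellitus": "E11",
}

def code_for(term):
    """Return the ICD-10 code of the most string-similar known diagnostic term."""
    best = max(
        ICD10,
        key=lambda k: difflib.SequenceMatcher(None, term.lower(), k).ratio(),
    )
    return ICD10[best]

print(code_for("acute myocardial infarction"))  # I21
```

Such a baseline fails exactly where the abstract says spontaneous terms diverge from the reference terminology, which is what motivates the learned sequence-to-sequence approach.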
6.
J Am Med Inform Assoc ; 26(12): 1478-1487, 2019 12 01.
Article in English | MEDLINE | ID: mdl-31334764

ABSTRACT

OBJECTIVE: To analyze techniques for machine translation of electronic health records (EHRs) between linguistically distant languages, using Basque and Spanish as a reference. We studied distinct configurations of neural machine translation systems and used different methods to overcome the lack of a bilingual corpus of clinical texts or health records in Basque and Spanish. MATERIALS AND METHODS: We trained recurrent neural networks on an out-of-domain corpus with different hyperparameter values. Subsequently, we used the optimal configuration to evaluate machine translation of EHR templates between Basque and Spanish, using manual translations of the Basque templates into Spanish as a reference. We successively added clinical resources to the training corpus, including a Spanish-Basque dictionary derived from resources built for the machine translation of the Spanish edition of SNOMED CT into Basque, artificial sentences in Spanish and Basque derived from frequently occurring relationships in SNOMED CT, and Spanish monolingual EHRs. Apart from calculating bilingual evaluation understudy (BLEU) scores, we tested the performance in the clinical domain by human evaluation. RESULTS: We achieved slight improvements over our reference system by tuning some hyperparameters on an out-of-domain bilingual corpus, obtaining 10.67 BLEU points for Basque-to-Spanish clinical domain translation. The inclusion of clinical terminology in Spanish and Basque and the application of the back-translation technique on monolingual EHRs significantly improved performance, obtaining 21.59 BLEU points. This was confirmed by the human evaluation performed by 2 clinicians, who ranked our machine translations close to the human translations.
DISCUSSION: We showed that, even after optimizing the hyperparameters out of domain, the inclusion of available clinical-domain resources and the applied methods were beneficial for the described objective, yielding adequate translations of EHR templates. CONCLUSION: We have developed a system which is able to properly translate health record templates from Basque to Spanish without making use of any bilingual corpus of clinical texts or health records.


Subject(s)
Electronic Health Records, Natural Language Processing, Neural Networks, Computer, Translation, Humans, Spain, Systematized Nomenclature of Medicine, Terminology as Topic
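BLEU, the metric reported above, combines modified n-gram precision with a brevity penalty. A simplified sentence-level version (unigrams and bigrams only; standard BLEU uses up to 4-grams and corpus-level statistics) sketches the computation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu2(candidate, reference):
    """Simplified sentence-level BLEU over unigrams and bigrams."""
    c, r = candidate.split(), reference.split()
    precisions = []
    for n in (1, 2):
        cand, ref = Counter(ngrams(c, n)), Counter(ngrams(r, n))
        # Clipped (modified) precision: each candidate n-gram counts at most
        # as often as it appears in the reference.
        overlap = sum(min(cnt, ref[g]) for g, cnt in cand.items())
        precisions.append(overlap / max(len(c) - n + 1, 1))
    if 0 in precisions:
        return 0.0
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(c) > len(r) else math.exp(1 - len(r) / len(c))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 2)

print(bleu2("the patient was discharged", "the patient was discharged"))  # 1.0
```

The "BLEU points" in the abstract are this score scaled to 0-100 and computed over the whole test set rather than per sentence.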
7.
J Biomed Inform ; 71: 16-30, 2017 07.
Article in English | MEDLINE | ID: mdl-28526460

ABSTRACT

OBJECTIVE: The goal of this study is to investigate entity recognition within Electronic Health Records (EHRs), focusing on Spanish and Swedish. Of particular importance is a robust representation of the entities; in our case, we utilized unsupervised methods to generate such representations. METHODS: The significance of this work rests on its experimental layout: the experiments were carried out under the same conditions for both languages. Several classification approaches were explored: maximum probability, CRF, Perceptron and SVM. The classifiers were enhanced by means of ensembles of semantic spaces and ensembles of Brown trees. In order to mitigate data sparsity without a significant increase in the dimension of the decision space, we propose clustered approaches: hierarchical Brown clustering represented by trees, and vector quantization for each semantic space. RESULTS: The results showed that the semi-supervised approaches significantly improved on standard supervised techniques for both languages. Moreover, clustering the semantic spaces contributed to the quality of the entity recognition while keeping the dimension of the feature space two orders of magnitude lower than when directly using the semantic spaces. CONCLUSIONS: The contributions of this study are: (a) a set of thorough experiments that enable comparisons regarding the influence of different types of features on different classifiers, exploring two languages other than English; and (b) the use of ensembles of clusters of Brown trees and semantic spaces on EHRs to tackle the problem of scarcity of available annotated data.


Subject(s)
Electronic Health Records, Machine Learning, Semantics, Cluster Analysis, Data Curation, Humans, Sweden
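Brown clustering, used above via ensembles of Brown trees, assigns each word a bit-string path in a binary merge tree; truncating that path at several lengths yields clusters at multiple granularities, which is how the feature space stays compact. The paths below are made up for illustration, not from a trained model.

```python
# Hypothetical Brown cluster paths: similar words share long bit-string prefixes.
BROWN_PATHS = {
    "aspirin": "11010",
    "ibuprofen": "11011",
    "fever": "0110",
}

def prefix_features(word, lengths=(2, 4)):
    """Truncate a word's Brown path at several lengths.

    Each prefix length gives one clustering granularity; short prefixes are
    coarse clusters, long prefixes are fine-grained ones.
    """
    path = BROWN_PATHS.get(word, "")
    return [path[:n] for n in lengths]

print(prefix_features("aspirin"))    # ['11', '1101']
print(prefix_features("ibuprofen"))  # ['11', '1101'] -- shared coarse clusters
```

In a classifier such as a CRF, these prefix strings are used as categorical features in place of (or alongside) sparse word identities.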
8.
J Biomed Inform ; 56: 318-32, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26141794

ABSTRACT

The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning.


Subject(s)
Adverse Drug Reaction Reporting Systems, Data Mining/methods, Drug-Related Side Effects and Adverse Reactions, Electronic Health Records/standards, Natural Language Processing, Algorithms, Automation, Language, Linguistics, Machine Learning, Pharmaceutical Preparations, Pharmacovigilance, Predictive Value of Tests, Reproducibility of Results, Translation
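One common way to compute inter-annotator agreement figures like those above is pairwise F1 over the two annotators' entity sets: one annotator's annotations are treated as gold and the other's as predictions (the measure is symmetric). This is a sketch of that calculation, not necessarily the exact measure used for IxaMed-GS; the annotation tuples are illustrative.

```python
def pairwise_agreement(ann_a, ann_b):
    """F1 agreement between two annotators' sets of (start, end, label) spans."""
    a, b = set(ann_a), set(ann_b)
    if not a and not b:
        return 1.0  # trivially perfect agreement on an empty document
    return 2 * len(a & b) / (len(a) + len(b))

# Two annotators agree on two spans and each marks one extra span.
a = {(0, 5, "Drug"), (10, 18, "Disease"), (20, 25, "Drug")}
b = {(0, 5, "Drug"), (10, 18, "Disease"), (30, 34, "Drug")}
print(round(pairwise_agreement(a, b) * 100, 2))  # 66.67
```

Chance-corrected measures such as Cohen's kappa are an alternative, but they require a fixed set of items to label, which entity spans do not naturally provide.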
9.
BMC Med Inform Decis Mak ; 15 Suppl 2: S5, 2015.
Article in English | MEDLINE | ID: mdl-26100112

ABSTRACT

BACKGROUND: The Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) is officially released in English and Spanish. In the Basque Autonomous Community, two languages, Spanish and Basque, are official. This paper presents the first attempt to semi-automatically translate the SNOMED CT terminology content into Basque, a less-resourced language. METHODS: A translation algorithm based on Natural Language Processing methods has been designed and partially implemented. The algorithm comprises four phases, of which the first two have been implemented and quantitatively evaluated. RESULTS: Results are promising, as we obtained Basque equivalents for 21.41% of the disorder terms of the English SNOMED CT release. As the methods developed focus on that hierarchy, the results in other hierarchies are lower (12.57% for body structure descriptions, 8.80% for findings and 3% for procedures). CONCLUSIONS: We are on the way to reaching two of our objectives in translating SNOMED CT into Basque: to use our language to access rich multilingual resources and to strengthen the use of the Basque language in the biomedical area.


Subject(s)
Electronic Health Records/organization & administration, Natural Language Processing, Systematized Nomenclature of Medicine, Translation, Algorithms, Automation, Electronic Health Records/standards, Humans, Language