Search | BVS Regional Portal

An efficient prototype method to identify and correct misspellings in clinical text.

Workman, T Elizabeth; Shao, Yijun; Divita, Guy; Zeng-Treitler, Qing.

BMC Res Notes ; 12(1): 42, 2019 Jan 18.

Article in English | MEDLINE | ID: mdl-30658682

ABSTRACT

OBJECTIVE: Misspellings in clinical free text present challenges to natural language processing. With the objective of identifying misspellings and their corrections, we developed a prototype spelling analysis method that implements Word2Vec, Levenshtein edit distance constraints, a lexical resource, and corpus term frequencies. We used the prototype method to process two corpora extracted from Veterans Health Administration resources: surgical pathology reports, and emergency department progress and visit notes. We evaluated performance by measuring positive predictive value and by performing an error analysis of false positive output using four classifications. We also analyzed the spelling errors in each corpus using common error classifications. RESULTS: In this small-scale study using a total of 76,786 clinical notes, the prototype method achieved positive predictive values of 0.9057 for the surgical pathology reports and 0.8979 for the emergency department progress and visit notes in identifying and correcting misspelled words. False positives varied by corpus. Spelling error types were similar between the two corpora; however, the authors of emergency department progress and visit notes made over four times as many errors. Overall, the results of this study suggest that this method could also perform sufficiently in identifying misspellings in other clinical document types.
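The abstract names the ingredients of the method (Word2Vec, Levenshtein edit distance constraints, a lexical resource, corpus term frequencies) but not how they are combined. The following is a minimal sketch of one plausible candidate-filtering step, not the authors' implementation. Assumptions: the candidate list for each word is supplied from outside (for example, nearest neighbors from a Word2Vec model; no model is trained here), a small set stands in for the lexical resource, and the edit-distance limit of 2 and the "pick the most frequent candidate" rule are illustrative choices. `suggest_correction` and all toy data are hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def suggest_correction(word, neighbors, lexicon, freqs, max_edits=2):
    """Return the most frequent lexicon term among `neighbors` that lies within
    `max_edits` edits of `word`, or None if `word` is already in the lexicon
    or no candidate qualifies.

    `neighbors` is assumed to come from an embedding model such as Word2Vec
    (hypothetical input; no model is trained here).
    """
    if word in lexicon:
        return None  # known term: not treated as a misspelling
    candidates = [c for c in neighbors
                  if c in lexicon and 0 < levenshtein(word, c) <= max_edits]
    if not candidates:
        return None
    return max(candidates, key=lambda c: freqs.get(c, 0))


# Toy data, invented for illustration only.
lexicon = {"carcinoma", "patient", "pain"}
freqs = Counter({"carcinoma": 120, "patient": 900, "pain": 400})
print(suggest_correction("carcinona", ["carcinoma", "patient"], lexicon, freqs))
print(suggest_correction("pian", ["pain", "patient"], lexicon, freqs))
```

The edit-distance filter keeps semantically similar but orthographically distant neighbors (such as "patient" for "pian") from being proposed, and corpus frequency only breaks ties among candidates that pass both the lexicon and distance checks.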

Subject(s)

Dictionaries as Topic, Medical Informatics/methods, Natural Language Processing, Controlled Vocabulary, Algorithms, Humans, Language, Medical Informatics/standards, Medical Informatics/statistics & numerical data, Computerized Medical Records Systems/standards, Computerized Medical Records Systems/statistics & numerical data, Surgical Pathology/methods, Reproducibility of Results, Research Report/standards, Unified Medical Language System/standards, Unified Medical Language System/statistics & numerical data

CodeMapper: semiautomatic coding of case definitions. A contribution from the ADVANCE project.

Becker, Benedikt F H; Avillach, Paul; Romio, Silvana; van Mulligen, Erik M; Weibel, Daniel; Sturkenboom, Miriam C J M; Kors, Jan A.

Pharmacoepidemiol Drug Saf ; 26(8): 998-1005, 2017 Aug.

Article in English | MEDLINE | ID: mdl-28657162

ABSTRACT

BACKGROUND: Assessment of drug and vaccine effects by combining information from different healthcare databases in the European Union requires extensive efforts in the harmonization of codes as different vocabularies are being used across countries. In this paper, we present a web application called CodeMapper, which assists in the mapping of case definitions to codes from different vocabularies, while keeping a transparent record of the complete mapping process. METHODS: CodeMapper builds upon coding vocabularies contained in the Metathesaurus of the Unified Medical Language System. The mapping approach consists of three phases. First, medical concepts are automatically identified in a free-text case definition. Second, the user revises the set of medical concepts by adding or removing concepts, or expanding them to related concepts that are more general or more specific. Finally, the selected concepts are projected to codes from the targeted coding vocabularies. We evaluated the application by comparing codes that were automatically generated from case definitions by applying CodeMapper's concept identification and successive concept expansion, with reference codes that were manually created in a previous epidemiological study. RESULTS: Automated concept identification alone had a sensitivity of 0.246 and positive predictive value (PPV) of 0.420 for reproducing the reference codes. Three successive steps of concept expansion increased sensitivity to 0.953 and PPV to 0.616. CONCLUSIONS: Automatic concept identification in the case definition alone was insufficient to reproduce the reference codes, but CodeMapper's operations for concept expansion provide an effective, efficient, and transparent way for reproducing the reference codes.
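The evaluation compares an automatically generated code set against a manually created reference set using sensitivity and positive predictive value. A minimal sketch of that comparison, treating each code set as a set of strings, is shown below. The function name and the ICD-10 codes are illustrative assumptions, not data or code from CodeMapper or the study.

```python
from __future__ import annotations


def sensitivity_and_ppv(generated: set[str], reference: set[str]) -> tuple[float, float]:
    """Compare an automatically generated code set with a manual reference set.

    sensitivity = |generated ∩ reference| / |reference|
    PPV         = |generated ∩ reference| / |generated|
    """
    true_positives = len(generated & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    ppv = true_positives / len(generated) if generated else 0.0
    return sensitivity, ppv


# Illustrative ICD-10 codes only; not data from the study.
reference = {"I21.0", "I21.1", "I21.9", "I22.0"}
identified = {"I21.0", "I21.9", "I25.2"}              # concept identification alone
expanded = identified | {"I21.1", "I22.0", "I24.9"}   # after concept expansion

for label, codes in [("identified", identified), ("expanded", expanded)]:
    sens, ppv = sensitivity_and_ppv(codes, reference)
    print(f"{label}: sensitivity={sens:.3f} PPV={ppv:.3f}")
```

Because expansion only adds codes, sensitivity can never decrease; whether PPV rises or falls depends on how many of the added codes are in the reference set.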

Subject(s)

Factual Databases/statistics & numerical data, International Classification of Diseases/statistics & numerical data, Computerized Medical Records Systems/statistics & numerical data, Unified Medical Language System/statistics & numerical data, Europe/epidemiology, Humans

Analysis of a study of the users, uses, and future agenda of the UMLS.

Chen, Yan; Perl, Yehoshua; Geller, James; Cimino, James J.

J Am Med Inform Assoc ; 14(2): 221-31, 2007.

Article in English | MEDLINE | ID: mdl-17213497

ABSTRACT

OBJECTIVES: The UMLS constitutes the largest existing collection of medical terms. However, little has been published about the users and uses of the UMLS. This study sheds light on these issues. DESIGN: We designed a questionnaire consisting of 26 questions and distributed it to the UMLS user mailing list. Participants were assured complete confidentiality of their replies. To further encourage list members to respond, we promised to provide them with early results prior to publication. A sector analysis of the responses, grouped by the respondents' employing organizations, was used to obtain insights into some responses. RESULTS: We received 70 responses. The study confirms two intended uses of the UMLS: access to source terminologies (75%) and mapping among them (44%). However, most access is to just a few sources, led by SNOMED, MeSH, and ICD. Of 119 reported purposes of use, the most frequent were terminology research (37), information retrieval (19), and terminology translation (14). Four important observations are that the UMLS is widely used as a terminology (77%), even though it was not designed as one; that many users (73%) want the NLM to mark concepts with multiple parents in an indented hierarchy; that many users (73%) want the NLM to derive a terminology from the UMLS; and that auditing the UMLS is a top budget priority (35%) for users. CONCLUSIONS: The study reports many uses of the UMLS in a variety of subjects, from terminology research to decision support and phenotyping. The study confirms that the UMLS is used to access its source terminologies and to map among them. Two primary concerns of the existing user base are auditing the UMLS and the design of a UMLS-based derived terminology.

Subject(s)

Unified Medical Language System/statistics & numerical data, Controlled Vocabulary, Consumer Behavior, Data Collection, Decision Support Techniques, Humans, Information Storage and Retrieval, Research, Surveys and Questionnaires, Unified Medical Language System/economics, Unified Medical Language System/trends

Who is using the UMLS and how - insights from the UMLS user annual reports.

Fung, Kin Wah; Hole, William T; Srinivasan, Suresh.

AMIA Annu Symp Proc ; : 274-8, 2006.

Article in English | MEDLINE | ID: mdl-17238346

ABSTRACT

The NLM's UMLS resources are available to users free of charge under a license that requires submission of an annual report on their usage. A new web-based template was used to collect users' annual reports for the calendar year 2004. Of 2,677 licensees, 1,427 (53%) submitted their annual reports through the web template. This represented a five-fold increase in the reports submitted compared to previous years. The information collected via the web template was more structured, more complete, and easier to analyze. The main results from the 2004 annual reports are summarized and discussed. They are being used to guide UMLS development.

Subject(s)

Unified Medical Language System/statistics & numerical data, Data Collection/methods, Internet, Surveys and Questionnaires, Unified Medical Language System/trends, Controlled Vocabulary

A tool for sharing annotated research data: the "Category 0" UMLS (Unified Medical Language System) vocabularies.

Berman, Jules J.

BMC Med Inform Decis Mak ; 3: 6, 2003 Jun 16.

Article in English | MEDLINE | ID: mdl-12809560

ABSTRACT

BACKGROUND: Large biomedical data sets have become increasingly important resources for medical researchers. Modern biomedical data sets are annotated with standard terms to describe the data and to support data linking between databases. The largest curated listing of biomedical terms is the National Library of Medicine's Unified Medical Language System (UMLS). The UMLS contains more than 2 million biomedical terms collected from nearly 100 medical vocabularies. Many of the vocabularies contained in the UMLS carry restrictions on their use, making it impossible to share or distribute UMLS-annotated research data. However, a subset of the UMLS vocabularies, designated Category 0 by the UMLS, can be used to annotate and share data sets without violating the UMLS License Agreement. METHODS: The UMLS Category 0 vocabularies can be extracted from the parent UMLS Metathesaurus using a Perl script supplied with this article. There are 43 Category 0 vocabularies that can be used freely for research purposes without violating the UMLS License Agreement. Among the Category 0 vocabularies are MESH (Medical Subject Headings), the NCBI (National Center for Biotechnology Information) Taxonomy, and ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification). RESULTS: The extraction file containing all Category 0 terms and concepts is 72,581,138 bytes in length and contains 1,029,161 terms. The UMLS Metathesaurus MRCON file (January 2003) is 151,048,493 bytes in length and contains 2,146,899 terms. Therefore, the Category 0 vocabularies, in aggregate, are about half the size of the UMLS Metathesaurus. A large, publicly available listing of 567,921 different medical phrases was automatically coded using the full UMLS Metathesaurus and the Category 0 vocabularies. There were 545,321 phrases with one or more matches against UMLS terms, while 468,785 phrases had one or more matches against the Category 0 terms. This indicates that when the two vocabularies are evaluated by their fitness to find at least one term for a medical phrase, the Category 0 vocabularies performed 86% as well as the complete UMLS Metathesaurus. CONCLUSION: The Category 0 vocabularies of the UMLS constitute a large nomenclature that can be used by biomedical researchers to annotate biomedical data. These annotated data sets can be distributed for research purposes without violating the UMLS License Agreement. These vocabularies may be of particular importance for sharing heterogeneous data from diverse biomedical data sets. The software tools to extract the Category 0 vocabularies are freely available Perl scripts, entered into the public domain and distributed with this article.
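The "86% as well" figure is a ratio between the two match counts, not a share of all phrases in the listing. The arithmetic can be checked with a few lines that use only the counts reported in this abstract; the variable names are illustrative.

```python
# Counts reported in the abstract.
phrases_total = 567_921
matched_full_umls = 545_321
matched_category0 = 468_785
terms_full_umls = 2_146_899
terms_category0 = 1_029_161

# "86% as well": Category 0 hits relative to full-UMLS hits, not to all phrases.
relative_coverage = matched_category0 / matched_full_umls
# Share of the whole phrase listing matched by each vocabulary set.
coverage_full = matched_full_umls / phrases_total
coverage_cat0 = matched_category0 / phrases_total
# "About half the size", measured by term counts.
size_ratio = terms_category0 / terms_full_umls

print(f"relative coverage: {relative_coverage:.1%}")
print(f"full UMLS coverage: {coverage_full:.1%}, Category 0 coverage: {coverage_cat0:.1%}")
print(f"term-count ratio: {size_ratio:.1%}")
```

Measured against the full listing of phrases, each vocabulary set matches a different share than the relative figure suggests, which is why the denominator matters when quoting the 86% result.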

Subject(s)

Decision Support Techniques, Medical Informatics/methods, Software/trends, Unified Medical Language System/trends, Algorithms, Bibliographic Databases/trends, Factual Databases/trends, Humans, Medical Informatics/statistics & numerical data, Medical Informatics/trends, Research Design/statistics & numerical data, Research Design/trends, Software/statistics & numerical data, Unified Medical Language System/statistics & numerical data

Characteristics of consumer terminology for health information retrieval.

Zeng, Q; Kogan, S; Ash, N; Greenes, R A; Boxwala, A A.

Methods Inf Med ; 41(4): 289-98, 2002.

Article in English | MEDLINE | ID: mdl-12425240

ABSTRACT

OBJECTIVES: As millions of consumers perform health information retrieval online, the mismatch between their terminology and the terminologies of the information sources could become a major barrier to successful retrievals. To address this problem, we studied the characteristics of consumer terminology for health information retrieval. METHODS: Our study focused on consumer queries that were used on a consumer health service Web site and a consumer health information Web site. We analyzed data from the site-usage logs and conducted interviews with patients. RESULTS: Our findings show that consumers' information retrieval performance is very poor. There are significant mismatches at all levels (lexical, semantic and mental models) between the consumer terminology and both the information source terminology and standard medical vocabularies. CONCLUSIONS: Comprehensive terminology support on all levels is needed for consumer health information retrieval.

Subject(s)

Consumer Behavior, Information Storage and Retrieval/standards, Terminology as Topic, Unified Medical Language System/statistics & numerical data, Adult, Consumer Behavior/statistics & numerical data, Female, Humans, Internet, Male, Middle Aged

Assessing and enhancing the value of the UMLS Knowledge Sources.

Humphreys, B L; Lindberg, D A; Hole, W T.

Proc Annu Symp Comput Appl Med Care ; : 78-82, 1991.

Article in English | MEDLINE | ID: mdl-1807711

ABSTRACT

The goal of the UMLS Project is to give practitioners and researchers easy access to machine-readable information from diverse sources. Assessment of the first experimental versions of the UMLS Knowledge Sources is essential to measuring progress toward that goal and to identifying needed enhancements. As of July 30, 1991, copies of the first edition of the UMLS Knowledge Sources had been distributed to 143 individuals and institutions; 66 had provided initial feedback information. The information received indicates that the UMLS Knowledge Sources will undergo broad testing in the patient care, medical education, library service, and product development environments. Preliminary data support the hypothesis that expanded coverage of routine clinical concepts is needed. Key enhancements planned for 1992 and beyond include expanded coverage of ICD-9-CM and CPT.

Subject(s)

Unified Medical Language System/organization & administration, National Library of Medicine (U.S.), Program Evaluation, Surveys and Questionnaires, Unified Medical Language System/statistics & numerical data, United States, User-Computer Interface
