Búsqueda | Portal Regional de la BVS

1.

Global Public Health Surveillance Using Media Reports: Redesigning GPHIN.

Carter, Dave; Stojanovic, Marta; Hachey, Philip; Fournier, Kevin; Rodier, Simon; Wang, Yunli; De Bruijn, Berry.

Stud Health Technol Inform ; 270: 843-847, 2020 Jun 16.

Artículo en Inglés | MEDLINE | ID: mdl-32570501

RESUMEN

Global public health surveillance relies on reporting structures and transmission of trustworthy health reports. But in practice, these processes may not always be fast enough, or are hindered by procedural, technical, or political barriers. GPHIN, the Global Public Health Intelligence Network, was designed in the late 1990s to scour mainstream news for health events, as that travels faster and more freely. This paper outlines the next generation of GPHIN, which went live in 2017, and reports on design decisions underpinning its new functions and innovations.

Asunto(s)

Medios de Comunicación de Masas , Vigilancia en Salud Pública , Salud Global , Salud Pública

2.

A randomized trial provided new evidence on the accuracy and efficiency of traditional vs. electronically annotated abstraction approaches in systematic reviews.

Li, Tianjing; Saldanha, Ian J; Jap, Jens; Smith, Bryant T; Canner, Joseph; Hutfless, Susan M; Branch, Vernal; Carini, Simona; Chan, Wiley; de Bruijn, Berry; Wallace, Byron C; Walsh, Sandra A; Whamond, Elizabeth J; Murad, M Hassan; Sim, Ida; Berlin, Jesse A; Lau, Joseph; Dickersin, Kay; Schmid, Christopher H.

J Clin Epidemiol ; 115: 77-89, 2019 11.

Artículo en Inglés | MEDLINE | ID: mdl-31302205

RESUMEN

OBJECTIVES: Data Abstraction Assistant (DAA) is a software for linking items abstracted into a data collection form for a systematic review to their locations in a study report. We conducted a randomized cross-over trial that compared DAA-facilitated single-data abstraction plus verification ("DAA verification"), single data abstraction plus verification ("regular verification"), and independent dual data abstraction plus adjudication ("independent abstraction"). STUDY DESIGN AND SETTING: This study is an online randomized cross-over trial with 26 pairs of data abstractors. Each pair abstracted data from six articles, two per approach. Outcomes were the proportion of errors and time taken. RESULTS: Overall proportion of errors was 17% for DAA verification, 16% for regular verification, and 15% for independent abstraction. DAA verification was associated with higher odds of errors when compared with regular verification (adjusted odds ratio [OR] = 1.08; 95% confidence interval [CI]: 0.99-1.17) or independent abstraction (adjusted OR = 1.12; 95% CI: 1.03-1.22). For each article, DAA verification took 20 minutes (95% CI: 1-40) longer than regular verification, but 46 minutes (95% CI: 26 to 66) shorter than independent abstraction. CONCLUSION: Independent abstraction may only be necessary for complex data items. DAA provides an audit trail that is crucial for reproducible research.

Asunto(s)

Indización y Redacción de Resúmenes/métodos , Revisiones Sistemáticas como Asunto , Estudios Cruzados , Recolección de Datos , Humanos , Oportunidad Relativa , Distribución Aleatoria , Programas Informáticos , Adulto Joven

3.

Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task.

Sarker, Abeed; Belousov, Maksim; Friedrichs, Jasper; Hakala, Kai; Kiritchenko, Svetlana; Mehryary, Farrokh; Han, Sifei; Tran, Tung; Rios, Anthony; Kavuluru, Ramakanth; de Bruijn, Berry; Ginter, Filip; Mahata, Debanjan; Mohammad, Saif M; Nenadic, Goran; Gonzalez-Hernandez, Graciela.

J Am Med Inform Assoc ; 25(10): 1274-1283, 2018 10 01.

Artículo en Inglés | MEDLINE | ID: mdl-30272184

RESUMEN

Objective: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data. Materials and Methods: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3); and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks. Results: Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems. Discussion: Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1). Conclusions: Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1).

Asunto(s)

Efectos Colaterales y Reacciones Adversas Relacionados con Medicamentos/clasificación , Procesamiento de Lenguaje Natural , Redes Neurales de la Computación , Medios de Comunicación Sociales/clasificación , Máquina de Vectores de Soporte , Minería de Datos/métodos , Humanos , Farmacovigilancia

4.

Evaluating Data Abstraction Assistant, a novel software application for data abstraction during systematic reviews: protocol for a randomized controlled trial.

Saldanha, Ian J; Schmid, Christopher H; Lau, Joseph; Dickersin, Kay; Berlin, Jesse A; Jap, Jens; Smith, Bryant T; Carini, Simona; Chan, Wiley; De Bruijn, Berry; Wallace, Byron C; Hutfless, Susan M; Sim, Ida; Murad, M Hassan; Walsh, Sandra A; Whamond, Elizabeth J; Li, Tianjing.

Syst Rev ; 5(1): 196, 2016 11 22.

Artículo en Inglés | MEDLINE | ID: mdl-27876082

RESUMEN

BACKGROUND: Data abstraction, a critical systematic review step, is time-consuming and prone to errors. Current standards for approaches to data abstraction rest on a weak evidence base. We developed the Data Abstraction Assistant (DAA), a novel software application designed to facilitate the abstraction process by allowing users to (1) view study article PDFs juxtaposed to electronic data abstraction forms linked to a data abstraction system, (2) highlight (or "pin") the location of the text in the PDF, and (3) copy relevant text from the PDF into the form. We describe the design of a randomized controlled trial (RCT) that compares the relative effectiveness of (A) DAA-facilitated single abstraction plus verification by a second person, (B) traditional (non-DAA-facilitated) single abstraction plus verification by a second person, and (C) traditional independent dual abstraction plus adjudication to ascertain the accuracy and efficiency of abstraction. METHODS: This is an online, randomized, three-arm, crossover trial. We will enroll 24 pairs of abstractors (i.e., sample size is 48 participants), each pair comprising one less and one more experienced abstractor. Pairs will be randomized to abstract data from six articles, two under each of the three approaches. Abstractors will complete pre-tested data abstraction forms using the Systematic Review Data Repository (SRDR), an online data abstraction system. The primary outcomes are (1) proportion of data items abstracted that constitute an error (compared with an answer key) and (2) total time taken to complete abstraction (by two abstractors in the pair, including verification and/or adjudication). DISCUSSION: The DAA trial uses a practical design to test a novel software application as a tool to help improve the accuracy and efficiency of the data abstraction process during systematic reviews. Findings from the DAA trial will provide much-needed evidence to strengthen current recommendations for data abstraction approaches. TRIAL REGISTRATION: The trial is registered at National Information Center on Health Services Research and Health Care Technology (NICHSR) under Registration # HSRP20152269: https://wwwcf.nlm.nih.gov/hsr_project/view_hsrproj_record.cfm?NLMUNIQUE_ID=20152269&SEARCH_FOR=Tianjing%20Li . All items from the World Health Organization Trial Registration Data Set are covered at various locations in this protocol. Protocol version and date: This is version 2.0 of the protocol, dated September 6, 2016. As needed, we will communicate any protocol amendments to the Institutional Review Boards (IRBs) of Johns Hopkins Bloomberg School of Public Health (JHBSPH) and Brown University. We also will make appropriate as-needed modifications to the NICHSR website in a timely fashion.

Asunto(s)

Indización y Redacción de Resúmenes , Programas Informáticos , Revisiones Sistemáticas como Asunto , Medicina Basada en la Evidencia/métodos , Humanos

5.

Complementary approaches to searching MEDLINE may be sufficient for updating systematic reviews.

Sampson, Margaret; de Bruijn, Berry; Urquhart, Christine; Shojania, Kaveh.

J Clin Epidemiol ; 78: 108-115, 2016 10.

Artículo en Inglés | MEDLINE | ID: mdl-26976054

RESUMEN

OBJECTIVES: To maximize the proportion of relevant studies identified for inclusion in systematic reviews (recall), complex time-consuming Boolean searches across multiple databases are common. Although MEDLINE provides excellent coverage of health science evidence, it has proved challenging to achieve high levels of recall through Boolean searches alone. STUDY DESIGN AND SETTING: Recall of one Boolean search method, the clinical query (CQ), combined with a ranking method, support vector machine (SVM), or PubMed-related articles, was tested against a gold standard of studies added to 6 updated Cochrane reviews and 10 Agency for Healthcare Research and Quality (AHRQ) evidence reviews. For the AHRQ sample, precision and temporal stability were examined for each method. RESULTS: Recall of new studies was 0.69 for the CQ, 0.66 for related articles, 0.50 for SVM, 0.91 for the combination of CQ and related articles, and 0.89 for the combination of CQ and SVM. Precision was 0.11 for CQ and related articles combined, and 0.11 for CQ and SVM combined. Related articles showed least stability over time. CONCLUSIONS: The complementary combination of a Boolean search strategy and a ranking strategy appears to provide a robust method for identifying relevant studies in MEDLINE.

Asunto(s)

Almacenamiento y Recuperación de la Información/métodos , Almacenamiento y Recuperación de la Información/estadística & datos numéricos , MEDLINE/estadística & datos numéricos , Literatura de Revisión como Asunto , Algoritmos , Humanos , Máquina de Vectores de Soporte , Estados Unidos , United States Agency for Healthcare Research and Quality

6.

A la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge.

Cherry, Colin; Zhu, Xiaodan; Martin, Joel; de Bruijn, Berry.

J Am Med Inform Assoc ; 20(5): 843-8, 2013.

Artículo en Inglés | MEDLINE | ID: mdl-23523875

RESUMEN

OBJECTIVE: An analysis of the timing of events is critical for a deeper understanding of the course of events within a patient record. The 2012 i2b2 NLP challenge focused on the extraction of temporal relationships between concepts within textual hospital discharge summaries. MATERIALS AND METHODS: The team from the National Research Council Canada (NRC) submitted three system runs to the second track of the challenge: typifying the time-relationship between pre-annotated entities. The NRC system was designed around four specialist modules containing statistical machine learning classifiers. Each specialist targeted distinct sets of relationships: local relationships, 'sectime'-type relationships, non-local overlap-type relationships, and non-local causal relationships. RESULTS: The best NRC submission achieved a precision of 0.7499, a recall of 0.6431, and an F1 score of 0.6924, resulting in a statistical tie for first place. Post hoc improvements led to a precision of 0.7537, a recall of 0.6455, and an F1 score of 0.6954, giving the highest scores reported on this task to date. DISCUSSION AND CONCLUSIONS: Methods for general relation extraction extended well to temporal relations, and gave top-ranked state-of-the-art results. Careful ordering of predictions within result sets proved critical to this success.

Asunto(s)

Inteligencia Artificial , Registros Electrónicos de Salud , Almacenamiento y Recuperación de la Información/métodos , Procesamiento de Lenguaje Natural , Resumen del Alta del Paciente , Humanos , Tiempo , Investigación Biomédica Traslacional

7.

Detecting concept relations in clinical text: insights from a state-of-the-art model.

Zhu, Xiaodan; Cherry, Colin; Kiritchenko, Svetlana; Martin, Joel; de Bruijn, Berry.

J Biomed Inform ; 46(2): 275-85, 2013 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-23380683

RESUMEN

This paper addresses an information-extraction problem that aims to identify semantic relations among medical concepts (problems, tests, and treatments) in clinical text. The objectives of the paper are twofold. First, we extend an earlier one-page description (appearing as a part of [5]) of a top-ranked model in the 2010 I2B2 NLP Challenge to a necessary level of details, with the belief that feature design is the most crucial factor to the success of our system and hence deserves a more detailed discussion. We present a precise quantification of the contributions of a wide variety of knowledge sources. In addition, we show the end-to-end results obtained on the noisy output of a top-ranked concept detector, which could help construct a more complete view of the state of the art in the real-world scenario. As the second major objective, we reformulate our models into a composite-kernel framework and present the best result, according to our knowledge, on the same dataset.

Asunto(s)

Minería de Datos/métodos , Registros Electrónicos de Salud , Procesamiento de Lenguaje Natural , Semántica , Algoritmos , Inteligencia Artificial , Bases de Datos Factuales , Humanos

8.

Binary classifiers and latent sequence models for emotion detection in suicide notes.

Cherry, Colin; Mohammad, Saif M; de Bruijn, Berry.

Biomed Inform Insights ; 5(Suppl. 1): 147-54, 2012.

Artículo en Inglés | MEDLINE | ID: mdl-22879771

RESUMEN

This paper describes the National Research Council of Canada's submission to the 2011 i2b2 NLP challenge on the detection of emotions in suicide notes. In this task, each sentence of a suicide note is annotated with zero or more emotions, making it a multi-label sentence classification task. We employ two distinct large-margin models capable of handling multiple labels. The first uses one classifier per emotion, and is built to simplify label balance issues and to allow extremely fast development. This approach is very effective, scoring an F-measure of 55.22 and placing fourth in the competition, making it the best system that does not use web-derived statistics or re-annotated training data. Second, we present a latent sequence model, which learns to segment the sentence into a number of emotion regions. This model is intended to gracefully handle sentences that convey multiple thoughts and emotions. Preliminary work with the latent sequence model shows promise, resulting in comparable performance using fewer features.

9.

Revisiting the area under the ROC.

De Bruijn, Berry.

Stud Health Technol Inform ; 169: 532-6, 2011.

Artículo en Inglés | MEDLINE | ID: mdl-21893806

RESUMEN

The Receiver-Operating Characteristic curve or ROC has been a long standing and well appreciated tool to assess performance of classifiers or diagnostic tests. Likewise, the Area Under the ROC (AUC) has been a metric to summarize the power of a test or ability of a classifier in one measurement. This article aims to revisit the AUC, and ties it to key characteristics of the noncentral hypergeometric distribution. It is demonstrated that this statistical distribution can be used in modeling the behaviour of classifiers, which is of value for comparing classifiers.

Asunto(s)

Área Bajo la Curva , Interpretación Estadística de Datos , Reconocimiento de Normas Patrones Automatizadas/métodos , Curva ROC , Estadística como Asunto/métodos , Algoritmos , Fracturas Óseas/diagnóstico , Fracturas Óseas/diagnóstico por imagen , Humanos , Modelos Estadísticos , Interpretación de Imagen Radiográfica Asistida por Computador , Radiología/métodos , Reproducibilidad de los Resultados , Sensibilidad y Especificidad

10.

Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010.

de Bruijn, Berry; Cherry, Colin; Kiritchenko, Svetlana; Martin, Joel; Zhu, Xiaodan.

J Am Med Inform Assoc ; 18(5): 557-62, 2011.

Artículo en Inglés | MEDLINE | ID: mdl-21565856

RESUMEN

OBJECTIVE: As clinical text mining continues to mature, its potential as an enabling technology for innovations in patient care and clinical research is becoming a reality. A critical part of that process is rigid benchmark testing of natural language processing methods on realistic clinical narrative. In this paper, the authors describe the design and performance of three state-of-the-art text-mining applications from the National Research Council of Canada on evaluations within the 2010 i2b2 challenge. DESIGN: The three systems perform three key steps in clinical information extraction: (1) extraction of medical problems, tests, and treatments, from discharge summaries and progress notes; (2) classification of assertions made on the medical problems; (3) classification of relations between medical concepts. Machine learning systems performed these tasks using large-dimensional bags of features, as derived from both the text itself and from external sources: UMLS, cTAKES, and Medline. MEASUREMENTS: Performance was measured per subtask, using micro-averaged F-scores, as calculated by comparing system annotations with ground-truth annotations on a test set. RESULTS: The systems ranked high among all submitted systems in the competition, with the following F-scores: concept extraction 0.8523 (ranked first); assertion detection 0.9362 (ranked first); relationship detection 0.7313 (ranked second). CONCLUSION: For all tasks, we found that the introduction of a wide range of features was crucial to success. Importantly, our choice of machine learning algorithms allowed us to be versatile in our feature design, and to introduce a large number of features without overfitting and without encountering computing-resource bottlenecks.

Asunto(s)

Benchmarking , Minería de Datos , Registros Electrónicos de Salud , Procesamiento de Lenguaje Natural , Algoritmos , Inteligencia Artificial , Canadá , Minería de Datos/clasificación , Registros Electrónicos de Salud/clasificación , Humanos

11.

ExaCT: automatic extraction of clinical trial characteristics from journal publications.

Kiritchenko, Svetlana; de Bruijn, Berry; Carini, Simona; Martin, Joel; Sim, Ida.

BMC Med Inform Decis Mak ; 10: 56, 2010 Sep 28.

Artículo en Inglés | MEDLINE | ID: mdl-20920176

RESUMEN

BACKGROUND: Clinical trials are one of the most important sources of evidence for guiding evidence-based practice and the design of new trials. However, most of this information is available only in free text - e.g., in journal publications - which is labour intensive to process for systematic reviews, meta-analyses, and other evidence synthesis studies. This paper presents an automatic information extraction system, called ExaCT, that assists users with locating and extracting key trial characteristics (e.g., eligibility criteria, sample size, drug dosage, primary outcomes) from full-text journal articles reporting on randomized controlled trials (RCTs). METHODS: ExaCT consists of two parts: an information extraction (IE) engine that searches the article for text fragments that best describe the trial characteristics, and a web browser-based user interface that allows human reviewers to assess and modify the suggested selections. The IE engine uses a statistical text classifier to locate those sentences that have the highest probability of describing a trial characteristic. Then, the IE engine's second stage applies simple rules to these sentences to extract text fragments containing the target answer. The same approach is used for all 21 trial characteristics selected for this study. RESULTS: We evaluated ExaCT using 50 previously unseen articles describing RCTs. The text classifier (first stage) was able to recover 88% of relevant sentences among its top five candidates (top5 recall) with the topmost candidate being relevant in 80% of cases (top1 precision). Precision and recall of the extraction rules (second stage) were 93% and 91%, respectively. Together, the two stages of the extraction engine were able to provide (partially) correct solutions in 992 out of 1050 test tasks (94%), with a majority of these (696) representing fully correct and complete answers. CONCLUSIONS: Our experiments confirmed the applicability and efficacy of ExaCT. Furthermore, they demonstrated that combining a statistical method with 'weak' extraction rules can identify a variety of study characteristics. The system is flexible and can be extended to handle other characteristics and document types (e.g., study protocols).

Asunto(s)

Almacenamiento y Recuperación de la Información/métodos , Publicaciones Periódicas como Asunto , Ensayos Clínicos Controlados Aleatorios como Asunto , Humanos , Almacenamiento y Recuperación de la Información/normas , Reproducibilidad de los Resultados

12.

Automated information extraction of key trial design elements from clinical trial publications.

de Bruijn, Berry; Carini, Simona; Kiritchenko, Svetlana; Martin, Joel; Sim, Ida.

AMIA Annu Symp Proc ; : 141-5, 2008 Nov 06.

Artículo en Inglés | MEDLINE | ID: mdl-18999067

RESUMEN

Clinical trials are one of the most valuable sources of scientific evidence for improving the practice of medicine. The Trial Bank project aims to improve structured access to trial findings by including formalized trial information into a knowledge base. Manually extracting trial information from published articles is costly, but automated information extraction techniques can assist. The current study highlights a single architecture to extract a wide array of information elements from full-text publications of randomized clinical trials (RCTs). This architecture combines a text classifier with a weak regular expression matcher. We tested this two-stage architecture on 88 RCT reports from 5 leading medical journals, extracting 23 elements of key trial information such as eligibility rules, sample size, intervention, and outcome names. Results prove this to be a promising avenue to help critical appraisers, systematic reviewers, and curators quickly identify key information elements in published RCT articles.

Asunto(s)

Inteligencia Artificial , Medicina Basada en la Evidencia/métodos , Almacenamiento y Recuperación de la Información/métodos , Manuscritos Médicos como Asunto , Procesamiento de Lenguaje Natural , Reconocimiento de Normas Patrones Automatizadas/métodos , Ensayos Clínicos Controlados Aleatorios como Asunto , Proyectos de Investigación , Indización y Redacción de Resúmenes/métodos , Internacionalidad

13.

LitMiner: integration of library services within a bio-informatics application.

Demaine, Jeffrey; Martin, Joel; Wei, Lynn; de Bruijn, Berry.

Biomed Digit Libr ; 3: 11, 2006 Oct 19.

Artículo en Inglés | MEDLINE | ID: mdl-17052341

RESUMEN

BACKGROUND: This paper examines how the adoption of a subject-specific library service has changed the way in which its users interact with a digital library. The LitMiner text-analysis application was developed to enable biologists to explore gene relationships in the published literature. The application features a suite of interfaces that enable users to search PubMed as well as local databases, to view document abstracts, to filter terms, to select gene name aliases, and to visualize the co-occurrences of genes in the literature. At each of these stages, LitMiner offers the functionality of a digital library. Documents that are accessible online are identified by an icon. Users can also order documents from their institution's library collection from within the application. In so doing, LitMiner aims to integrate digital library services into the research process of its users. METHODS: Case study RESULTS: This integration of digital library services into the research process of biologists results in increased access to the published literature. CONCLUSION: In order to make better use of their collections, digital libraries should customize their services to suit the research needs of their patrons.

14.

Identifying wrist fracture patients with high accuracy by automatic categorization of X-ray reports.

de Bruijn, Berry; Cranney, Ann; O'Donnell, Siobhan; Martin, Joel D; Forster, Alan J.

J Am Med Inform Assoc ; 13(6): 696-8, 2006.

Artículo en Inglés | MEDLINE | ID: mdl-16929046

RESUMEN

The authors performed this study to determine the accuracy of several text classification methods to categorize wrist x-ray reports. We randomly sampled 751 textual wrist x-ray reports. Two expert reviewers rated the presence (n = 301) or absence (n = 450) of an acute fracture of wrist. We developed two information retrieval (IR) text classification methods and a machine learning method using a support vector machine (TC-1). In cross-validation on the derivation set (n = 493), TC-1 outperformed the two IR based methods and six benchmark classifiers, including Naive Bayes and a Neural Network. In the validation set (n = 258), TC-1 demonstrated consistent performance with 93.8% accuracy; 95.5% sensitivity; 92.9% specificity; and 87.5% positive predictive value. TC-1 was easy to implement and superior in performance to the other classification methods.

Asunto(s)

Inteligencia Artificial , Traumatismos de la Muñeca/diagnóstico por imagen , Teorema de Bayes , Humanos , Almacenamiento y Recuperación de la Información/clasificación , Registros Médicos/clasificación , Redes Neurales de la Computación , Radiografía , Sistemas de Información Radiológica

15.

PreBIND and Textomy--mining the biomedical literature for protein-protein interactions using a support vector machine.

Donaldson, Ian; Martin, Joel; de Bruijn, Berry; Wolting, Cheryl; Lay, Vicki; Tuekam, Brigitte; Zhang, Shudong; Baskin, Berivan; Bader, Gary D; Michalickova, Katerina; Pawson, Tony; Hogue, Christopher W V.

BMC Bioinformatics ; 4: 11, 2003 Mar 27.

Artículo en Inglés | MEDLINE | ID: mdl-12689350

RESUMEN

BACKGROUND: The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles where they are inaccessible to computational methods. The Biomolecular interaction network database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable task-size of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We present an information extraction system that was designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND. RESULTS: Cross-validation estimated the support vector machine's test-set precision, accuracy and recall for classifying abstracts describing interaction information was 92%, 90% and 92% respectively. We estimated that the system would be able to recall up to 60% of all non-high throughput interactions present in another yeast-protein interaction database. Finally, this system was applied to a real-world curation problem and its use was found to reduce the task duration by 70% thus saving 176 days. CONCLUSIONS: Machine learning methods are useful as tools to direct interaction and pathway database back-filling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at http://bind.ca. Current capabilities allow searching for human, mouse and yeast protein-interaction information.

Asunto(s)

Inteligencia Artificial , Almacenamiento y Recuperación de la Información/tendencias , Mapeo de Interacción de Proteínas/métodos , Algoritmos , Biología Computacional/métodos , Biología Computacional/estadística & datos numéricos , Bases de Datos Factuales/tendencias , Bases de Datos de Proteínas/tendencias , Genoma Fúngico , Mapeo de Interacción de Proteínas/clasificación , Mapeo de Interacción de Proteínas/estadística & datos numéricos , PubMed/clasificación , Saccharomyces cerevisiae/genética , Proteínas de Saccharomyces cerevisiae/química

16.

Getting to the (c)ore of knowledge: mining biomedical literature.

de Bruijn, Berry; Martin, Joel.

Int J Med Inform ; 67(1-3): 7-18, 2002 Dec 04.

Artículo en Inglés | MEDLINE | ID: mdl-12460628

RESUMEN

Literature mining is the process of extracting and combining facts from scientific publications. In recent years, many computer programs have been designed to extract various molecular biology findings from Medline abstracts or full-text articles. The present article describes the range of text mining techniques that have been applied to scientific documents. It divides 'automated reading' into four general subtasks: text categorization, named entity tagging, fact extraction, and collection-wide analysis. Literature mining offers powerful methods to support knowledge discovery and the construction of topic maps and ontologies. An overview is given of recent developments in medical language processing. Special attention is given to the domain particularities of molecular biology, and the emerging synergy between literature mining and molecular databases accessible through Internet.

Asunto(s)

Indización y Redacción de Resúmenes , Inteligencia Artificial , MEDLINE , Biología Molecular , Procesamiento de Lenguaje Natural , Bases de Datos como Asunto , Humanos , Internet , Semántica

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA