Results 1 - 20 of 395
1.
Data Brief ; 56: 110781, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39252773

ABSTRACT

Automatic narrative text analysis is gaining traction as artificial intelligence-based computational linguistic tools, such as named entity recognition systems and natural language processing (NLP) toolkits, become more prevalent. Character identification is the first stage in narrative text analysis; however, it is difficult due to the diversity of character appearances and the distinctive characteristics of different regions. More advanced analyses, such as role classification, emotion and personality profiling, and character network development, depend on successful character identification, making it a crucial first step. Because so many annotated English datasets exist, computational linguistic tools focus mostly on English literature; tools for analyzing Balinese story texts remain limited because of the scarcity of datasets for this low-resource language. This study presents the first annotated dataset of Balinese story texts for narrative text analysis, consisting of four sub-datasets for character identification, alias clustering (named entity linking, alias resolution), and character classification. The dataset is a compilation of 120 manually annotated Balinese stories from books and public websites, spanning multiple genres such as folk tales, fairy tales, fables, and mythology. Two Balinese native speakers, including an expert in sociolinguistics and macrolinguistics, annotated the dataset using predetermined guidelines set by the expert. The inter-annotator agreement (IAA) is calculated using Cohen's Kappa coefficient, the Jaccard similarity coefficient, and the mean F1-score to measure the level of agreement between annotators and the consistency and reliability of the dataset. The first sub-dataset consists of 89,917 annotated words with five labels referring to Balinese character named entities. Each character entity's appearance in 6,634 sentences is further annotated in the second sub-dataset.
These two sub-datasets can be used for character identification at the word and sentence levels. The third sub-dataset annotates character groups, i.e., groups of the various aliases of each character entity, for alias clustering purposes. It contains 930 character groups from the 120 story texts, with each story text containing an average of 7 to 8 character groups. In the fourth sub-dataset, 848 of the 930 character groups from the third sub-dataset have been categorized as protagonists or antagonists. Protagonists (66.16%) make up most of the character groups, with antagonists (33.84%) making up the rest. The fourth sub-dataset can be used for computational classification of characters into the two roles of protagonist and antagonist. These datasets have the potential to advance research in narrative text analysis, especially computational linguistic tools and advanced machine learning (ML) and deep learning (DL) models for low-resource languages. They can also be used for further research, including character network development, character relationship extraction, and character classification beyond protagonist and antagonist.
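The Cohen's Kappa computation used for the IAA score can be sketched as follows; the toy label sequences and label names below are illustrative, not drawn from the actual dataset.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[l] * freq_b[l] for l in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy word-level annotations with entity labels (hypothetical tag names).
a = ["PER", "O", "O", "PER", "LOC", "O", "O", "PER"]
b = ["PER", "O", "PER", "PER", "LOC", "O", "O", "O"]
print(round(cohens_kappa(a, b), 3))  # 0.579
```

Kappa corrects raw agreement (here 6/8) for the agreement expected by chance, which is why it is preferred over plain accuracy for annotation studies.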

2.
ACS Synth Biol ; 13(9): 3051-3055, 2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39230953

ABSTRACT

The progress and utility of synthetic biology are currently hindered by the lengthy process of studying the literature and replicating poorly documented work. Reconstruction of crucial design information through post hoc curation is highly noisy and error-prone. To combat this, author participation during the curation process is crucial. To encourage author participation without overburdening authors, an ML-assisted curation tool called SeqImprove has been developed. Using named entity recognition, named entity normalization, and sequence matching, SeqImprove creates machine-accessible sequence data and metadata annotations, which authors can then review and edit before submitting a final sequence file. SeqImprove makes it easier for authors to submit sequence data that is FAIR (findable, accessible, interoperable, and reusable).


Subject(s)
Machine Learning, Synthetic Biology, Synthetic Biology/methods, Software, Gene Regulatory Networks/genetics, Data Curation/methods
3.
Stud Health Technol Inform ; 317: 228-234, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39234726

ABSTRACT

INTRODUCTION: Large Language Models (LLMs) like ChatGPT have become increasingly prevalent. In medicine, there are many potential areas where LLMs may offer added value. Our research focuses on the use of open-source LLM alternatives such as Llama 3, Gemma, Mistral, and Mixtral to extract medical parameters from German clinical texts. We concentrate on German due to an observed gap in research on non-English tasks. OBJECTIVE: To evaluate the effectiveness of open-source LLMs in extracting medical parameters from German clinical texts, specifically focusing on cardiovascular function indicators from cardiac MRI reports. METHODS: We extracted 14 cardiovascular function indicators, including left and right ventricular ejection fraction (LV-EF and RV-EF), from 497 variously formulated cardiac magnetic resonance imaging (MRI) reports. Our systematic analysis assessed the performance of the Llama 3, Gemma, Mistral, and Mixtral models in terms of correct annotation and named entity recognition (NER) accuracy. RESULTS: The analysis confirms strong performance, with up to 95.4% correct annotation and 99.8% NER accuracy across the different architectures, even though these models were not explicitly fine-tuned for data extraction or for German. CONCLUSION: The results strongly support using open-source LLMs for extracting medical parameters from clinical texts, including those in German, given their high accuracy and effectiveness even without task-specific fine-tuning.
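To make the extraction target concrete, a minimal rule-based sketch is shown below; the German report phrasings and the regular expression are invented for illustration (the study itself applies LLMs to free text, not regexes).

```python
import re

# Hypothetical German report snippets; the phrasing of the study's 497 reports
# is not public, so this pattern only illustrates the LV-EF extraction target.
PATTERN = re.compile(
    r"(?:LV-?EF|linksventrikul\w+ Ejektionsfraktion)\D{0,20}?(\d{1,2}(?:[.,]\d)?)\s*%",
    re.IGNORECASE,
)

def extract_lvef(report: str):
    """Return the left ventricular ejection fraction as a float, or None."""
    m = PATTERN.search(report)
    return float(m.group(1).replace(",", ".")) if m else None

print(extract_lvef("Die linksventrikuläre Ejektionsfraktion beträgt 58 %."))  # 58.0
print(extract_lvef("LVEF: 45,5%"))  # 45.5
```

A hand-written pattern like this breaks on "variously formulated" reports, which is precisely the motivation for using LLMs in the study.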


Subject(s)
Natural Language Processing, Germany, Humans, Magnetic Resonance Imaging/methods, Data Mining/methods
4.
Sensors (Basel) ; 24(17)2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39275536

ABSTRACT

Named entity recognition is a critical task in the electronic medical record management system for rehabilitation robots. Handwritten documents often contain spelling errors and illegible handwriting, and healthcare professionals frequently use different terminologies. These issues adversely affect the robot's judgment and precise operations. Additionally, the same entity can have different meanings in different contexts, leading to category inconsistencies, which further increases the system's complexity. To address these challenges, a novel medical entity recognition algorithm for Chinese electronic medical records is developed to enhance the processing and understanding capabilities of rehabilitation robots for patient data. The algorithm is based on a fusion classification strategy. Specifically, a preprocessing strategy is proposed according to clinical medical knowledge, which includes redefining entities, removing outliers, and eliminating invalid characters. Subsequently, a medical entity recognition model is developed to identify entities in Chinese electronic medical records, thereby enhancing the data analysis capabilities of rehabilitation robots. To extract semantic information, the ALBERT network is utilized, and bidirectional LSTM (BiLSTM) and multi-head attention (MHA) networks are combined to capture the dependency relationships between words, overcoming the problem of the same entity having different meanings in different contexts. A CRF network is employed to determine the boundaries of the different entities. The results indicate that the proposed model significantly enhances the recognition accuracy of electronic medical texts by rehabilitation robots, particularly in accurately identifying entities and handling terminology diversity and contextual differences. The model effectively addresses the key challenges faced by rehabilitation robots in processing Chinese electronic medical texts and holds important theoretical and practical value.


Subject(s)
Algorithms, Electronic Health Records, Robotics, Robotics/methods, Humans, China, Rehabilitation/methods, Semantics, East Asian People
5.
Front Nutr ; 11: 1429259, 2024.
Article in English | MEDLINE | ID: mdl-39290564

ABSTRACT

Introduction: Recognizing and extracting key information from textual data plays an important role in intelligent systems by maintaining up-to-date knowledge, supporting informed decision-making, question answering, and more. This is especially apparent in the food domain, where critical information guides the decisions of nutritionists and clinicians. The information extraction process involves two natural language processing tasks: named entity recognition (NER) and named entity linking (NEL). With the emergence of large language models (LLMs), especially ChatGPT, many areas began incorporating their knowledge to reduce workloads or simplify tasks. In the field of food, we saw an opportunity to evaluate ChatGPT on NER and NEL. Methods: To assess ChatGPT's capabilities, we evaluated its two versions, ChatGPT-3.5 and ChatGPT-4, focusing on their performance across both the NER and NEL tasks, emphasizing food-related data. To benchmark our results in the food domain, we also investigated their capabilities in the more broadly studied biomedical domain. By evaluating their zero-shot capabilities, we were able to ascertain the strengths and weaknesses of the two versions of ChatGPT. Results: Although ChatGPT shows promising results in NER compared to other models, its effectiveness falls drastically when tasked with linking entities to their identifiers from semantic models. Discussion: While the integration of ChatGPT holds potential across various fields, it is crucial to approach its use with caution, particularly when relying on its responses for critical decisions in food and biomedicine.

6.
Artif Intell Med ; 156: 102970, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39197375

ABSTRACT

Supervised named entity recognition (NER) in the biomedical domain depends on large sets of texts annotated with the given named entities. The creation of such datasets can be time-consuming and expensive, while extraction of new entities requires additional annotation and retraining of the model. This paper proposes a method for zero- and few-shot NER in the biomedical domain to address these challenges. The method is based on transforming the task of multi-class token classification into binary token classification and pre-training on a large number of datasets and biomedical entities, which allows the model to learn semantic relations between the given and potentially novel named entity labels. With a fine-tuned PubMedBERT-based model, we achieved average F1 scores of 35.44% for zero-shot NER, 50.10% for one-shot NER, 69.94% for 10-shot NER, and 79.51% for 100-shot NER on 9 diverse evaluated biomedical entities. The results demonstrate the effectiveness of the proposed method for recognizing new biomedical entities with no or a limited number of examples, outperforming previous transformer-based methods and remaining comparable to GPT-3-based models while using over 1000 times fewer parameters. We make the models and code publicly available.
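The core reformulation, turning one multi-class tagged sentence into one binary-tagged instance per candidate label, can be sketched as follows; the tag names and data layout are illustrative assumptions, not the paper's exact format.

```python
def to_binary_instances(tokens, tags, label_set):
    """Turn one multi-class tagged sentence into one binary-tagged
    instance per candidate label, as in label-conditioned zero-shot NER."""
    instances = []
    for label in label_set:
        binary = [1 if t == label else 0 for t in tags]
        # The label string itself is part of the model input, so unseen
        # labels can be queried at inference time without retraining.
        instances.append({"label": label, "tokens": tokens, "targets": binary})
    return instances

sent = ["Aspirin", "reduces", "fever"]
tags = ["DRUG", "O", "SYMPTOM"]
insts = to_binary_instances(sent, tags, ["DRUG", "SYMPTOM"])
print(insts[0]["targets"], insts[1]["targets"])  # [1, 0, 0] [0, 0, 1]
```

Because the entity label is an input rather than a fixed output class, a novel label such as "CELL_LINE" can be posed to the same binary model at inference time.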


Subject(s)
Semantics, Natural Language Processing, Humans, Data Mining/methods, Algorithms
7.
Heliyon ; 10(12): e32479, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-39183851

ABSTRACT

Numerous methods and pipelines have recently emerged for the automatic extraction of knowledge graphs from documents such as scientific publications and patents. However, adapting these methods to alternative text sources like micro-blogging posts and news has proven challenging, as they struggle to model the open-domain entities and relations typically found in these sources. In this paper, we propose an enhanced information extraction pipeline tailored to extracting a knowledge graph of open-domain entities from micro-blogging posts on social media platforms. Our pipeline leverages dependency parsing and classifies entity relations in an unsupervised manner through hierarchical clustering over word embeddings. We provide a use case on extracting semantic triples from a corpus of 100 thousand tweets about digital transformation and publicly release the generated knowledge graph. On the same dataset, we conduct two experimental evaluations, showing that the system produces triples with precision over 95% and outperforms similar pipelines by around 5% in terms of precision, while generating a comparatively higher number of triples.
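The unsupervised relation classification step can be sketched as average-linkage agglomerative clustering over embedding vectors; the toy 2-D "embeddings" and the distance cutoff below are illustrative assumptions, not the pipeline's actual configuration.

```python
import math

def agglomerative(points, threshold):
    """Average-linkage agglomerative clustering with a distance cutoff,
    sketching how relation phrases embedded as vectors can be grouped
    into unsupervised relation types."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):  # average pairwise distance between two clusters
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda x: x[1],
        )
        if d > threshold:  # stop merging once clusters are too far apart
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy 2-D "embeddings" of relation phrases: two obvious groups.
emb = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
print(sorted(sorted(c) for c in agglomerative(emb, threshold=1.0)))  # [[0, 1, 2], [3, 4]]
```

In practice, word embeddings are high-dimensional and the cutoff (or cluster count) is tuned; the principle of grouping semantically similar relation phrases without labels is the same.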

8.
medRxiv ; 2024 Aug 31.
Article in English | MEDLINE | ID: mdl-39185518

ABSTRACT

The identification and classification of carcinogens is critical in cancer epidemiology, necessitating updated methodologies to manage the burgeoning biomedical literature. Current systems, like those run by the International Agency for Research on Cancer (IARC) and the National Toxicology Program (NTP), face challenges due to manual vetting and disparities in carcinogen classification spurred by the volume of emerging data. To address these issues, we introduced the Carcinogen Detection via Transformers (CarD-T) framework, a text analytics approach that combines transformer-based machine learning with probabilistic statistical analysis to efficiently nominate carcinogens from scientific texts. CarD-T uses Named Entity Recognition (NER) trained on PubMed abstracts featuring known carcinogens from IARC groups and includes a context classifier to enhance accuracy and manage computational demands. Using this method, we analyzed journal publication data indexed with carcinogenicity and carcinogenesis Medical Subject Headings (MeSH) terms from the last 25 years, identifying potential carcinogens. Trained on 60% of established carcinogens (IARC Group 1 and 2A designations), CarD-T correctly identifies all of the remaining Group 1 and 2A carcinogens in the analyzed text. In addition, CarD-T nominates roughly 1500 more entities as potential carcinogens that have at least two publications citing evidence of carcinogenicity. Comparative assessment of CarD-T against a GPT-4 model reveals higher recall (0.857 vs 0.705) and F1 score (0.875 vs 0.792), with comparable precision (0.894 vs 0.903). Additionally, CarD-T highlights 554 entities that show conflicting evidence of carcinogenicity. These are further analyzed using Bayesian temporal Probabilistic Carcinogenic Denomination (PCarD) to provide probabilistic evaluations of their carcinogenic status based on evolving evidence.
Our findings underscore that the CarD-T framework is not only robust and effective in identifying and nominating potential carcinogens within vast biomedical literature but also efficient on consumer GPUs. This integration of advanced NLP capabilities with vital epidemiological analysis significantly enhances the agility of public health responses to carcinogen identification, thereby setting a new benchmark for automated, scalable toxicological investigations.
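As a rough illustration of how publication counts can yield a probabilistic evaluation, a minimal Beta-Binomial sketch follows; this is not the PCarD method itself (whose temporal formulation is given in the paper), only a generic example of turning evidence counts into a score.

```python
def posterior_carcinogenic(support, dispute, prior_a=1.0, prior_b=1.0):
    """Beta-Binomial posterior mean for the fraction of publications
    supporting carcinogenicity. NOT the PCarD method itself; it only
    illustrates how supporting vs disputing evidence counts can be
    turned into a probabilistic score that updates as evidence accrues."""
    return (prior_a + support) / (prior_a + prior_b + support + dispute)

# A hypothetical entity with 8 supporting and 2 disputing publications.
print(round(posterior_carcinogenic(8, 2), 3))  # 0.75
```

With a uniform prior and no evidence the score sits at 0.5, and each new supporting or disputing publication shifts it, mirroring the idea of re-evaluating disputed entities as the literature evolves.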

9.
BMC Med Inform Decis Mak ; 24(1): 221, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39103849

ABSTRACT

Performing data augmentation in medical named entity recognition (NER) is crucial due to the unique challenges posed by this field. Medical data is characterized by high acquisition costs, specialized terminology, imbalanced distributions, and limited training resources. These factors make achieving high performance in medical NER particularly difficult. Data augmentation methods help to mitigate these issues by generating additional training samples, thus balancing data distribution, enriching the training dataset, and improving model generalization. This paper proposes two data augmentation methods-Contextual Random Replacement based on Word2Vec Augmentation (CRR) and Targeted Entity Random Replacement Augmentation (TER)-aimed at addressing the scarcity and imbalance of data in the medical domain. When combined with a deep learning-based Chinese NER model, these methods can significantly enhance performance and recognition accuracy under limited resources. Experimental results demonstrate that both augmentation methods effectively improve the recognition capability of medical named entities. Specifically, the BERT-BiLSTM-CRF model achieved the highest F1 score of 83.587%, representing a 1.49% increase over the baseline model. This validates the importance and effectiveness of data augmentation in medical NER.
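A TER-style replacement step can be sketched as follows; the lexicon, tag scheme, and single-token entities are simplifying assumptions for illustration, not the paper's implementation.

```python
import random

def targeted_entity_replacement(tokens, tags, entity_lexicon, seed=0):
    """Sketch of targeted entity random replacement: swap each entity
    token for another surface form of the same type, keeping the tag
    sequence intact so the augmented sample needs no re-annotation."""
    rng = random.Random(seed)
    out = []
    for tok, tag in zip(tokens, tags):
        if tag != "O" and tag in entity_lexicon:
            out.append(rng.choice(entity_lexicon[tag]))
        else:
            out.append(tok)
    return out

# Hypothetical lexicon of same-type entity surface forms.
lex = {"DISEASE": ["pneumonia", "gastritis"], "DRUG": ["ibuprofen", "amoxicillin"]}
toks = ["Patient", "has", "fever", "takes", "aspirin"]
tags = ["O", "O", "DISEASE", "O", "DRUG"]
print(targeted_entity_replacement(toks, tags, lex))
```

Because the tag sequence is unchanged, each augmented sentence is a valid new training sample, which is how such replacement helps rebalance rare entity types.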


Subject(s)
Deep Learning, Humans, Natural Language Processing
10.
Stud Health Technol Inform ; 316: 1487-1491, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176485

ABSTRACT

This article presents our experience in developing an ontological model that can be used to create clinical decision support systems (CDSS). We used the largest international biomedical terminological metathesaurus, the Unified Medical Language System (UMLS), as the basis of our model. This metathesaurus was adapted into Russian using an automated hybrid translation system with expert control. The resulting product was named the National Unified Terminological System (NUTS). We added more than 33 million scientific and clinical relationships between NUTS terms, extracted from the texts of scientific articles and electronic health records. We also computed weights for each relationship, standardized their values, and created a symptom checker for preliminary diagnostics based on them. We expect the NUTS to help solve named entity recognition (NER) tasks and to increase term interoperability across different CDSS.


Subject(s)
Electronic Health Records, Knowledge Bases, Unified Medical Language System, Clinical Decision Support Systems, Natural Language Processing, Humans, Russian Federation, Controlled Vocabulary
11.
Stud Health Technol Inform ; 316: 272-276, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176725

ABSTRACT

The task of Named Entity Recognition (NER) is central to leveraging the content of clinical texts in observational studies. Indeed, texts contain a large part of the information available in Electronic Health Records (EHRs). However, clinical texts are highly heterogeneous between healthcare services and institutions, and between countries and languages, making it hard to predict how existing tools will perform on a particular corpus. We compared four NER approaches on three French corpora and share our benchmarking pipeline in an open and easy-to-reuse manner, using the medkit Python library. Our pipelines include fine-tuning operations with one or several of the considered corpora. Our results illustrate the expected superiority of language models over a dictionary-based approach and question the necessity of refining models already trained on biomedical texts. Beyond benchmarking, we believe sharing reusable and customizable pipelines for comparing fast-evolving Natural Language Processing (NLP) tools is a valuable contribution, since clinical texts themselves can hardly be shared due to privacy concerns.


Subject(s)
Electronic Health Records, Natural Language Processing, France, Humans
12.
Stud Health Technol Inform ; 316: 611-615, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176816

ABSTRACT

Secure extraction of Personally Identifiable Information (PII) from Electronic Health Records (EHRs) presents significant privacy and security challenges. This study explores the application of Federated Learning (FL) to overcome these challenges in the context of French EHRs. Using a multilingual BERT model in an FL simulation involving 20 hospitals, each represented by a unique medical department or pole, we compared the performance of two setups: individual models, where each hospital uses only its own training and validation data without engaging in the FL process, and federated models, where multiple hospitals collaborate to train a global FL model. Our findings demonstrate that FL models not only preserve data confidentiality but also outperform the individual models. The global FL model achieved an F1 score of 75.7%, comparable to the 78.5% of the centralized approach. This research underscores the potential of FL for extracting PII from EHRs, encouraging its broader adoption in health data analysis.
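The federated setup rests on aggregating locally trained parameters rather than sharing text; a minimal FedAvg-style sketch (flat parameter lists, size-weighted averaging) is shown below as an assumption about the aggregation step, not the study's exact implementation.

```python
def fedavg(client_weights, client_sizes):
    """Federated averaging: aggregate per-client model parameters
    weighted by local dataset size, so raw EHR text never leaves a
    hospital. Parameters are flat lists of floats for illustration."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Three "hospitals" with different amounts of local training data.
global_w = fedavg([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [100, 100, 200])
print(global_w)  # [3.5, 4.5]
```

In a real FL round this averaging repeats after every local training epoch, and the averaged weights are sent back to each client as the new starting point.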


Subject(s)
Computer Security, Confidentiality, Electronic Health Records, Machine Learning, France, Humans, Personal Health Records
13.
Stud Health Technol Inform ; 316: 666-670, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176830

ABSTRACT

Named Entity Recognition (NER) models based on Transformers have gained prominence for their impressive performance in various languages and domains. This work delves into the often-overlooked aspect of entity-level metrics and exposes significant discrepancies between token and entity-level evaluations. The study utilizes a corpus of synthetic French oncological reports annotated with entities representing oncological morphologies. Four different French BERT-based models are fine-tuned for token classification, and their performance is rigorously assessed at both token and entity-level. In addition to fine-tuning, we evaluate ChatGPT's ability to perform NER through prompt engineering techniques. The findings reveal a notable disparity in model effectiveness when transitioning from token to entity-level metrics, highlighting the importance of comprehensive evaluation methodologies in NER tasks. Furthermore, in comparison to BERT, ChatGPT remains limited when it comes to detecting advanced entities in French.
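The token-versus-entity discrepancy is easy to reproduce: a partially recognized entity earns token-level credit but, under exact-match entity evaluation, counts as both a false positive and a false negative. A minimal BIO-based sketch (the tag names are illustrative):

```python
def spans(tags):
    """Extract (start, end, type) entity spans from a BIO tag sequence."""
    out, start = [], None
    for i, t in enumerate(tags + ["O"]):  # sentinel to close a trailing span
        if start is not None and not t.startswith("I-"):
            out.append((start, i, tags[start][2:]))
            start = None
        if t.startswith("B-"):
            start = i
    return out

def f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

gold = ["B-MORPH", "I-MORPH", "I-MORPH", "O"]
pred = ["B-MORPH", "I-MORPH", "O",       "O"]   # entity cut short

# Token level: 2 of the 3 entity tokens are correct -> looks decent.
tok_tp = sum(g == p != "O" for g, p in zip(gold, pred))
tok_fn = sum(g != "O" and g != p for g, p in zip(gold, pred))
print(f1(tok_tp, 0, tok_fn))  # 0.8

# Entity level (exact match): the truncated span is both a FP and a FN.
g_sp, p_sp = set(spans(gold)), set(spans(pred))
print(f1(len(g_sp & p_sp), len(p_sp - g_sp), len(g_sp - p_sp)))  # 0.0
```

The same prediction scores 0.8 at the token level and 0.0 at the entity level, which is exactly the kind of disparity the study measures across its fine-tuned models.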


Subject(s)
Natural Language Processing, France, Humans, Electronic Health Records, Language, Neoplasms, Controlled Vocabulary
14.
Radiol Artif Intell ; 6(5): e230277, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39046325

ABSTRACT

Purpose To develop a machine learning approach for classifying disease progression in chest radiographs using weak labels automatically derived from radiology reports. Materials and Methods In this retrospective study, a twin neural network was developed to classify anatomy-specific disease progression into four categories: improved, unchanged, worsened, and new. A two-step weakly supervised learning approach was employed, pretraining the model on 243 008 frontal chest radiographs from 63 877 patients (mean age, 51.7 years ± 17.0 [SD]; 34 813 [55%] female) included in the MIMIC-CXR database and fine-tuning it on the subset with progression labels derived from consecutive studies. Model performance was evaluated for six pathologic observations on test datasets of unseen patients from the MIMIC-CXR database. Area under the receiver operating characteristic (AUC) analysis was used to evaluate classification performance. The algorithm is also capable of generating bounding-box predictions to localize areas of new progression. Recall, precision, and mean average precision were used to evaluate the new progression localization. One-tailed paired t tests were used to assess statistical significance. Results The model outperformed most baselines in progression classification, achieving macro AUC scores of 0.72 ± 0.004 for atelectasis, 0.75 ± 0.007 for consolidation, 0.76 ± 0.017 for edema, 0.81 ± 0.006 for effusion, 0.7 ± 0.032 for pneumonia, and 0.69 ± 0.01 for pneumothorax. For new observation localization, the model achieved mean average precision scores of 0.25 ± 0.03 for atelectasis, 0.34 ± 0.03 for consolidation, 0.33 ± 0.03 for edema, and 0.31 ± 0.03 for pneumothorax. Conclusion Disease progression classification models were developed on a large chest radiograph dataset, which can be used to monitor interval changes and detect new pathologic conditions on chest radiographs. 
Keywords: Prognosis, Unsupervised Learning, Transfer Learning, Convolutional Neural Network (CNN), Emergency Radiology, Named Entity Recognition. Supplemental material is available for this article. © RSNA, 2024 See also commentary by Alves and Venkadesh in this issue.


Subject(s)
Disease Progression, Thoracic Radiography, Supervised Machine Learning, Humans, Female, Middle Aged, Retrospective Studies, Male, Thoracic Radiography/methods, Neural Networks (Computer), Computer-Assisted Radiographic Image Interpretation/methods, Adult
15.
Sci Rep ; 14(1): 17488, 2024 07 30.
Article in English | MEDLINE | ID: mdl-39080339

ABSTRACT

Named entity recognition (NER) plays a crucial role in the extraction and utilization of knowledge from ancient Chinese books. However, the challenges of ancient Chinese NER not only originate from linguistic features, such as the use of single characters and short sentences, but are also exacerbated by the scarcity of training data. Together, these factors limit the capability of deep learning models, like BERT-CRF, to capture the semantic representation of ancient Chinese characters. In this paper, we explore the semantic enhancement of NER in ancient Chinese books through the utilization of external knowledge. We propose a novel model based on Graph Neural Networks that integrates two different forms of external knowledge: dictionary-level and chapter-level information. Through the Graph Attention Mechanism (GAT), this external knowledge is effectively incorporated into the model's input context. Our model is evaluated on the C_CLUE dataset, showing an improvement of 3.82% over the baseline BAC-CRF model. It also achieves the best score compared to several state-of-the-art dictionary-augmented models.


Subject(s)
Neural Networks (Computer), Semantics, China, Books/history, Humans, Deep Learning
16.
JMIR Cancer ; 10: e43070, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39037754

ABSTRACT

BACKGROUND: Commonly offered as supportive care, therapist-led online support groups (OSGs) are a cost-effective way to provide support to individuals affected by cancer. One important indicator of a successful OSG session is group cohesion; however, monitoring group cohesion can be challenging due to the lack of nonverbal cues and in-person interactions in text-based OSGs. The Artificial Intelligence-based Co-Facilitator (AICF) was designed to contextually identify therapeutic outcomes from conversations and produce real-time analytics. OBJECTIVE: The aim of this study was to develop a method to train and evaluate AICF's capacity to monitor group cohesion. METHODS: AICF used a text classification approach to extract the mentions of group cohesion within conversations. A sample of data was annotated by human scorers, which was used as the training data to build the classification model. The annotations were further supported by finding contextually similar group cohesion expressions using word embedding models as well. AICF performance was also compared against the natural language processing software Linguistic Inquiry Word Count (LIWC). RESULTS: AICF was trained on 80,000 messages obtained from Cancer Chat Canada. We tested AICF on 34,048 messages. Human experts scored 6797 (20%) of the messages to evaluate the ability of AICF to classify group cohesion. Results showed that machine learning algorithms combined with human input could detect group cohesion, a clinically meaningful indicator of effective OSGs. After retraining with human input, AICF reached an F1-score of 0.82. AICF performed slightly better at identifying group cohesion compared to LIWC. CONCLUSIONS: AICF has the potential to assist therapists by detecting discord in the group amenable to real-time intervention. Overall, AICF presents a unique opportunity to strengthen patient-centered care in web-based settings by attending to individual needs. 
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.2196/21453.

17.
Can Assoc Radiol J ; : 8465371241266785, 2024 Jul 27.
Article in English | MEDLINE | ID: mdl-39066637

ABSTRACT

Purpose: This study evaluates the efficacy of a commercial medical Named Entity Recognition (NER) model combined with a post-processing protocol in identifying incidental pulmonary nodules from CT reports. Methods: We analyzed 9165 anonymized CT reports and classified them into 3 categories: no nodules, nodules present, and nodules >6 mm. For each report, a generic medical NER model annotated entities and their relations, which were then filtered through inclusion/exclusion criteria selected to identify pulmonary nodules. Ground truth was established by manual review. To better understand the relationship between model performance and nodule prevalence, a subset of the data was programmatically balanced to equalize the number of reports in each class category. Results: In the unbalanced subset of the data, the model achieved a sensitivity of 97%, specificity of 99%, and accuracy of 99% in detecting pulmonary nodules mentioned in the reports. For nodules >6 mm, sensitivity was 95%, specificity was 100%, and accuracy was 100%. In the balanced subset of the data, sensitivity was 99%, specificity 96%, and accuracy 97% for nodule detection; for larger nodules, sensitivity was 94%, specificity 99%, and accuracy 98%. Conclusions: The NER model demonstrated high sensitivity and specificity in detecting pulmonary nodules reported in CT scans, including those >6 mm which are potentially clinically significant. The results were consistent across both unbalanced and balanced datasets indicating that the model performance is independent of nodule prevalence. Implementing this technology in hospital systems could automate the identification of at-risk patients, ensuring timely follow-up and potentially reducing missed or late-stage cancer diagnoses.
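A toy version of such a post-processing filter is sketched below; the real protocol filters NER annotations and their relations, and these report phrasings and regex patterns are invented purely for illustration.

```python
import re

def classify_report(text):
    """Toy triage in the spirit of the study's three classes. The actual
    pipeline applies inclusion/exclusion criteria to NER annotations;
    these surface patterns are illustrative only."""
    t = text.lower()
    if not re.search(r"\bnodule", t) or re.search(r"no\s+(pulmonary\s+)?nodules?", t):
        return "no nodules"
    size = re.search(r"(\d+(?:\.\d+)?)\s*mm nodule", t)
    if size and float(size.group(1)) > 6:
        return "nodules >6 mm"
    return "nodules present"

print(classify_report("Lungs clear. No pulmonary nodules."))      # no nodules
print(classify_report("A 4 mm nodule in the right upper lobe."))  # nodules present
print(classify_report("An 8 mm nodule is noted."))                # nodules >6 mm
```

Negation handling (the "no pulmonary nodules" branch) is exactly the kind of case where relation-aware NER output beats raw keyword matching in real reports.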

18.
BMC Med Inform Decis Mak ; 24(1): 192, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982465

ABSTRACT

BACKGROUND: As global aging intensifies, the prevalence of ocular fundus diseases continues to rise. In China, the strained doctor-patient ratio poses numerous challenges for the early diagnosis and treatment of ocular fundus diseases. To reduce the high risk of missed or misdiagnosed cases, avoid irreversible visual impairment, and ensure a good visual prognosis for patients with ocular fundus diseases, it is particularly important to enhance the growth and diagnostic capabilities of junior doctors. This study aims to leverage the value of electronic medical record data to develop a diagnostic intelligent decision support platform. The platform aims to assist junior doctors in diagnosing ocular fundus diseases quickly and accurately, expedite their professional growth, and prevent delays in patient treatment. An empirical evaluation assessed the platform's effectiveness in enhancing doctors' diagnostic efficiency and accuracy. METHODS: In this study, eight Chinese Named Entity Recognition (NER) models were compared, and the SoftLexicon-Glove-Word2vec model, achieving a high F1 score of 93.02%, was selected as the optimal recognition tool. This model was then used to extract key information from electronic medical records (EMRs) and generate feature variables based on diagnostic rule templates. Subsequently, an XGBoost algorithm was employed to construct an intelligent decision support platform for diagnosing ocular fundus diseases. The effectiveness of the platform in improving diagnostic efficiency and accuracy was evaluated through a controlled experiment comparing experienced and junior doctors. RESULTS: The use of the diagnostic intelligent decision support platform resulted in significant improvements in both diagnostic efficiency and accuracy for both experienced and junior doctors (P < 0.05). Notably, the gap in diagnostic speed and precision between junior and experienced doctors narrowed considerably when the platform was used.
Although the platform also provided some benefits to experienced doctors, the improvement was less pronounced compared to junior doctors. CONCLUSION: The diagnostic intelligent decision support platform established in this study, based on the XGBoost algorithm and NER, effectively enhances the diagnostic efficiency and accuracy of junior doctors in ocular fundus diseases. This has significant implications for optimizing clinical diagnosis and treatment.
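The step from NER output to classifier features can be sketched as a rule-template lookup; the entity types and feature names below are hypothetical, not the study's clinical templates.

```python
def features_from_entities(entities, template):
    """Turn NER output into binary feature variables via a rule template,
    ready for a tree model such as XGBoost. Entity and feature names are
    illustrative; the study's actual diagnostic templates are clinical."""
    found = {(e["type"], e["text"].lower()) for e in entities}
    return {name: int(key in found) for name, key in template.items()}

# Hypothetical rule template mapping feature names to (type, text) keys.
template = {
    "has_macular_edema": ("FINDING", "macular edema"),
    "has_drusen": ("FINDING", "drusen"),
}
ents = [{"type": "FINDING", "text": "Macular edema"}]
print(features_from_entities(ents, template))
# {'has_macular_edema': 1, 'has_drusen': 0}
```

Each EMR thus becomes a fixed-length binary vector, which is the input format a gradient-boosted tree model expects.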


Subject(s)
Ophthalmologists, Humans, Clinical Decision-Making, Electronic Health Records/standards, Artificial Intelligence, China, Clinical Decision Support Systems
19.
Sci Rep ; 14(1): 16106, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38997330

ABSTRACT

The span-based model can effectively capture complex entity structures in text and has thus become the mainstream approach for nested named entity recognition (nested NER) tasks. However, traditional span-based models decode each entity span independently; they consider neither the semantic connections between spans nor the entities' positional information, which limits their performance. To address these issues, we propose a Bi-Directional Context-Aware Network (Bi-DCAN) for nested NER. Specifically, we first design a new span-level semantic relation model. Then, the Bi-DCAN is implemented to capture this semantic relationship. Furthermore, we incorporate Rotary Position Embedding into the bi-affine mechanism to capture the relative positional information between the head and tail tokens, enabling the model to more accurately determine the position of each entity. Experimental results show that, compared to the latest model Diffusion-NER, our model uses 20M fewer parameters and increases the F1 scores by 0.24 and 0.09 on the ACE2005 and GENIA datasets respectively, which proves that our model has an excellent ability to recognise nested entities.
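Rotary Position Embedding can be sketched in a few lines; its key property, that query-key dot products depend only on relative position, is what allows a bi-affine scorer to use head-tail offsets. This is a generic RoPE sketch, not the Bi-DCAN code.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotary Position Embedding: rotate feature pairs of vector x by
    position-dependent angles, so dot products between rotated queries
    and keys depend only on their relative positions."""
    d = x.shape[-1]
    half = d // 2
    theta = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    ang = pos * theta
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
q, k = rng.normal(size=4), rng.normal(size=4)
# Shifting both positions by the same offset leaves the dot product unchanged.
d1 = rope(q, 3) @ rope(k, 7)
d2 = rope(q, 103) @ rope(k, 107)
print(np.allclose(d1, d2))  # True
```

Because each rotation is orthogonal, vector norms are also preserved, so RoPE injects position without distorting the embedding magnitudes.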

20.
Heliyon ; 10(12): e32093, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38948047

ABSTRACT

Chinese agricultural named entity recognition (NER) has been studied with supervised learning for many years. However, considering the scarcity of public datasets in the agricultural domain, exploring this task in the few-shot scenario is more practical for real-world demands. In this paper, we propose a novel model named GlyReShot, which integrates knowledge of Chinese character glyphs into few-shot NER models. Although the utilization of glyphs has proven successful in supervised models, two challenges still persist in the few-shot setting: how to obtain glyph representations and when to integrate them into the few-shot model. GlyReShot handles these two challenges by introducing a lightweight glyph-representation module and a training-free label refinement strategy. Specifically, the glyph representations are generated from descriptive sentences obtained by filling a predefined template. As most of these steps come before training, the module aligns well with the few-shot setting. Furthermore, by computing confidence values for the draft predictions, the refinement strategy utilizes the glyph information only when the confidence values are relatively low, thus mitigating the influence of noise. Finally, we annotate a new agricultural NER dataset, and the experimental results demonstrate the effectiveness of GlyReShot for few-shot Chinese agricultural NER.
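The training-free refinement can be sketched as a confidence-gated fallback; the labels, probabilities, and threshold below are illustrative assumptions, not GlyReShot's actual values.

```python
def refine_with_glyph(draft_probs, draft_labels, glyph_labels, tau=0.7):
    """Training-free refinement in the spirit of GlyReShot: keep the draft
    prediction when its confidence is high, and fall back to the
    glyph-informed prediction only for low-confidence tokens, so noisy
    glyph signals cannot overwrite confident decisions."""
    return [
        g if p < tau else d
        for p, d, g in zip(draft_probs, draft_labels, glyph_labels)
    ]

draft = ["CROP", "O", "PEST"]       # draft predictions (hypothetical tags)
glyph = ["CROP", "O", "DISEASE"]    # glyph-informed predictions
probs = [0.95, 0.9, 0.4]            # model confidence per token
print(refine_with_glyph(probs, draft, glyph))  # ['CROP', 'O', 'DISEASE']
```

Only the third token, whose confidence falls below the threshold, is overridden by the glyph signal; the confident predictions pass through untouched.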
