Results 1 - 20 of 119
1.
Ophthalmol Sci ; 4(6): 100578, 2024.
Article in English | MEDLINE | ID: mdl-39253550

ABSTRACT

Purpose: To compare the performance of 3 phenotyping methods in identifying diabetic retinopathy (DR) and related clinical conditions. Design: Three phenotyping methods were used to identify clinical conditions including unspecified DR, nonproliferative DR (NPDR) (mild, moderate, severe), consolidated NPDR (unspecified DR or any NPDR), proliferative DR, diabetic macular edema (DME), vitreous hemorrhage, retinal detachment (RD) (tractional RD or combined tractional and rhegmatogenous RD), and neovascular glaucoma (NVG). The first method used only International Classification of Diseases, 10th Revision (ICD-10) diagnosis codes (ICD-10 Lookup System). The next 2 methods used a Bidirectional Encoder Representations from Transformers with a dense Multilayer Perceptron output layer natural language processing (NLP) framework. The NLP framework was applied either to free-text of provider notes (Text-Only NLP System) or both free-text and ICD-10 diagnosis codes (Text-and-International Classification of Diseases [ICD] NLP System). Subjects: Adults ≥18 years with diabetes mellitus seen at the Wilmer Eye Institute. Methods: We compared the performance of the 3 phenotyping methods in identifying the DR related conditions with gold standard chart review. We also compared the estimated disease prevalence using each method. Main Outcome Measures: Performance of each method was reported as the macro F1 score. The agreement between the methods was calculated using the kappa statistic. Prevalence estimates were also calculated for each method. Results: A total of 91 097 patients and 692 486 office visits were included in the study. Compared with the gold standard, the Text-and-ICD NLP System had the highest F1 score for most clinical conditions (range 0.39-0.64). The agreement between the ICD-10 Lookup System and Text-Only NLP System varied (kappa of 0.21-0.81). The prevalence of DR and related conditions ranged from 1.1% for NVG to 17.9% for DME (using the Text-and-ICD NLP System). 
Conclusions: The prevalence of DR and related conditions varied significantly depending on the methodology of identifying cases. The best performing phenotyping method was the Text-and-ICD NLP System that used information in both diagnosis codes as well as free-text notes. Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
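
The two summary statistics named in this abstract, the macro F1 score and the kappa statistic, can be illustrated with a minimal sketch. It assumes simple per-visit labels and uses Cohen's kappa for two raters; the label values and function names are hypothetical and are not taken from the study.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def cohen_kappa(a, b):
    """Chance-corrected agreement between two labelings of the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0

# Hypothetical labels: chart review versus one phenotyping method.
chart_review = ["DR", "DR", "none", "none"]
method = ["DR", "none", "none", "none"]
print(round(macro_f1(chart_review, method), 3))
print(round(cohen_kappa(chart_review, method), 3))
```

Macro averaging gives each condition equal weight regardless of how common it is, which matters when rare conditions such as NVG sit alongside common ones.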

2.
JMIR Med Educ ; 10: e56342, 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39118469

ABSTRACT

Background: Teaching medical students the skills required to acquire, interpret, apply, and communicate clinical information is an integral part of medical education. A crucial aspect of this process involves providing students with feedback regarding the quality of their free-text clinical notes. Objective: The goal of this study was to assess the ability of ChatGPT 3.5, a large language model, to score medical students' free-text history and physical notes. Methods: This is a single-institution, retrospective study. Standardized patients learned a prespecified clinical case and, acting as the patient, interacted with medical students. Each student wrote a free-text history and physical note of their interaction. The students' notes were scored independently by the standardized patients and ChatGPT using a prespecified scoring rubric that consisted of 85 case elements. The measure of accuracy was percent correct. Results: The study population consisted of 168 first-year medical students. There was a total of 14,280 scores. The ChatGPT incorrect scoring rate was 1.0%, and the standardized patient incorrect scoring rate was 7.2%. The ChatGPT error rate was 86% lower than the standardized patient error rate. The ChatGPT mean incorrect scoring rate of 12 (SD 11) was significantly lower than the standardized patient mean incorrect scoring rate of 85 (SD 74; P=.002). Conclusions: ChatGPT demonstrated a significantly lower error rate compared to standardized patients. This is the first study to assess the ability of a generative pretrained transformer (GPT) program to score medical students' standardized patient-based free-text clinical notes. It is expected that, in the near future, large language models will provide real-time feedback to practicing physicians regarding their free-text notes. GPT artificial intelligence programs represent an important advance in medical education and medical practice.


Subject(s)
Medical Students , Humans , Retrospective Studies , Undergraduate Medical Education/methods , Educational Measurement/methods , Language , Medical History Taking/methods , Medical History Taking/standards , Clinical Competence/standards , Male
3.
JAMIA Open ; 7(3): ooae070, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39156048

ABSTRACT

Objective: Adverse drug reactions (ADRs) are a significant healthcare concern. They are often documented as free text in electronic health records (EHRs), making them challenging to use in clinical decision support systems (CDSS). The study aimed to develop a text mining algorithm to identify ADRs in free text of Dutch EHRs. Materials and Methods: In Phase I, our previously developed CDSS algorithm was recoded and improved upon with the same relatively large dataset of 35 000 notes (Step A), using R to identify possible ADRs with Medical Dictionary for Regulatory Activities (MedDRA) terms and the related Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) (Step B). In Phase II, 6 existing text-mining R-scripts were used to detect and present unique ADRs, and positive predictive value (PPV) and sensitivity were observed. Results: In Phase IA, the recoded algorithm performed better than the previously developed CDSS algorithm, resulting in a PPV of 13% and a sensitivity of 93%. The sensitivity for serious ADRs was 95%. The algorithm identified 58 additional possible ADRs. In Phase IB, the algorithm achieved a PPV of 10%, a sensitivity of 86%, and an F-measure of 0.18. In Phase II, four R-scripts enhanced the sensitivity and PPV of the algorithm, resulting in a PPV of 70%, a sensitivity of 73%, an F-measure of 0.71, and a 63% sensitivity for serious ADRs. Discussion and Conclusion: The recoded Dutch algorithm effectively identifies ADRs from free-text Dutch EHRs using R-scripts and MedDRA/SNOMED-CT. The study details its limitations, highlighting the algorithm's potential and significant improvements.
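
The F-measures reported for Phase IB and Phase II are consistent with the usual definition of the F-measure as the harmonic mean of PPV and sensitivity. The short sketch below assumes that definition (the abstract does not state the formula) and recomputes both values from the reported percentages; the function name is illustrative.

```python
def f_measure(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of positive predictive value and sensitivity."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)

# Values as reported in the abstract.
phase_1b = f_measure(0.10, 0.86)
phase_2 = f_measure(0.70, 0.73)
print(round(phase_1b, 2), round(phase_2, 2))  # prints: 0.18 0.71
```

Because the harmonic mean is dominated by the smaller term, the low PPV in Phase IB keeps the F-measure low despite the high sensitivity.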

4.
Brain Behav ; 14(7): e3547, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39054258

ABSTRACT

INTRODUCTION: Migraine-related stigma (MiRS) and social burden is increasingly recognized. We assessed perspectives and attitudes toward migraine in people with and without migraine in Japan. METHODS: OVERCOME (Japan) was a cross-sectional, population-based web survey of people with and without migraine (July-September 2020). People with migraine were individuals who met the modified International Classification of Headache Disorders criteria or had self-reported physician-diagnosed migraine. People without migraine were selected per quota sampling to represent the Japanese adult population. People with migraine reported their experiences on stigma and social burden and answered how frequently they experienced stigma using the MiRS questionnaire. Associations between MiRS and disability and MiRS and interictal burden were examined using the migraine disability assessment and Migraine Interictal Burden Scale-4. People without migraine reported their experiences and attitudes toward people with migraine by answering an 11-item attitudinal migraine questionnaire. RESULTS: A total of 17,071 and 2008 people with and without migraine, respectively, completed the survey. Overall, 11,228 (65.8%) respondents with migraine reported that they have never experienced stigma or burden; however, of the 12,383 employed respondents, 5841 (47.2%) reported that their current employers are not "extremely" or "very" understanding about their conditions. Moreover, ∼30%-40% of respondents "sometimes," "often," or "very often" hid their migraine from others. The proportion of respondents who experienced stigma often or very often, as assessed by MiRS, was 16.5%; this increased with the increasing number of monthly migraine headache days. The proportion of respondents with moderate-to-severe disability and interictal burden increased with increasing stigma. 
Among respondents without migraine, the proportion holding a stigmatizing attitude toward those with migraine was low (<15%); ∼80% had never experienced work- or family-related stigma or burden. CONCLUSION: MiRS and burden exist but may be hidden and underrecognized in Japan. Disease awareness and education may be important to prevent and reduce stigma and burden.


Subject(s)
Cost of Illness , Migraine Disorders , Social Stigma , Humans , Migraine Disorders/psychology , Japan , Male , Female , Adult , Middle Aged , Cross-Sectional Studies , Young Adult , Surveys and Questionnaires , Aged , Stereotyping , Health Knowledge, Attitudes, Practice , Adolescent
5.
Expert Opin Drug Saf ; : 1-9, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-38932699

ABSTRACT

BACKGROUND: Fomepizole is a competitive alcohol dehydrogenase inhibitor used for the treatment of ethylene glycol and methanol poisoning. We evaluated the safety and effectiveness of fomepizole in patients with ethylene glycol or methanol poisoning in Japan. RESEARCH DESIGN AND METHODS: This retrospective post-marketing surveillance study conducted in Japan registered patients who received fomepizole intravenous infusion per the package insert (January 2015-June 2022). Endpoints included adverse drug reactions/infections (ADRs), arterial blood pH, and treatment outcomes. RESULTS: Of 147 patients registered (91 institutions), 131 and 126 were included in the safety and effectiveness analysis sets, respectively. Mean age was 43.6 years, and 66.4% were male. Mean time from poison ingestion to treatment was 15.1 hours; 66.4% received concomitant hemodialysis. No serious ADRs were reported. ADRs were reported in seven patients; the most-reported ADR was vomiting (2.3%). Seven patients died, 105 survived without sequelae, and 19 survived with sequelae. Most common sequelae were renal failure or renal dysfunction. Mean arterial blood pH increased to 7.4 by 4 hours of treatment, remaining stable for 24 hours post-treatment. CONCLUSIONS: Fomepizole is well tolerated and helps improve clinical outcomes in patients with ethylene glycol or methanol poisoning in Japan. TRIAL REGISTRATION: Japanese Pharmaceutical Information Center (JapicCTI-152817).

6.
Ophthalmol Sci ; 4(5): 100517, 2024.
Article in English | MEDLINE | ID: mdl-38881613

ABSTRACT

Purpose: Knowing the surgical safety of anterior chamber liquid biopsies will support the increased use of proteomics and other molecular analyses to better understand disease mechanisms and therapeutic responses in patients and clinical trials. Manual review of operative notes from different surgeons and procedures in electronic health records (EHRs) is cumbersome, but free-text software tools could facilitate efficient searches. Design: Retrospective case series. Participants: A total of 1418 aqueous humor liquid biopsies from patients undergoing intraocular surgery. Methods: Free-text EHR searches were performed using the Stanford Research Repository cohort discovery tool to identify complications associated with anterior chamber paracentesis and subsequent endophthalmitis. Complications of the surgery unrelated to the biopsy were not reviewed. Main Outcome Measures: Biopsy-associated intraoperative complications and endophthalmitis. Results: A total of 1418 aqueous humor liquid biopsies were performed by 17 experienced surgeons. EHR free-text searches were 100% error-free for surgical complications, >99% for endophthalmitis (<1% false positive), and >93.6% for anesthesia type, requiring manual review for only a limited number of cases. More than 85% of cases were performed under local anesthesia without ocular muscle akinesia. Although the most common indication was cataract (50.1%), other diagnoses included glaucoma, diabetic retinopathy, uveitis, age-related macular degeneration, endophthalmitis, retinitis pigmentosa, and uveal melanoma. A 50- to 100-µL sample was collected in all cases using either a 30-gauge needle or a blunt cannula via a paracentesis. The median follow-up was >7 months. There was only one minor complication (0.07%) identified: a case of a small tear in Descemet membrane without long-term sequelae. No other complications occurred, including other corneal injuries, lens or iris trauma, hyphema, or suprachoroidal hemorrhage. 
There was no case of postoperative endophthalmitis. Conclusions: Anterior chamber liquid biopsy during intraocular surgery is a safe procedure and may be considered for large-scale collection of aqueous humor samples for molecular analyses. Free-text EHR searches are an efficient approach to reviewing intraoperative procedures. Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

7.
JAMIA Open ; 7(2): ooae044, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38798774

ABSTRACT

Objective: Natural language processing (NLP) can enhance research on activities of daily living (ADL) by extracting structured information from unstructured electronic health record (EHR) notes. This review aims to give insight into the state-of-the-art, usability, and performance of NLP systems to extract information on ADL from EHRs. Materials and Methods: A systematic review was conducted based on searches in PubMed, Embase, CINAHL, Web of Science, and Scopus. Studies published between 2017 and 2022 were selected based on predefined eligibility criteria. Results: The review identified 22 studies. Most studies (65%) used NLP for classifying unstructured EHR data on 1 or 2 ADL. Deep learning, combined with a rule-based method or machine learning, was the approach most commonly used. NLP systems varied widely in terms of the pre-processing and algorithms. Common performance evaluation methods were cross-validation and train/test datasets, with F1, precision, and sensitivity as the most frequently reported evaluation metrics. Most studies reported relatively high overall scores on the evaluation metrics. Discussion: NLP systems are valuable for the extraction of unstructured EHR data on ADL. However, comparing the performance of NLP systems is difficult due to the diversity of the studies and challenges related to the dataset, including restricted access to EHR data, inadequate documentation, lack of granularity, and small datasets. Conclusion: This systematic review indicates that NLP is promising for deriving information on ADL from unstructured EHR notes. However, which NLP system performs best depends on the characteristics of the dataset, the research question, and the type of ADL.

8.
Prev Med Rep ; 43: 102765, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38798907

ABSTRACT

Objective: To identify and support correction of misspelled medication names recorded as free text, we compared the relative effectiveness of two user-friendly methods, used without reliance on clinical knowledge. Methods: Leveraging the SAS® COMPGED function, fuzzy string search programs examined 1.8 million medication records from 183,600 World Trade Center General Responder Cohort monitoring visits conducted in New York and New Jersey between 7/16/2002 and 3/31/2021, producing replicable generalized edit distance scores between the reported and correct spelling. Scores < 120 were selected as optimal and compared to Stedman's 2020 Plus Medical/Pharmaceutical Spell Checker first suggested word, used as the comparative standard because it employs both spelling and phonetic similarities to suggest matching words. We coded each method's results as identifying or not identifying the medications within each visit. Results: Most types of medications (94.4 % anxiety, 98.4 % asthma and 94.6 % ulcer/gastroesophageal reflux disease) were correctly spelled. Cross tabulations assessed the agreement (anxiety 99.9 %, asthma 99.6 % and ulcer/gastroesophageal reflux disease 98.4 %), false positive (respectively 0.02 %, 0.03 % and 2.0 %) and false negative (respectively 1.9 %, 0.5 % and 1.0 %) values. Scores < 120 occasionally correctly identified medications missed by the spell checker. We observed no difference in medication misspellings across socio-economically and culturally diverse patient characteristics. Conclusions: Both methods efficiently identified most misspelled medications, greatly minimizing the review and rectification needed. The fuzzy method is more universally applicable for condition-specific medication identification, but requires more programming skills. The spell checker is inexpensive, but benefits from modest programming skills and is only available in some languages.
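
The general idea of fuzzy matching against a list of correct spellings can be sketched as follows. This is an illustration only: it uses plain Levenshtein distance as a stand-in for the weighted generalized edit distance computed by SAS COMPGED, so its distances are not comparable to the study's threshold of 120. The vocabulary, the `max_distance` cutoff, and the function names are assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def best_match(reported, vocabulary, max_distance=2):
    """Return the closest vocabulary entry, or None if nothing is close enough."""
    reported = reported.strip().lower()
    distance, name = min((edit_distance(reported, v), v) for v in vocabulary)
    return name if distance <= max_distance else None

# Hypothetical list of correctly spelled medication names.
MEDICATIONS = ["albuterol", "lorazepam", "omeprazole"]
print(best_match("Albuterl", MEDICATIONS))   # prints: albuterol
print(best_match("lorazapam", MEDICATIONS))  # prints: lorazepam
print(best_match("aspirin", MEDICATIONS))    # prints: None
```

A cutoff keeps unrelated words from being forced onto the nearest vocabulary entry; choosing it is the analogue of selecting the score threshold described in the Methods.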

9.
Artif Intell Med ; 151: 102845, 2024 May.
Article in English | MEDLINE | ID: mdl-38555848

ABSTRACT

BACKGROUND: Electronic health records (EHRs) are a valuable resource for data-driven medical research. However, the presence of protected health information (PHI) makes EHRs unsuitable to be shared for research purposes. De-identification, i.e., the process of removing PHI, is a critical step in making EHR data accessible. Natural language processing has repeatedly demonstrated its feasibility in automating the de-identification process. OBJECTIVES: Our study aims to provide systematic evidence on how the de-identification of clinical free text written in English has evolved in the last thirteen years, and to report on the performances and limitations of the current state-of-the-art systems for the English language. In addition, we aim to identify challenges and potential research opportunities in this field. METHODS: A systematic search in PubMed, Web of Science, and the DBLP was conducted for studies published between January 2010 and February 2023. Titles and abstracts were examined to identify the relevant studies. Selected studies were then analysed in-depth, and information was collected on de-identification methodologies, data sources, and measured performance. RESULTS: A total of 2125 publications were identified for the title and abstract screening. 69 studies were found to be relevant. Machine learning (37 studies) and hybrid (26 studies) approaches are predominant, while six studies relied only on rules. The majority of the approaches were trained and evaluated on public corpora. The 2014 i2b2/UTHealth corpus is the most frequently used (36 studies), followed by the 2006 i2b2 (18 studies) and 2016 CEGS N-GRID (10 studies) corpora. CONCLUSION: Earlier de-identification approaches aimed at English were mainly rule and machine learning hybrids with extensive feature engineering and post-processing, while more recent performance improvements are due to feature-inferring recurrent neural networks. 
Current leading performance is achieved using attention-based neural models. Recent studies report state-of-the-art F1-scores (over 98 %) when evaluated in the manner usually adopted by the clinical natural language processing community. However, their performance needs to be more thoroughly assessed with different measures to judge their reliability to safely de-identify data in a real-world setting. Without additional manually labeled training data, state-of-the-art systems fail to generalise well across a wide range of clinical sub-domains.


Subject(s)
Electronic Health Records , Natural Language Processing , Humans , Machine Learning
10.
Curr Med Imaging ; 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38310552

ABSTRACT

BACKGROUND: To compare the integrity, clarity, conciseness, etc., of the structured report (SR) versus free-text report (FTR) for computed tomography enterography (CTE) of Crohn's disease (CD). METHODS: FTRs and SRs were generated for 30 patients with CD. The integrity, clarity, conciseness, etc., of SRs versus FTRs were compared. In this study, an evidence-based medicine practice model based on SR was applied to 92 CD patients in order to evaluate its clinical value. Then, the quality of life of the patients in the two groups was evaluated before and after three months of intervention using the Inflammatory Bowel Disease Questionnaire (IBDQ). RESULTS: SRs received higher ratings than FTRs for satisfaction with integrity (median rating 4.27 vs. 3.75, P=0.008), clarity (median rating 4.20 vs. 3.43, P=0.003), conciseness (median rating 4.23 vs. 3.20, P=0.003), the possibility of contacting a radiologist to interpret (median rating 4.17 vs. 3.20, P<0.001), and overall clinical impact (median rating 4.23 vs. 3.27, P<0.001). In addition, the research group had higher scores than the control group on the IBDQ intestinal symptom dimension (median score 61.13 vs. 58.02, P=0.003), IBDQ systemic symptom dimension (median score 24.48 vs. 20.67, P<0.001), IBDQ emotional capacity dimension (median score 65.65 vs. 61.74, P<0.001), IBDQ social ability dimension (median score 26.80 vs. 22.37, P<0.001), and total IBDQ score (median score 178.07 vs. 162.80, P<0.001). CONCLUSION: The SR of CTE in CD patients was conducive to improving the quality and readability of the report, and CD patients' quality of life could improve significantly after the intervention of an evidence-based medicine model based on SR.

11.
J Imaging Inform Med ; 37(1): 3-12, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38343237

ABSTRACT

Natural language processing (NLP) can be used to process and structure free text, such as (free text) radiological reports. In radiology, it is important that reports are complete and accurate for clinical staging of, for instance, pulmonary oncology. A computed tomography (CT) or positron emission tomography (PET)-CT scan is of great importance in tumor staging, and NLP may be of additional value to the radiological report when used in the staging process as it may be able to extract the T and N stage of the 8th tumor-node-metastasis (TNM) classification system. The purpose of this study is to evaluate a new TN algorithm (TN-PET-CT) by adding a layer of metabolic activity to an already existing rule-based NLP algorithm (TN-CT). This new TN-PET-CT algorithm is capable of staging chest CT examinations as well as PET-CT scans. The study design made it possible to perform a subgroup analysis to test the external validation of the prior TN-CT algorithm. For information extraction and matching, pyContextNLP, SpaCy, and regular expressions were used. Overall TN accuracy score of the TN-PET-CT algorithm was 0.73 and 0.62 in the training and validation set (N = 63, N = 100). The external validation of the TN-CT classifier (N = 65) was 0.72. Overall, it is possible to adjust the TN-CT algorithm into a TN-PET-CT algorithm. However, outcomes highly depend on the accuracy of the report, the used vocabulary, and its context to express, for example, uncertainty. This is true for both the adjusted PET-CT algorithm and for the CT algorithm when applied in another hospital.

12.
Article in English | MEDLINE | ID: mdl-38375418

ABSTRACT

INTRODUCTION: Positive birth experiences can be a decisive factor in the well-being and future health of both women and their newborns. The quality of care is a multidimensional concept influenced by the external structure of the organization, the administrative qualities of the environment, and the individual patient's preferences about care. The aim was to describe women's preferences and experiences concerning support and treatment, and their perception of quality of care during all phases of labor and the postnatal period. METHODS: Free-text comments of 635 women from four different open comment questions were analyzed. A qualitative content analysis was conducted in two steps: an inductive phase followed by a deductive phase using the Quality of care from a Patient's Perspective framework (QPP). RESULTS: A total of 1148 free-text comments were coded; and 10 sub-categories were created and inserted under the QPP framework covering the latent meaning of the sub-category. Five of the sub-categories were sorted under the identity-oriented approach, four under physical-technical conditions, and one under the sociocultural atmosphere and reflected the women's experiences and needs regarding support and treatment during early labor, the active phase of labor, and the postnatal period. CONCLUSIONS: High-quality care and support are important aspects for women during childbirth, irrespective of the phase of labor or postnatal period. The need for individualized care, active participation in one's own birth and using a family centered approach were also emphasized. Organizational factors influenced the quality of care and were particularly noticeable during birth.

13.
Age Ageing ; 53(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38364820

ABSTRACT

BACKGROUND: Falls involve dynamic risk factors that change over time, but most studies on fall-risk factors are cross-sectional and do not capture this temporal aspect. The longitudinal clinical notes within electronic health records (EHR) provide an opportunity to analyse fall risk factor trajectories through Natural Language Processing techniques, specifically dynamic topic modelling (DTM). This study aims to uncover fall-related topics for new fallers and track their evolving trends leading up to falls. METHODS: This case-cohort study utilised primary care EHR data covering information on older adults between 2016 and 2019. Cases were individuals who fell in 2019 but had no falls in the preceding three years (2016-18). The control group was randomly sampled individuals, with similar size to the cases group, who did not endure falls during the whole study follow-up period. We applied DTM on the clinical notes collected between 2016 and 2018. We compared the trend lines of the case and control groups using the slopes, which indicate direction and steepness of the change over time. RESULTS: A total of 2,384 fallers (cases) and an equal number of controls were included. We identified 25 topics that showed significant differences in trends between the case and control groups. Topics such as medications, renal care, family caregivers, hospital admission/discharge and referral/streamlining diagnostic pathways exhibited a consistent increase in steepness over time within the cases group before the occurrence of falls. CONCLUSIONS: Early recognition of health conditions demanding care is crucial for applying proactive and comprehensive multifactorial assessments that address underlying causes, ultimately reducing falls and fall-related injuries.
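
Comparing trend lines by their slopes, as described in the Methods, can be sketched with an ordinary least-squares slope. The abstract does not say how the slopes were estimated, so least squares is an assumption here, and the yearly topic proportions below are invented for illustration rather than taken from the study.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (change in y per unit of x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical share of notes assigned to one topic in each year.
years = [2016, 2017, 2018]
cases = [0.10, 0.14, 0.20]
controls = [0.10, 0.11, 0.11]

case_slope = ols_slope(years, cases)
control_slope = ols_slope(years, controls)
print(case_slope > control_slope)  # prints: True
```

The sign of the slope gives the direction of the trend and its magnitude gives the steepness, which are the two properties the study compares between groups.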


Subject(s)
General Practitioners , Natural Language Processing , Humans , Aged , Cohort Studies , Cross-Sectional Studies
14.
J Am Med Inform Assoc ; 31(3): 714-719, 2024 02 16.
Article in English | MEDLINE | ID: mdl-38216127

ABSTRACT

OBJECTIVES: National attention has focused on increasing clinicians' responsiveness to the social determinants of health, for example, food security. A key step toward designing responsive interventions includes ensuring that information about patients' social circumstances is captured in the electronic health record (EHR). While prior work has assessed levels of EHR "social risk" documentation, the extent to which documentation represents the true prevalence of social risk is unknown. While no gold standard exists to definitively characterize social risks in clinical populations, here we used the best available proxy: social risks reported by patient survey. MATERIALS AND METHODS: We compared survey results to respondents' EHR social risk documentation (clinical free-text notes and International Statistical Classification of Diseases and Related Health Problems [ICD-10] codes). RESULTS: Surveys indicated much higher rates of social risk (8.2%-40.9%) than found in structured (0%-2.0%) or unstructured (0%-0.2%) documentation. DISCUSSION: Ideally, new care standards that include incentives to screen for social risk will increase the use of documentation tools and clinical teams' awareness of and interventions related to social adversity, while balancing potential screening and documentation burden on clinicians and patients. CONCLUSION: EHR documentation of social risk factors currently underestimates their prevalence.


Subject(s)
Documentation , Electronic Health Records , Humans , Self Report , Documentation/methods , Prevalence , Risk Factors
15.
JMIR Med Inform ; 12: e49007, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38231569

ABSTRACT

BACKGROUND: Physicians are hesitant to forgo the opportunity of entering unstructured clinical notes for structured data entry in electronic health records. Does free text increase informational value in comparison with structured data? OBJECTIVE: This study aims to compare information from unstructured text-based chief complaints harvested and processed by a natural language processing (NLP) algorithm with clinician-entered structured diagnoses in terms of their potential utility for automated improvement of patient workflows. METHODS: Electronic health records of 293,298 patient visits at the emergency department of a Swiss university hospital from January 2014 to October 2021 were analyzed. Using emergency department overcrowding as a case in point, we compared supervised NLP-based keyword dictionaries of symptom clusters from unstructured clinical notes and clinician-entered chief complaints from a structured drop-down menu with the following 2 outcomes: hospitalization and high Emergency Severity Index (ESI) score. RESULTS: Of 12 symptom clusters, the NLP cluster was substantial in predicting hospitalization in 11 (92%) clusters; 8 (67%) clusters remained significant even after controlling for the cluster of clinician-determined chief complaints in the model. All 12 NLP symptom clusters were significant in predicting a low ESI score, of which 9 (75%) remained significant when controlling for clinician-determined chief complaints. The correlation between NLP clusters and chief complaints was low (r=-0.04 to 0.6), indicating complementarity of information. CONCLUSIONS: The NLP-derived features and clinicians' knowledge were complementary in explaining patient outcome heterogeneity. They can provide an efficient approach to patient flow management, for example, in an emergency medicine setting. We further demonstrated the feasibility of creating extensive and precise keyword dictionaries with NLP by medical experts without requiring programming knowledge. 
Using the dictionary, we could classify short and unstructured clinical texts into diagnostic categories defined by the clinician.
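
Classifying short clinical texts with a keyword dictionary can be sketched as a simple substring lookup. The clusters and keywords below are hypothetical English examples and are not the study's dictionaries; real notes would also need handling of negation, abbreviations, and misspellings, which this sketch omits.

```python
# Hypothetical keyword dictionary: symptom cluster -> trigger phrases.
SYMPTOM_KEYWORDS = {
    "chest pain": ["chest pain", "thoracic pain", "angina"],
    "dyspnea": ["shortness of breath", "dyspnea", "breathless"],
    "abdominal pain": ["abdominal pain", "stomach ache"],
}

def assign_clusters(note, dictionary=SYMPTOM_KEYWORDS):
    """Return every cluster with at least one keyword present in the note."""
    text = note.lower()
    return sorted(
        cluster
        for cluster, keywords in dictionary.items()
        if any(keyword in text for keyword in keywords)
    )

print(assign_clusters("Pt with chest pain and shortness of breath since morning"))
# prints: ['chest pain', 'dyspnea']
```

A note can match several clusters at once, so the function returns a list instead of a single category.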

16.
J Med Internet Res ; 26: e48996, 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38214966

ABSTRACT

BACKGROUND: The systematic review of clinical research papers is a labor-intensive and time-consuming process that often involves the screening of thousands of titles and abstracts. The accuracy and efficiency of this process are critical for the quality of the review and subsequent health care decisions. Traditional methods rely heavily on human reviewers, often requiring a significant investment of time and resources. OBJECTIVE: This study aims to assess the performance of the OpenAI generative pretrained transformer (GPT) and GPT-4 application programming interfaces (APIs) in accurately and efficiently identifying relevant titles and abstracts from real-world clinical review data sets and comparing their performance against ground truth labeling by 2 independent human reviewers. METHODS: We introduce a novel workflow using the Chat GPT and GPT-4 APIs for screening titles and abstracts in clinical reviews. A Python script was created to make calls to the API with the screening criteria in natural language and a corpus of title and abstract data sets filtered by a minimum of 2 human reviewers. We compared the performance of our model against human-reviewed papers across 6 review papers, screening over 24,000 titles and abstracts. RESULTS: Our results show an accuracy of 0.91, a macro F1-score of 0.60, a sensitivity of excluded papers of 0.91, and a sensitivity of included papers of 0.76. The interrater variability between 2 independent human screeners was κ=0.46, and the prevalence and bias-adjusted κ between our proposed methods and the consensus-based human decisions was κ=0.96. On a randomly selected subset of papers, the GPT models demonstrated the ability to provide reasoning for their decisions and corrected their initial decisions upon being asked to explain their reasoning for incorrect classifications. 
CONCLUSIONS: Large language models have the potential to streamline the clinical review process, save valuable time and effort for researchers, and contribute to the overall quality of clinical reviews. By prioritizing the workflow and acting as an aid rather than a replacement for researchers and reviewers, models such as GPT-4 can enhance efficiency and lead to more accurate and reliable conclusions in medical research.
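
The agreement and performance measures named in this abstract (accuracy, per-class sensitivity, macro F1-score, Cohen's κ, and the prevalence-adjusted and bias-adjusted κ) can all be derived from one 2x2 table of model decisions against human decisions. The following is a minimal sketch using standard textbook definitions; the `Confusion` class, the `screening_metrics` function, and the counts are hypothetical illustrations, not the study's script or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table for include/exclude screening (hypothetical layout)."""
    tp: int  # humans include, model includes
    fn: int  # humans include, model excludes
    fp: int  # humans exclude, model includes
    tn: int  # humans exclude, model excludes


def screening_metrics(c: Confusion) -> dict:
    """Standard metrics; assumes both classes occur at least once."""
    n = c.tp + c.fn + c.fp + c.tn
    accuracy = (c.tp + c.tn) / n
    # Sensitivity computed separately for each class.
    sens_included = c.tp / (c.tp + c.fn)
    sens_excluded = c.tn / (c.tn + c.fp)
    # Macro F1 is the unweighted mean of the per-class F1 scores.
    f1_included = 2 * c.tp / (2 * c.tp + c.fp + c.fn)
    f1_excluded = 2 * c.tn / (2 * c.tn + c.fp + c.fn)
    macro_f1 = (f1_included + f1_excluded) / 2
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_chance = (
        (c.tp + c.fn) * (c.tp + c.fp) + (c.tn + c.fp) * (c.tn + c.fn)
    ) / n**2
    kappa = (p_observed - p_chance) / (1 - p_chance)
    # For two categories, PABAK depends only on observed agreement.
    pabak = 2 * p_observed - 1
    return {
        "accuracy": accuracy,
        "sens_included": sens_included,
        "sens_excluded": sens_excluded,
        "macro_f1": macro_f1,
        "kappa": kappa,
        "pabak": pabak,
    }


# Made-up counts with few included papers, as is typical in screening.
example = screening_metrics(Confusion(tp=40, fn=10, fp=20, tn=930))
```

With imbalanced classes such as these, the prevalence-adjusted and bias-adjusted κ can be much higher than Cohen's κ for the same table, because it ignores how rare the "include" class is.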


Subject(s)
Artificial Intelligence, Biomedical Research, Systematic Reviews as Topic, Humans, Consensus, Data Analysis, Problem Solving, Natural Language Processing, Workflow
17.
Diagnostics (Basel) ; 14(2)2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38248014

ABSTRACT

This study aims to establish advanced sampling methods in free-text data for efficiently building semantic text mining models using deep learning, such as models that identify vertebral compression fracture (VCF) in radiology reports. We enrolled a total of 27,401 radiology free-text reports of X-ray examinations of the spine. Predictive performance was compared among text mining models built using supervised long short-term memory networks, each trained on samples drawn by one of four sampling methods (vector sum minimization, vector sum maximization, stratified sampling, and simple random sampling) at four fixed percentages. The drawn samples formed the training set, and the remaining samples were used to validate each model for each sampling method and ratio. Predictive accuracy for identifying VCF was measured using the area under the receiver operating characteristic curve (AUROC). At sampling ratios of 1/10, 1/20, 1/30, and 1/40, vector sum minimization yielded the highest AUROC: 0.981 (95% CI: 0.980-0.983), 0.963 (95% CI: 0.961-0.965), 0.907 (95% CI: 0.904-0.911), and 0.895 (95% CI: 0.891-0.899), respectively. The lowest AUROC was observed with vector sum maximization. This study proposes an advanced sampling method for free-text data, vector sum minimization, that can be efficiently applied to build text mining models by drawing a small number of critical, representative samples.
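
As a reminder of what the AUROC figures above measure, the metric can be computed directly as the probability that a randomly chosen positive report receives a higher score than a randomly chosen negative one, with ties counted as one half. This is a minimal sketch of that standard definition; the function name `auroc` and the toy labels and scores are illustrative assumptions and are unrelated to the study's data.

```python
def auroc(labels, scores):
    """AUROC by pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 ground-truth values (1 = condition present).
    scores: iterable of model scores, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    # Count a win when the positive outranks the negative, half for a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: one of the four positive/negative pairs is ranked wrongly.
toy_auroc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form is quadratic in the number of reports, so it suits small illustrations; rank-based formulas give the same value more efficiently on large validation sets.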

18.
Front Digit Health ; 5: 1186208, 2023.
Article in English | MEDLINE | ID: mdl-38090654

ABSTRACT

Introduction: Linking free-text addresses to unique identifiers in a structural address database [the Ordnance Survey unique property reference number (UPRN) in the United Kingdom (UK)] is a necessary step for downstream geospatial analysis in many digital health systems, e.g., for identification of care home residents, understanding housing transitions in later life, and informing decision making on geographical health and social care resource distribution. However, there is a lack of open-source tools for this task with performance validated in a test data set. Methods: In this article, we propose a generalisable solution (A Framework for Linking free-text Addresses to Ordnance Survey UPRN database, FLAP) that combines a machine learning matching classifier with a fuzzy aligning algorithm for feature generation and performs better than existing tools. The framework is implemented in Python as an Open Source tool (available at Link). We tested the framework in a real-world scenario of linking individuals' addresses (n=771,588), recorded as free text in the Community Health Index (CHI) of National Health Service (NHS) Tayside and NHS Fife, to the Unique Property Reference Number database (UPRN DB). Results: We achieved an adjusted matching accuracy of 0.992 in a test data set randomly sampled (n=3,876) from NHS Tayside and NHS Fife CHI addresses. FLAP showed robustness against input variations including typographical errors, alternative formats, and partially incorrect information. It also offers improved usability compared with existing solutions, allowing the use of a customised threshold of matching confidence and selection of the top n candidate records. The use of machine learning also provides better adaptability of the tool to new data and enables continuous improvement. Discussion: In conclusion, we have developed a framework, FLAP, for linking free-text UK addresses to the UPRN DB with good performance and usability in a real-world task.
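
The idea of returning the top n candidate records above a configurable confidence threshold can be illustrated with a much simpler stand-in. The sketch below is not FLAP: it replaces the machine learning classifier and fuzzy aligning features with plain string similarity from Python's standard `difflib` module. The `normalise` and `top_candidates` helpers, the identifiers, and the addresses are all made up for illustration.

```python
import re
from difflib import SequenceMatcher


def normalise(address: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(cleaned.split())


def top_candidates(query, reference, n=3, threshold=0.6):
    """Return up to n (identifier, address, score) tuples, best first.

    reference: dict mapping a property identifier to its address text.
    threshold: minimum similarity (0 to 1) for a record to be returned.
    """
    target = normalise(query)
    scored = []
    for identifier, address in reference.items():
        score = SequenceMatcher(None, target, normalise(address)).ratio()
        if score >= threshold:
            scored.append((identifier, address, score))
    # Highest similarity first, then keep only the n best candidates.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:n]


# Hypothetical reference records keyed by made-up identifiers.
reference_db = {
    "100000000001": "12 High Street, Dundee, DD1 1AA",
    "100000000002": "12 High Road, Dundee, DD2 2BB",
    "100000000003": "7 Shore Lane, Kirkcaldy, KY1 1XX",
}

# A free-text query with a typo and no punctuation.
matches = top_candidates("12 hgh street dundee dd1 1aa", reference_db, n=2)
```

Raising `threshold` trades recall for precision, while a larger `n` leaves more of the final choice to a human reviewer or a downstream classifier.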

19.
JMIR Med Inform ; 11: e45377, 2023 Dec 19.
Article in English | MEDLINE | ID: mdl-38131977

ABSTRACT

Background: Nursing narratives are an intriguing feature in the prediction of short-term clinical outcomes. However, it is unclear which nursing narratives significantly impact the prediction of postoperative length of stay (LOS) in deep learning models. Objective: Therefore, we applied the Reverse Time Attention (RETAIN) model to predict LOS, entering nursing narratives as the main input. Methods: A total of 354 patients who underwent ovarian cancer surgery at the Seoul National University Bundang Hospital from 2014 to 2020 were retrospectively enrolled. Nursing narratives collected within 3 postoperative days were used to predict prolonged LOS (≥10 days). The physician's assessment was based on a retrospective review of the physician's notes from the same period as the data used by the model. Results: The model performed better than the physician's assessment (area under the receiver operating characteristic curve of 0.81 vs 0.58; P=.02). Nursing narratives entered on the first day were the most influential predictors of prolonged LOS. The likelihood of prolonged LOS increased if the physician had to check the patient often and if the patient received intravenous fluids or intravenous patient-controlled analgesia late. Conclusions: The use of the RETAIN model on nursing narratives predicted postoperative LOS effectively for patients who underwent ovarian cancer surgery. These findings suggest that accurate and interpretable deep learning information obtained shortly after surgery may accurately predict prolonged LOS.

20.
JMIR Med Inform ; 11: e49041, 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-37991979

ABSTRACT

Background: Radiology reports are usually written in a free-text format, which makes it challenging to reuse the reports. Objective: For secondary use, we developed a 2-stage deep learning system for extracting clinical information and converting it into a structured format. Methods: Our system mainly consists of 2 deep learning modules: entity extraction and relation extraction. For each module, state-of-the-art deep learning models were applied. We trained and evaluated the models using 1040 in-house Japanese computed tomography (CT) reports annotated by medical experts. We also evaluated the performance of the entire pipeline of our system. In addition, the ratio of annotated entities in the reports was measured to validate the coverage of the clinical information with our information model. Results: The microaveraged F1-scores of our best-performing model for entity extraction and relation extraction were 96.1% and 97.4%, respectively. The microaveraged F1-score of the 2-stage system, which is a measure of the performance of the entire pipeline of our system, was 91.9%. Our system showed encouraging results for the conversion of free-text radiology reports into a structured format. The coverage of clinical information in the reports was 96.2% (6595/6853). Conclusions: Our 2-stage deep learning system can extract clinical information from chest and abdomen CT reports accurately and comprehensively.
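
The microaveraged F1-scores reported above pool true positives, false positives, and false negatives across all labels before computing one score, whereas a macroaverage takes the mean of per-label scores. The sketch below shows that standard calculation and also reproduces the coverage ratio from the counts given in the abstract. The label names and per-label counts are hypothetical, and the helper names are illustrative rather than taken from the study.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_f1(counts: dict) -> float:
    """Pool counts over all labels, then compute a single F1."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return f1_from_counts(tp, fp, fn)


def macro_f1(counts: dict) -> float:
    """Average the per-label F1 scores with equal weight."""
    per_label = [f1_from_counts(**c) for c in counts.values()]
    return sum(per_label) / len(per_label)


# Hypothetical per-label counts for an entity extraction evaluation.
entity_counts = {
    "anatomy": {"tp": 90, "fp": 5, "fn": 5},
    "finding": {"tp": 40, "fp": 10, "fn": 20},
}

micro = micro_f1(entity_counts)
macro = macro_f1(entity_counts)

# Coverage from the abstract: annotated entities over all entities.
coverage_percent = round(100 * 6595 / 6853, 1)
```

Microaveraging weights each entity equally, so frequent labels dominate the score; macroaveraging weights each label equally, which exposes weak performance on rare labels.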
