Results 1-20 of 451
1.
Cureus ; 16(8): e67306, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39301343

ABSTRACT

INTRODUCTION: This study evaluates the diagnostic performance of the latest large language models (LLMs), GPT-4o (OpenAI, San Francisco, CA, USA) and Claude 3 Opus (Anthropic, San Francisco, CA, USA), in determining causes of death from medical histories and postmortem CT findings. METHODS: We included 100 adult cases whose postmortem CT scans were diagnosable for the causes of death using the gold standard of autopsy results. Their medical histories and postmortem CT findings were compiled, and clinical and imaging diagnoses of both the underlying and immediate causes of death, as well as their personal information, were carefully separated from the database before being shown to the LLMs. Both GPT-4o and Claude 3 Opus generated the top three differential diagnoses for each of the underlying and immediate causes of death based on the following three prompts: 1) medical history only; 2) postmortem CT findings only; and 3) both medical history and postmortem CT findings. The diagnostic performance of the LLMs was compared using McNemar's test. RESULTS: For the underlying cause of death, GPT-4o achieved primary diagnostic accuracy rates of 78%, 72%, and 78%, while Claude 3 Opus achieved 72%, 56%, and 75% for prompts 1, 2, and 3, respectively. Including any of the top three differential diagnoses, GPT-4o's accuracy rates were 92%, 90%, and 92%, while Claude 3 Opus's rates were 93%, 69%, and 93% for prompts 1, 2, and 3, respectively. For the immediate cause of death, GPT-4o's primary diagnostic accuracy rates were 55%, 58%, and 62%, while Claude 3 Opus's rates were 60%, 62%, and 63% for prompts 1, 2, and 3, respectively. For any of the top three differential diagnoses, GPT-4o's accuracy rates were 88% for prompt 1 and 91% for prompts 2 and 3, whereas Claude 3 Opus's rates were 92% for all three prompts. Significant differences between the models were observed for prompt 2 in diagnosing the underlying cause of death (p = 0.03 and <0.01 for the primary and top three differential diagnoses, respectively). CONCLUSION: Both GPT-4o and Claude 3 Opus demonstrated relatively high performance in diagnosing both the underlying and immediate causes of death using medical histories and postmortem CT findings.
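The paired model comparison above rests on McNemar's test over the same 100 cases. A minimal sketch of such a comparison follows; the counts are invented for illustration, not the study's data.

```python
# McNemar's test on paired correct/incorrect outcomes of two models
# scored on the same cases. Counts are invented for illustration.
from statsmodels.stats.contingency_tables import mcnemar

# Rows: GPT-4o correct / incorrect; columns: Claude 3 Opus correct / incorrect.
table = [[70, 8],   # both correct | only GPT-4o correct
         [2, 20]]   # only Claude 3 Opus correct | both incorrect

# exact=True applies the binomial test to the discordant pairs (8 vs. 2),
# which is appropriate when discordant counts are small.
result = mcnemar(table, exact=True)
print(f"statistic={result.statistic}, p-value={result.pvalue:.4f}")
```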

2.
Comput Struct Biotechnol J ; 23: 3254-3257, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39286528

ABSTRACT

Introduction: OpenAI's ChatGPT, a Large Language Model (LLM), is a powerful tool across domains, designed for text and code generation and fostering collaboration, especially in public health. Investigating the role of this advanced LLM chatbot in assisting public health practitioners in shaping disease transmission models to inform infection control strategies marks a new era in infectious disease epidemiology research. This study used a case study to illustrate how ChatGPT collaborates with a public health practitioner in co-designing a mathematical transmission model. Methods: Using natural conversation, the practitioner initiated a dialogue involving an iterative process of code generation, refinement, and debugging with ChatGPT to develop a model that fits 10 days of prevalence data to estimate two key epidemiological parameters: i) the basic reproduction number (R0) and ii) the final epidemic size. Verification and validation processes were conducted to ensure the accuracy and functionality of the final model. Results: ChatGPT developed a validated transmission model which replicated the epidemic curve and gave an estimate of R0 of 4.19 (95% CI: 4.13-4.26) and a final epidemic size of 98.3% of the population within 60 days. It highlighted the advantages of using maximum likelihood estimation with a Poisson distribution over the least squares method. Conclusion: Integration of LLMs in medical research accelerates model development, reducing technical barriers for health practitioners, democratizing access to advanced modeling and potentially enhancing pandemic preparedness globally, particularly in resource-constrained populations.
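As a rough illustration of the modeling task described, here is a sketch of fitting an SIR model to 10 days of prevalence counts by Poisson maximum likelihood and reading off R0 and the final epidemic size; the population size, recovery rate, and counts are invented assumptions, not the study's data or code.

```python
# Sketch: Poisson maximum-likelihood fit of an SIR model to 10 days of
# prevalence counts, then R0 and final-size estimates. Population size,
# recovery rate, and the counts themselves are invented assumptions.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize_scalar

N = 10_000                 # population size (assumed)
gamma = 0.25               # recovery rate = 1 / (4-day infectious period)
days = np.arange(1, 11)
observed = np.array([12, 29, 71, 160, 370, 810, 1600, 2900, 4300, 5200])

def simulate(beta, t_eval, horizon):
    """Integrate S and I forward from a single seed infection."""
    def rhs(t, y):
        S, I = y
        return [-beta * S * I / N, beta * S * I / N - gamma * I]
    sol = solve_ivp(rhs, (0, horizon), [N - 1.0, 1.0], t_eval=t_eval, rtol=1e-8)
    return sol.y           # rows: S(t), I(t)

def neg_log_lik(beta):
    # Poisson log-likelihood of observed prevalence, up to a constant.
    mu = np.clip(simulate(beta, days, days[-1])[1], 1e-9, None)
    return -(observed * np.log(mu) - mu).sum()

beta_hat = minimize_scalar(neg_log_lik, bounds=(0.01, 5.0), method="bounded").x
S, _ = simulate(beta_hat, np.linspace(0, 60, 601), 60)
print(f"R0 ~ {beta_hat / gamma:.2f}; final size ~ {1 - S[-1] / N:.1%} by day 60")
```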

3.
Heliyon ; 10(16): e35941, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39253130

ABSTRACT

This paper presents a novel approach for a low-cost simulator-based driving assessment system incorporating a speech-based assistant, using pre-generated messages from generative AI to achieve real-time interaction during the assessment. Simulator-based assessment is a crucial apparatus in the research toolkit of various fields. Traditional assessment approaches, like on-road evaluation, though reliable, can be risky, costly, and inaccessible. Simulator-based assessment using stationary driving simulators offers a safer evaluation and can be tailored to specific needs. However, these simulators are often only available to research-focused institutions due to their cost. To address this issue, our study proposes a system with the aforementioned properties, aiming to enhance drivers' situational awareness and foster positive emotional states, i.e., high valence and medium arousal, while assessing participants to prevent subpar performers from proceeding to the next stages of assessment and/or rehabilitation. In addition, this study introduces a speech-based assistant that provides timely guidance adaptable to the ever-changing context of the driving environment and vehicle state. The study's preliminary outcomes reveal encouraging progress, highlighting improved driving performance and positive emotional states when participants engage with the assistant during the assessment.

4.
J Am Med Inform Assoc ; 31(10): 2284-2293, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39271171

ABSTRACT

OBJECTIVES: The aim of this study was to investigate GPT-3.5 in generating and coding medical documents with International Classification of Diseases (ICD)-10 codes for data augmentation on low-resource labels. MATERIALS AND METHODS: Employing GPT-3.5, we generated and coded 9606 discharge summaries based on lists of ICD-10 code descriptions of patients with infrequent (or generation) codes within the MIMIC-IV dataset. Combined with the baseline training set, this formed an augmented training set. Neural coding models were trained on baseline and augmented data and evaluated on a MIMIC-IV test set. We report micro- and macro-F1 scores on the full codeset, the generation codes, and their families. Weak Hierarchical Confusion Matrices determined within-family and outside-of-family coding errors in the latter codesets. The coding performance of GPT-3.5 was evaluated on prompt-guided self-generated data and real MIMIC-IV data. Clinicians evaluated the clinical acceptability of the generated documents. RESULTS: Data augmentation results in slightly lower overall model performance but improves performance for the generation candidate codes and their families, including one code absent from the baseline training data. Augmented models display lower out-of-family error rates. GPT-3.5 identifies ICD-10 codes by their prompted descriptions but underperforms on real data. Evaluators highlighted the correctness of generated concepts but found the documents lacking in variety, supporting information, and narrative. DISCUSSION AND CONCLUSION: While GPT-3.5 alone, given our prompt setting, is unsuitable for ICD-10 coding, it supports data augmentation for training neural models. Augmentation positively affects generation code families but mainly benefits codes with existing examples. Augmentation reduces out-of-family errors. Documents generated by GPT-3.5 state prompted concepts correctly but lack variety and authenticity in their narratives.
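For readers unfamiliar with this style of augmentation pipeline, a minimal sketch of the generation step follows; the prompt wording, example ICD-10 codes, and model name are assumptions for illustration, not the study's protocol.

```python
# Sketch: generate a synthetic discharge summary for low-resource ICD-10
# codes. Prompt wording and code descriptions are illustrative assumptions.
# Requires the `openai` package and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

# Hypothetical low-resource target codes and descriptions.
target_codes = {
    "A48.1": "Legionnaires' disease",
    "D59.5": "Paroxysmal nocturnal hemoglobinuria",
}

code_list = "; ".join(f"{code}: {desc}" for code, desc in target_codes.items())
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    temperature=0.7,
    messages=[{
        "role": "user",
        "content": (
            "Write a brief, fully de-identified hospital discharge summary "
            f"for a patient whose diagnoses correspond to: {code_list}. "
            "Use no real names, dates, or identifiers."
        ),
    }],
)
print(response.choices[0].message.content)
```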


Subject(s)
Clinical Coding, International Classification of Diseases, Patient Discharge Summaries, Humans, Electronic Health Records, Patient Discharge, Neural Networks, Computer
5.
mSystems ; : e0104424, 2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39291976

ABSTRACT

Class II microcins are antimicrobial peptides that have shown some potential as novel antibiotics. However, to date, only 10 class II microcins have been described, and the discovery of novel microcins has been hampered by their short length and high sequence divergence. Here, we ask whether we can use numerical embeddings generated by protein large language models to detect microcins in bacterial genome assemblies and whether this method can outperform sequence-based methods such as BLAST. We find that embeddings detect known class II microcins much more reliably than does BLAST and that any two microcins tend to have a small distance in embedding space even though they typically are highly diverged at the sequence level. In data sets of Escherichia coli, Klebsiella spp., and Enterobacter spp. genomes, we further find novel putative microcins that were previously missed by sequence-based search methods. IMPORTANCE: Antibiotic resistance is becoming an increasingly serious problem in modern medicine, but the development pipeline for conventional antibiotics is not promising. Therefore, alternative approaches to combat bacterial infections are urgently needed. One such approach may be to employ naturally occurring antibacterial peptides produced by bacteria to kill competing bacteria. A promising class of such peptides is the class II microcins. However, only a small number of class II microcins have been discovered to date, and the discovery of further such microcins has been hampered by their high sequence divergence and short length, which can cause sequence-based search methods to fail. Here, we demonstrate that a more robust method for microcin discovery can be built on the basis of a protein large language model, and we use this method to identify several putative novel class II microcins.
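A sketch of the embedding-distance idea, using mean-pooled ESM-2 representations and cosine distance; the specific model, the placeholder sequences, and any decision threshold are assumptions, not necessarily the study's setup.

```python
# Sketch: flag candidate peptides whose protein-LM embeddings lie close to
# known class II microcins. Model choice and sequences are placeholders.
# Requires the `fair-esm` package (pip install fair-esm) and torch.
import torch
import esm

model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

def embed(name: str, seq: str) -> torch.Tensor:
    """Mean-pooled final-layer representation, excluding BOS/EOS tokens."""
    _, _, tokens = batch_converter([(name, seq)])
    with torch.no_grad():
        out = model(tokens, repr_layers=[33])
    return out["representations"][33][0, 1 : len(seq) + 1].mean(dim=0)

# Placeholder sequences standing in for known microcins (not real data).
known = {"microcin_1": "MRELSKEELAAVHGG", "microcin_2": "MKNLSEQELSAVQGG"}
known_embs = [embed(n, s) for n, s in known.items()]

def min_embedding_distance(candidate: str) -> float:
    """Smallest cosine distance from a candidate ORF to any known microcin."""
    c = embed("candidate", candidate)
    return min(1 - torch.cosine_similarity(c, e, dim=0).item() for e in known_embs)

# A small distance to any known microcin flags the ORF for manual review.
print(min_embedding_distance("MRELSKEDLAAVHGG"))
```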

6.
Acad Radiol ; 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39294055

ABSTRACT

RATIONALE AND OBJECTIVES: This study aims to evaluate the performance of generative pre-trained transformer (GPT)-4o in the complete official European Board of Radiology (EBR) exam, designed to assess radiology knowledge, skills, and competence. MATERIALS AND METHODS: Questions based on text, image, or video and in the format of multiple choice, free-text reporting, or image annotation were uploaded into GPT-4o using standardized prompting. The results were compared to the average scores of radiologists taking the exam in real time. RESULTS: In Part 1 (multiple response questions and short cases), GPT-4o outperformed both the radiologists' average scores and the maximum pass score (70.2% vs. 58.4% and 60%, respectively). In Part 2 (clinically oriented reasoning evaluation), the performance of GPT-4o was below both the radiologists' average scores and the minimum pass score (52.9% vs. 66.1% and 55%, respectively). The accuracy on questions involving ultrasound images was higher compared to other imaging modalities (accuracy rate, 87.5-100%). For video-based questions, the performance was 50.6%. The model achieved the highest accuracy on most likely diagnosis questions but showed lower accuracy in free-text reporting and direct anatomical assessment in images (100% vs. 31% and 28.6%, respectively). CONCLUSION: The abilities of GPT-4o in the official EBR exam are particularly noteworthy. This study demonstrates the potential of large language models to assist radiologists in assessing and managing cases from diagnosis to treatment or follow-up recommendations, even with zero-shot prompting.

7.
Article in English | MEDLINE | ID: mdl-39234774

ABSTRACT

Artificial intelligence (AI) has evolved significantly over the past decades, from its early concepts in the 1950s to the present era of deep learning and natural language processing. Advanced large language models (LLMs), such as the Chatbot Generative Pre-Trained Transformer (ChatGPT), are trained to generate human-like text responses. This technology has the potential to revolutionize various aspects of gastroenterology, including diagnosis, treatment, education, and decision-making support. The benefits of using LLMs in gastroenterology could include accelerating diagnosis and treatment, providing personalized care, enhancing education and training, assisting in decision-making, and improving communication with patients. However, drawbacks and challenges such as limited AI capability, training on possibly biased data, data errors, security and privacy concerns, and implementation costs must be addressed to ensure the responsible and effective use of this technology. The future of LLMs in gastroenterology relies on their ability to process and analyse large amounts of data, identify patterns, and summarize information, thus assisting physicians in creating personalized treatment plans. As AI advances, LLMs will become more accurate and efficient, allowing for faster diagnosis and treatment of gastroenterological conditions. Ensuring effective collaboration between AI developers, healthcare professionals, and regulatory bodies is essential for the responsible and effective use of this technology. By finding the right balance between AI and human expertise and addressing the limitations and risks associated with its use, LLMs can play an increasingly significant role in gastroenterology, contributing to better patient care and supporting doctors in their work.

8.
Epilepsy Res ; 207: 107451, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39276641

ABSTRACT

OBJECTIVES: Monitoring seizure control metrics is key to the clinical care of patients with epilepsy. Manually abstracting these metrics from unstructured text in electronic health records (EHR) is laborious. We aimed to abstract the date of last seizure and seizure frequency from clinical notes of patients with epilepsy using natural language processing (NLP). METHODS: We extracted seizure control metrics from notes of patients seen in epilepsy clinics at two hospitals in Boston. Extraction was performed with the pretrained model RoBERTa_for_seizureFrequency_QA, for both date of last seizure and seizure frequency, combined with regular expressions. We designed the algorithm to categorize the timing of last seizure ("today", "1-6 days ago", "1-4 weeks ago", "more than 1-3 months ago", "more than 3-6 months ago", "more than 6-12 months ago", "more than 1-2 years ago", "more than 2 years ago") and seizure frequency ("innumerable", "multiple", "daily", "weekly", "monthly", "once per year", "less than once per year"). Our ground truth consisted of structured questionnaires filled out by physicians. Model performance was measured using the areas under the receiver operating characteristic curve (AUROC) and precision recall curve (AUPRC) for categorical labels, and the median absolute error (MAE) for ordinal labels, with 95% confidence intervals (CI) estimated via bootstrapping. RESULTS: Our cohort included 1773 adult patients with a total of 5658 visits with reported seizure control metrics, seen in epilepsy clinics between December 2018 and May 2022. The cohort's average age was 42 years; the majority were female (57%), White (81%), and non-Hispanic (85%). The models achieved an MAE (95% CI) for date of last seizure of 4 (4.00-4.86) weeks, and for seizure frequency of 0.02 (0.02-0.02) seizures per day. CONCLUSIONS: Our NLP approach demonstrates that the extraction of seizure control metrics from EHR is feasible, allowing for large-scale EHR research.
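The abstract names a pretrained question-answering model combined with regular expressions; a sketch of that style of pipeline follows, with the model's hub path, the question wording, and the sample note all placeholders.

```python
# Sketch: extractive QA plus regex binning for seizure-control metrics.
# The hub path below is a placeholder for wherever the pretrained
# RoBERTa_for_seizureFrequency_QA model is hosted; the note is invented.
import re
from transformers import pipeline

qa = pipeline("question-answering", model="<org>/RoBERTa_for_seizureFrequency_QA")

note = ("Seen in epilepsy clinic. Last generalized seizure was three weeks "
        "ago; she averages two focal seizures per month on current therapy.")

last = qa(question="When was the patient's last seizure?", context=note)
freq = qa(question="How often does the patient have seizures?", context=note)
print(last["answer"], "|", freq["answer"])

# Regular expressions then map the free-text spans onto ordinal bins,
# e.g. "three weeks ago" -> "1-4 weeks ago".
if re.search(r"\b(one|two|three|four|[1-4])\s+weeks?\s+ago\b", last["answer"], re.I):
    print("last-seizure bin: 1-4 weeks ago")
```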

9.
Semin Vasc Surg ; 37(3): 314-320, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39277347

ABSTRACT

Natural language processing is a subfield of artificial intelligence that aims to analyze human oral or written language. The development of large language models has brought innovative perspectives in medicine, including the potential use of chatbots and virtual assistants. Nevertheless, the benefits and pitfalls of such technology need to be carefully evaluated before their use in health care. The aim of this narrative review was to provide an overview of potential applications of large language models and artificial intelligence chatbots in the field of vascular surgery, including clinical practice, research, and education. In light of the results, we discuss current limits and future directions.


Subject(s)
Artificial Intelligence, Natural Language Processing, Vascular Surgical Procedures, Humans
10.
Article in English | MEDLINE | ID: mdl-39278360

ABSTRACT

BACKGROUND: The rate of diagnosis of mast cell activation syndrome (MCAS) has increased since the disorder's original description as a mastocytosis-like phenotype. While a set of consortium MCAS criteria is well described and widely accepted, this increase occurs in the setting of a broader set of proposed alternative MCAS criteria. OBJECTIVE: Effective diagnostic criteria must minimize the range of unrelated diagnoses that can be erroneously classified as the condition of interest. We sought to determine whether the symptoms associated with alternative MCAS criteria result in less concise or consistent diagnostic alternatives, reducing diagnostic specificity. METHODS: We used multiple large language models, including ChatGPT, Claude, and Gemini, to bootstrap the probabilities of diagnoses that are compatible with consortium or alternative MCAS criteria. We used diversity and network analysis to quantify diagnostic precision and specificity compared to control diagnostic criteria, including systemic lupus erythematosus (SLE), Kawasaki disease, and migraine. RESULTS: Compared to consortium MCAS criteria, alternative MCAS criteria are associated with more variable (Shannon diversity 5.8 vs. 4.6; p-value = 0.004) and less precise (mean Bray-Curtis similarity 0.07 vs. 0.19; p-value = 0.004) diagnoses. The diagnosis networks derived from consortium and alternative MCAS criteria had lower between-network similarity than the similarity between diagnosis networks derived from two distinct SLE criteria (cosine similarity 0.55 vs. 0.86; p-value = 0.0022). CONCLUSION: Alternative MCAS criteria are associated with a distinct set of diagnoses compared to consortium MCAS criteria and have lower diagnostic consistency. This lack of specificity is pronounced in relation to multiple control criteria, raising the concern that alternative criteria could disproportionately contribute to MCAS overdiagnosis, to the exclusion of more appropriate diagnoses.
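To make the two headline metrics concrete, here is a sketch of Shannon diversity and mean pairwise Bray-Curtis similarity over bootstrapped diagnosis counts; the counts are invented, not the study's data.

```python
# Sketch: Shannon diversity of pooled diagnosis frequencies and mean
# pairwise Bray-Curtis similarity across bootstrap replicates.
# The replicate counts are invented for illustration.
import numpy as np
from itertools import combinations
from scipy.spatial.distance import braycurtis

def shannon(counts):
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -(p * np.log(p)).sum()

# Rows: bootstrap replicates; columns: frequency of each proposed diagnosis.
replicates = np.array([
    [10, 5, 3, 0, 1],
    [9, 6, 2, 1, 1],
    [11, 4, 3, 1, 0],
])

diversity = shannon(replicates.sum(axis=0))
# Bray-Curtis similarity = 1 - Bray-Curtis dissimilarity.
sims = [1 - braycurtis(a, b) for a, b in combinations(replicates, 2)]
print(f"Shannon diversity {diversity:.2f}; mean similarity {np.mean(sims):.2f}")
```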

11.
Article in English | MEDLINE | ID: mdl-39278616

ABSTRACT

OBJECTIVES: The task of writing structured content reviews and guidelines has grown larger and more complex. We propose to go beyond search tools, toward curation tools, by automating time-consuming and repetitive steps of extracting and organizing information. METHODS: SciScribe is built as an extension of IBM's Deep Search platform, which provides document processing and search capabilities. This platform was used to ingest and search full-content publications from PubMed Central (PMC) and official, structured records from the ClinicalTrials and OpenPayments databases. Author names and NCT numbers mentioned within the publications were used to link publications to these official records as context. Search strategies involve traditional keyword-based search as well as natural language question answering via large language models (LLMs). RESULTS: SciScribe is a web-based tool that helps accelerate literature reviews through four key features: (1) accumulating a personal collection from publication sources such as PMC; (2) incorporating contextual information from external databases into the presented papers, promoting a more informed assessment by readers; (3) semantic question answering over a single document to quickly assess its relevance and hierarchical organization; and (4) semantic question answering across each document within a collection, with answers collated into tables. CONCLUSIONS: Emerging language processing techniques open new avenues to accelerate and enhance the literature review process, for which we have demonstrated a use case implementation within cardiac surgery. SciScribe automates and accelerates this process, mitigates errors associated with repetition and fatigue, and contextualizes results instantaneously by linking relevant external data sources.
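Feature (4), per-document question answering collated into tables, can be pictured with a short sketch; the `answer` function below is a naive keyword stand-in for an LLM backend, not Deep Search's actual API, and the documents are placeholders.

```python
# Sketch: ask the same questions of every document in a collection and
# collate the answers into a table. `answer` is a naive keyword stand-in
# for an LLM QA backend, not IBM Deep Search's API; documents are invented.
import pandas as pd

def answer(text: str, question: str) -> str:
    key = question.rstrip("?").split()[-1].lower()   # naive keyword match
    hits = [s.strip() for s in text.split(".") if key in s.lower()]
    return hits[0] if hits else "not found"

collection = {
    "paper_A": "We enrolled 120 patients. The NCT number is NCT00000000.",
    "paper_B": "A retrospective cohort of 4302 patients was analyzed.",
}
questions = ["How many patients?", "What is the NCT number?"]

rows = [{"document": doc, **{q: answer(text, q) for q in questions}}
        for doc, text in collection.items()]
print(pd.DataFrame(rows).to_string(index=False))
```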

13.
Article in English | MEDLINE | ID: mdl-39268568

ABSTRACT

Artificially intelligent physical activity digital assistants that use the full spectrum of machine learning capabilities have not yet been developed and examined. This study aimed to explore potential users' perceptions and expectations of using such a digital assistant. Six 90-min online focus group meetings (n = 45 adults) were conducted. Meetings were recorded, transcribed and thematically analysed. Participants embraced the idea of a 'digital assistant' providing physical activity support. Participants indicated they would like to receive notifications from the digital assistant, but did not agree on the number, timing, tone and content of notifications. Likewise, they indicated that the digital assistant's personality and appearance should be customisable. Participants understood the need to provide information to the digital assistant to allow for personalisation, but varied greatly in the extent of information that they were willing to provide. Privacy issues aside, participants embraced the idea of using artificial intelligence or machine learning in return for a more functional and personal digital assistant. In sum, participants were ready for an artificially intelligent physical activity digital assistant but emphasised a need to personalise or customise nearly every feature of the application. This poses challenges in terms of cost and complexity of developing the application.

14.
J Biomed Inform ; 157: 104720, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39233209

ABSTRACT

BACKGROUND: In oncology, electronic health records contain key textual information for the diagnosis, staging, and treatment planning of patients with cancer. However, processing text data requires considerable time and effort, which limits the utilization of these data. Recent advances in natural language processing (NLP) technology, including large language models, can be applied to cancer research. In particular, extracting the information required for the pathological stage from surgical pathology reports can be used to update cancer staging according to the latest cancer staging guidelines. OBJECTIVES: This study has two main objectives. The first is to evaluate the performance of fine-tuned generative language models (GLMs) in extracting information from text-based surgical pathology reports and determining pathological stages based on the extracted information for patients with lung cancer. The second is to determine the feasibility of utilizing relatively small GLMs for information extraction in a resource-constrained computing environment. METHODS: Lung cancer surgical pathology reports were collected from the Common Data Model database of Seoul National University Bundang Hospital (SNUBH), a tertiary hospital in Korea. We selected 42 descriptors necessary for tumor-node (TN) classification based on these reports and created a gold standard validated by two clinical experts. The pathology reports and gold standard were used to generate prompt-response pairs for training and evaluating GLMs, which were then used to extract information required for staging from pathology reports. RESULTS: We evaluated the information extraction performance of six trained models as well as their performance in TN classification using the extracted information. The Deductive Mistral-7B model, which was pre-trained with the deductive dataset, showed the best performance overall, with an exact match ratio of 92.24% on the information extraction problem and an accuracy of 0.9876 (predicting T and N classification concurrently) in classification. CONCLUSION: This study demonstrated that training GLMs with deductive datasets can improve information extraction performance and that GLMs with a relatively small number of parameters, approximately seven billion, can achieve high performance on this problem. The proposed GLM-based information extraction method is expected to be useful in clinical decision-making support, lung cancer staging, and research.
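As a concrete picture of the exact-match metric, here is a sketch in which a report counts as matched only if every extracted descriptor equals the gold standard; the descriptor names and values are invented, not the study's 42 descriptors.

```python
# Sketch: exact match ratio for structured descriptor extraction.
# A report counts as an exact match only if every descriptor the model
# returns equals the gold standard. Field names and values are invented.
gold = [
    {"tumor_size_cm": "3.2", "pleural_invasion": "present", "n_stage": "N1"},
    {"tumor_size_cm": "1.8", "pleural_invasion": "absent",  "n_stage": "N0"},
]
predicted = [
    {"tumor_size_cm": "3.2", "pleural_invasion": "present", "n_stage": "N1"},
    {"tumor_size_cm": "1.8", "pleural_invasion": "absent",  "n_stage": "N2"},
]

exact = sum(p == g for p, g in zip(predicted, gold))
print(f"Exact match ratio: {exact / len(gold):.2%}")  # 50.00%
```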


Subject(s)
Lung Neoplasms, Natural Language Processing, Neoplasm Staging, Lung Neoplasms/pathology, Lung Neoplasms/diagnosis, Humans, Neoplasm Staging/methods, Electronic Health Records, Data Mining/methods, Algorithms, Databases, Factual
15.
Clin Imaging ; 114: 110271, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39236553

ABSTRACT

The advent of large language models (LLMs) marks a transformative leap in natural language processing, offering unprecedented potential in radiology, particularly in enhancing the accuracy and efficiency of coronary artery disease (CAD) diagnosis. While previous studies have explored the capabilities of specific LLMs like ChatGPT in cardiac imaging, a comprehensive evaluation comparing multiple LLMs in the context of CAD-RADS 2.0 has been lacking. This study addresses this gap by assessing the performance of various LLMs, including ChatGPT 4, ChatGPT 4o, Claude 3 Opus, Gemini 1.5 Pro, Mistral Large, Meta Llama 3 70B, and Perplexity Pro, in answering 30 multiple-choice questions derived from the CAD-RADS 2.0 guidelines. Our findings reveal that ChatGPT 4o achieved the highest accuracy at 100%, with ChatGPT 4 and Claude 3 Opus closely following at 96.6%. Other models, including Mistral Large, Perplexity Pro, Meta Llama 3 70B, and Gemini 1.5 Pro, also demonstrated commendable performance, though with slightly lower accuracy ranging from 90% to 93.3%. This study underscores the proficiency of current LLMs in understanding and applying CAD-RADS 2.0, suggesting their potential to significantly enhance radiological reporting and patient care in coronary artery disease. The variations in model performance highlight the need for further research, particularly in evaluating the visual diagnostic capabilities of LLMs, a critical component of radiology practice. This study provides a foundational comparison of LLMs in CAD-RADS 2.0 and sets the stage for future investigations into their broader applications in radiology, emphasizing the importance of integrating both text-based and visual knowledge for optimal clinical outcomes.
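The evaluation design, 30 multiple-choice questions scored per model, reduces to a simple tally; a sketch follows with an invented question and a stubbed model call.

```python
# Sketch: accuracy tally for a multiple-choice guideline benchmark.
# The question, answer key, and `ask` stub are invented placeholders.
QUESTIONS = [
    {"text": "Per CAD-RADS 2.0, how is 50-69% maximal stenosis categorized?",
     "options": ["A) CAD-RADS 2", "B) CAD-RADS 3", "C) CAD-RADS 4A"],
     "key": "B"},
]

def ask(model_name: str, question: dict) -> str:
    """Stand-in for an API call returning the model's chosen letter."""
    return "B"  # placeholder response

for model_name in ("model_x", "model_y"):
    correct = sum(ask(model_name, q) == q["key"] for q in QUESTIONS)
    print(f"{model_name}: {correct / len(QUESTIONS):.1%}")
```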


Subject(s)
Computed Tomography Angiography, Coronary Angiography, Coronary Artery Disease, Natural Language Processing, Humans, Computed Tomography Angiography/methods, Coronary Artery Disease/diagnostic imaging, Coronary Angiography/methods, Reproducibility of Results
16.
Am J Hum Genet ; 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39255797

ABSTRACT

Phenotype-driven gene prioritization is fundamental to diagnosing rare genetic disorders. While traditional approaches rely on curated knowledge graphs with phenotype-gene relations, recent advancements in large language models (LLMs) promise a streamlined text-to-gene solution. In this study, we evaluated five LLMs, including two generative pre-trained transformer (GPT) series and three Llama2 series models, assessing their performance across task completeness, gene prediction accuracy, and adherence to required output structures. We conducted experiments exploring various combinations of models, prompts, phenotypic input types, and task difficulty levels. Our findings revealed that the best-performing LLM, GPT-4, achieved an average accuracy of 17.0% in identifying diagnosed genes within the top 50 predictions, which still falls behind traditional tools. However, accuracy increased with model size. Consistent results were observed over time, as shown in the dataset curated after 2023. Advanced techniques such as retrieval-augmented generation (RAG) and few-shot learning did not improve accuracy. Sophisticated prompts were more likely to enhance task completeness, especially in smaller models. Conversely, complicated prompts tended to decrease the output structure compliance rate. LLMs also achieved better-than-random prediction accuracy with free-text input, though performance was slightly lower than with standardized concept input. Bias analysis showed that highly cited genes, such as BRCA1, TP53, and PTEN, are more likely to be predicted. Our study provides valuable insights into integrating LLMs with genomic analysis, contributing to the ongoing discussion on their utilization in clinical workflows.
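The headline metric, whether the diagnosed gene appears in the model's top 50 predictions, is easy to state in code; the gene lists here are invented for illustration.

```python
# Sketch: top-k accuracy for phenotype-driven gene prioritization.
# Each case pairs a diagnosed gene with an LLM-returned ranked gene list
# (both invented here for illustration).
cases = [
    {"diagnosed": "FBN1",  "ranked": ["FBN1", "TGFBR2", "COL3A1"]},
    {"diagnosed": "MECP2", "ranked": ["FOXG1", "CDKL5", "SCN1A"]},
]

def top_k_accuracy(cases, k=50):
    hits = sum(c["diagnosed"] in c["ranked"][:k] for c in cases)
    return hits / len(cases)

print(f"Top-50 accuracy: {top_k_accuracy(cases):.1%}")  # 50.0%
```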

17.
J Clin Neurosci ; 129: 110815, 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39236407

ABSTRACT

Large language models (LLMs) have recently shown promise in the medical field, with numerous applications in clinical neuroscience. OpenAI's launch of Generative Pre-trained Transformer 3.5 (GPT-3.5) in November 2022, followed by its successor, Generative Pre-trained Transformer 4 (GPT-4), in March 2023, has garnered widespread attention and debate surrounding natural language processing (NLP) and LLM advancements. Transformer models are trained on natural language datasets to predict and generate sequences of characters. Using internal weights from training, they produce tokens that align with their understanding of the initial input. This paper delves into ChatGPT's potential as a learning tool in neurosurgery while contextualizing its abilities for passing medical licensing exams and neurosurgery written boards. Additionally, possibilities for creating personalized case presentations and study material are discussed, alongside ChatGPT's capacity to optimize the research workflow and perform a concise literature review. However, such tools need to be used with caution, given the possibility of artificial intelligence hallucinations and other concerns such as user overreliance and complacency. Overall, this opinion paper raises key points surrounding ChatGPT's role in neurosurgical education.

18.
Sex Med ; 12(4): qfae055, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39257694

ABSTRACT

Introduction: Despite direct access to clinicians through the electronic health record, patients are increasingly turning to the internet for information related to their health, especially with sensitive urologic conditions such as Peyronie's disease (PD). Large language model (LLM) chatbots are a form of artificial intelligence that rely on user prompts to mimic conversation, and they have shown remarkable capabilities. The conversational nature of these chatbots has the potential to answer patient questions related to PD; however, the accuracy, comprehensiveness, and readability of these LLMs related to PD remain unknown. Aims: To assess the quality and readability of information generated from 4 LLMs with searches related to PD; to see if users could improve responses; and to assess the accuracy, completeness, and readability of responses to artificial preoperative patient questions sent through the electronic health record prior to undergoing PD surgery. Methods: The National Institutes of Health's frequently asked questions related to PD were entered into 4 LLMs, unprompted and prompted. The responses were evaluated for overall quality by the previously validated DISCERN questionnaire. Accuracy and completeness of LLM responses to 11 presurgical patient messages were evaluated with previously accepted Likert scales. All evaluations were performed by 3 independent reviewers in October 2023, and all reviews were repeated in April 2024. Descriptive statistics and analysis were performed. Results: Without prompting, the quality of information was moderate across all LLMs but improved to high quality with prompting. LLMs were accurate and complete, with an average score of 5.5 of 6.0 (SD, 0.8) and 2.8 of 3.0 (SD, 0.4), respectively. The average Flesch-Kincaid reading level was grade 12.9 (SD, 2.1). Chatbots were unable to communicate at a grade 8 reading level when prompted, and their citations were appropriate only 42.5% of the time. Conclusion: LLMs may become a valuable tool for patient education for PD, but they currently rely on clinical context and appropriate prompting by humans to be useful. Unfortunately, their prerequisite reading level remains higher than that of the average patient, and their citations cannot be trusted. However, given their increasing uptake and accessibility, patients and physicians should be educated on how to interact with these LLMs to elicit the most appropriate responses. In the future, LLMs may reduce burnout by helping physicians respond to patient messages.
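The readability findings rest on the Flesch-Kincaid grade level, FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59; here is a sketch using a crude vowel-group syllable heuristic (dedicated packages such as textstat are more accurate), with an invented sample sentence.

```python
# Sketch: Flesch-Kincaid grade level with a rough vowel-group syllable
# heuristic; dedicated readability packages are more accurate.
import re

def syllables(word: str) -> int:
    n = len(re.findall(r"[aeiouy]+", word.lower()))
    if word.lower().endswith("e") and n > 1:
        n -= 1                        # crude silent-e correction
    return max(n, 1)

def fk_grade(text: str) -> float:
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syl = sum(syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syl / len(words) - 15.59

sample = ("Peyronie's disease involves fibrous scar tissue that can cause "
          "curved and painful erections.")
print(f"Estimated grade level: {fk_grade(sample):.1f}")
```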

19.
JMIR Med Inform ; 12: e58478, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39235317

ABSTRACT

With the popularization of large language models (LLMs), strategies for their effective and safe usage in health care and research have become increasingly pertinent. Despite the growing interest and eagerness among health care professionals and scientists to exploit the potential of LLMs, initial attempts may yield suboptimal results due to a lack of user experience, thus complicating the integration of artificial intelligence (AI) tools into workplace routine. Focusing on scientists and health care professionals with limited LLM experience, this viewpoint article highlights and discusses 6 easy-to-implement use cases of practical relevance. These encompass customizing translations, refining text and extracting information, generating comprehensive overviews and specialized insights, compiling ideas into cohesive narratives, crafting personalized educational materials, and facilitating intellectual sparring. Additionally, we discuss general prompting strategies and precautions for the implementation of AI tools in biomedicine. Despite various hurdles and challenges, the integration of LLMs into daily routines of physicians and researchers promises heightened workplace productivity and efficiency.

20.
JMIR Form Res ; 8: e56797, 2024 Sep 12.
Article in English | MEDLINE | ID: mdl-39265163

ABSTRACT

BACKGROUND: The public launch of OpenAI's ChatGPT platform generated immediate interest in the use of large language models (LLMs). Health care institutions are now grappling with establishing policies and guidelines for the use of these technologies, yet little is known about how health care providers view LLMs in medical settings. Moreover, there are no studies assessing how pediatric providers are adopting these readily accessible tools. OBJECTIVE: The aim of this study was to determine how pediatric providers are currently using LLMs in their work as well as their interest in using a Health Insurance Portability and Accountability Act (HIPAA)-compliant version of ChatGPT in the future. METHODS: A survey instrument consisting of structured and unstructured questions was iteratively developed by a team of informaticians from various pediatric specialties. The survey was sent via Research Electronic Data Capture (REDCap) to all Boston Children's Hospital pediatric providers. Participation was voluntary and uncompensated, and all survey responses were anonymous. RESULTS: Surveys were completed by 390 pediatric providers. Approximately 50% (197/390) of respondents had used an LLM; of these, almost 75% (142/197) were already using an LLM for nonclinical work and 27% (52/195) for clinical work. Providers detailed the various ways they are currently using an LLM in their clinical and nonclinical work. Only 29% (n=105) of 362 respondents indicated that ChatGPT should be used for patient care in its present state; however, 73.8% (273/368) reported they would use a HIPAA-compliant version of ChatGPT if one were available. Providers' proposed future uses of LLMs in health care are described. CONCLUSIONS: Despite significant concerns and barriers to LLM use in health care, pediatric providers are already using LLMs at work. This study will give policy makers needed information about how providers are using LLMs clinically.


Subject(s)
Health Personnel, Humans, Cross-Sectional Studies, Health Personnel/statistics & numerical data, Surveys and Questionnaires, Female, Male, Pediatrics, Boston, Adult, Health Insurance Portability and Accountability Act, United States