Búsqueda | Portal Regional de la BVS

1.

Text summarization for pharmaceutical sciences using hierarchical clustering with a weighted evaluation methodology.

Dalal, Avinash; Ranjan, Sumit; Bopaiah, Yajna; Chembachere, Divya; Steiger, Nick; Burns, Christopher; Daswani, Varsha.

Sci Rep ; 14(1): 20149, 2024 08 30.

Artículo en Inglés | MEDLINE | ID: mdl-39209906

RESUMEN

In the pharmaceutical industry, there is an abundance of regulatory documents used to understand the current regulatory landscape and proactively make project decisions. Due to the size of these documents, it is helpful for project teams to have informative summaries. We propose a novel solution, MedicoVerse, to summarize such documents using advanced machine learning techniques. MedicoVerse uses a multi-stage approach, combining word embeddings using the SapBERT model on regulatory documents. These embeddings are put through a critical hierarchical agglomerative clustering step, and the clusters are organized through a custom data structure. Each cluster is summarized using the bart-large-cnn-samsum model, and each summary is merged to create a comprehensive summary of the original document. We compare MedicoVerse results with established models T5, Google Pegasus, Facebook BART, and large language models such as Mixtral 8 × 7b instruct, GPT 3.5, and Llama-2-70b by introducing a scoring system that considers four factors: ROUGE score, BERTScore, business entities and the Flesch Reading Ease. Our results show that MedicoVerse outperforms the compared models, thus producing informative summaries of large regulatory documents.

Asunto(s)

Aprendizaje Automático , Análisis por Conglomerados , Industria Farmacéutica , Humanos , Procesamiento de Lenguaje Natural

2.

Enhanced ICD-10 code assignment of clinical texts: A summarization-based approach.

Sun, Yaoqian; Sang, Lei; Wu, Dan; He, Shilin; Chen, Yani; Duan, Huilong; Chen, Han; Lu, Xudong.

Artif Intell Med ; 156: 102967, 2024 Oct.

Artículo en Inglés | MEDLINE | ID: mdl-39208710

RESUMEN

BACKGROUND: Assigning International Classification of Diseases (ICD) codes to clinical texts is a common and crucial practice in patient classification, hospital management, and further statistics analysis. Current auto-coding methods mainly transfer this task to a multi-label classification problem. Such solutions are suffering from high-dimensional mapping space and excessive redundant information in long clinical texts. To alleviate such a situation, we introduce text summarization methods to the ICD coding regime and apply text matching to select ICD codes. METHOD: We focus on the tenth revision of the ICD (ICD-10) coding and design a novel summarization-based approach (SuM) with an end-to-end strategy to efficiently assign ICD-10 code to clinical texts. In this approach, a knowledge-guided pointer network is purposed to distill and summarize key information in clinical texts precisely. Then a matching model with matching-aggregation architecture follows to align the summary result with code, tuning the one-vs-all scenario to one-vs-one matching so that the large-label-space obstacle laid in classification approaches would be avoided. RESULT: The 12,788 ICD-10 coded discharge summaries from a Chinese hospital were collected to evaluate the proposed approach. Compared with existing methods, the purposed model achieves the greatest coding results with Micro AUC of 0.9548, MRR@10 of 0.7977, Precision@10 of 0.0944, and Recall@10 of 0.9439 for the TOP-50 Dataset. Results on the FULL-Dataset remain consistent. Also, the proposed knowledge encoder and applied end-to-end strategy are proven to facilitate the whole model to gain efficacy in selecting the most suitable code. CONCLUSION: The proposed automatic ICD-10 code assignment approach via text summarization can effectively capture critical messages in long clinical texts and improve the performance of ICD-10 coding of clinical texts.

Asunto(s)

Clasificación Internacional de Enfermedades , Humanos , Registros Electrónicos de Salud , Codificación Clínica/métodos

3.

Leveraging artificial intelligence to summarize abstracts in lay language for increasing research accessibility and transparency.

Shyr, Cathy; Grout, Randall W; Kennedy, Nan; Akdas, Yasemin; Tischbein, Maeve; Milford, Joshua; Tan, Jason; Quarles, Kaysi; Edwards, Terri L; Novak, Laurie L; White, Jules; Wilkins, Consuelo H; Harris, Paul A.

J Am Med Inform Assoc ; 31(10): 2294-2303, 2024 Oct 01.

Artículo en Inglés | MEDLINE | ID: mdl-39008829

RESUMEN

OBJECTIVE: Returning aggregate study results is an important ethical responsibility to promote trust and inform decision making, but the practice of providing results to a lay audience is not widely adopted. Barriers include significant cost and time required to develop lay summaries and scarce infrastructure necessary for returning them to the public. Our study aims to generate, evaluate, and implement ChatGPT 4 lay summaries of scientific abstracts on a national clinical study recruitment platform, ResearchMatch, to facilitate timely and cost-effective return of study results at scale. MATERIALS AND METHODS: We engineered prompts to summarize abstracts at a literacy level accessible to the public, prioritizing succinctness, clarity, and practical relevance. Researchers and volunteers assessed ChatGPT-generated lay summaries across five dimensions: accuracy, relevance, accessibility, transparency, and harmfulness. We used precision analysis and adaptive random sampling to determine the optimal number of summaries for evaluation, ensuring high statistical precision. RESULTS: ChatGPT achieved 95.9% (95% CI, 92.1-97.9) accuracy and 96.2% (92.4-98.1) relevance across 192 summary sentences from 33 abstracts based on researcher review. 85.3% (69.9-93.6) of 34 volunteers perceived ChatGPT-generated summaries as more accessible and 73.5% (56.9-85.4) more transparent than the original abstract. None of the summaries were deemed harmful. We expanded ResearchMatch's technical infrastructure to automatically generate and display lay summaries for over 750 published studies that resulted from the platform's recruitment mechanism. DISCUSSION AND CONCLUSION: Implementing AI-generated lay summaries on ResearchMatch demonstrates the potential of a scalable framework generalizable to broader platforms for enhancing research accessibility and transparency.

Asunto(s)

Indización y Redacción de Resúmenes , Inteligencia Artificial , Humanos , Investigación Biomédica , Difusión de la Información

4.

Clinical research text summarization method based on fusion of domain knowledge.

Jiang, Shiwei; Zheng, Qingxiao; Li, Taiyong; Luo, Shuanghong.

J Biomed Inform ; 156: 104668, 2024 Aug.

Artículo en Inglés | MEDLINE | ID: mdl-38857737

RESUMEN

OBJECTIVE: The objective of this study is to integrate PICO knowledge into the clinical research text summarization process, aiming to enhance the model's comprehension of biomedical texts while capturing crucial content from the perspective of summary readers, ultimately improving the quality of summaries. METHODS: We propose a clinical research text summarization method called DKGE-PEGASUS (Domain-Knowledge and Graph Convolutional Enhanced PEGASUS), which is based on integrating domain knowledge. The model mainly consists of three components: a PICO label prediction module, a text information re-mining unit based on Graph Convolutional Neural Networks (GCN), and a pre-trained summarization model. First, the PICO label prediction module is used to identify PICO elements in clinical research texts while obtaining word embeddings enriched with PICO knowledge. Then, we use GCN to reinforce the encoder of the pre-trained summarization model to achieve deeper text information mining while explicitly injecting PICO knowledge. Finally, the outputs of the PICO label prediction module, the GCN text information re-mining unit, and the encoder of the pre-trained model are fused to produce the final coding results, which are then decoded by the decoder to generate summaries. RESULTS: Experiments conducted on two datasets, PubMed and CDSR, demonstrated the effectiveness of our method. The Rouge-1 scores achieved were 42.64 and 38.57, respectively. Furthermore, the quality of our summarization results was found to significantly outperform the baseline model in comparisons of summarization results for a segment of biomedical text. CONCLUSION: The method proposed in this paper is better equipped to identify critical elements in clinical research texts and produce a higher-quality summary.

Asunto(s)

Investigación Biomédica , Minería de Datos , Procesamiento de Lenguaje Natural , Redes Neurales de la Computación , Minería de Datos/métodos , Investigación Biomédica/métodos , Humanos , Algoritmos

5.

RefAI: a GPT-powered retrieval-augmented generative tool for biomedical literature recommendation and summarization.

Li, Yiming; Zhao, Jeff; Li, Manqi; Dang, Yifang; Yu, Evan; Li, Jianfu; Sun, Zenan; Hussein, Usama; Wen, Jianguo; Abdelhameed, Ahmed M; Mai, Junhua; Li, Shenduo; Yu, Yue; Hu, Xinyue; Yang, Daowei; Feng, Jingna; Li, Zehan; He, Jianping; Tao, Wei; Duan, Tiehang; Lou, Yanyan; Li, Fang; Tao, Cui.

J Am Med Inform Assoc ; 31(9): 2030-2039, 2024 Sep 01.

Artículo en Inglés | MEDLINE | ID: mdl-38857454

RESUMEN

OBJECTIVES: Precise literature recommendation and summarization are crucial for biomedical professionals. While the latest iteration of generative pretrained transformer (GPT) incorporates 2 distinct modes-real-time search and pretrained model utilization-it encounters challenges in dealing with these tasks. Specifically, the real-time search can pinpoint some relevant articles but occasionally provides fabricated papers, whereas the pretrained model excels in generating well-structured summaries but struggles to cite specific sources. In response, this study introduces RefAI, an innovative retrieval-augmented generative tool designed to synergize the strengths of large language models (LLMs) while overcoming their limitations. MATERIALS AND METHODS: RefAI utilized PubMed for systematic literature retrieval, employed a novel multivariable algorithm for article recommendation, and leveraged GPT-4 turbo for summarization. Ten queries under 2 prevalent topics ("cancer immunotherapy and target therapy" and "LLMs in medicine") were chosen as use cases and 3 established counterparts (ChatGPT-4, ScholarAI, and Gemini) as our baselines. The evaluation was conducted by 10 domain experts through standard statistical analyses for performance comparison. RESULTS: The overall performance of RefAI surpassed that of the baselines across 5 evaluated dimensions-relevance and quality for literature recommendation, accuracy, comprehensiveness, and reference integration for summarization, with the majority exhibiting statistically significant improvements (P-values <.05). DISCUSSION: RefAI demonstrated substantial improvements in literature recommendation and summarization over existing tools, addressing issues like fabricated papers, metadata inaccuracies, restricted recommendations, and poor reference integration. CONCLUSION: By augmenting LLM with external resources and a novel ranking algorithm, RefAI is uniquely capable of recommending high-quality literature and generating well-structured summaries, holding the potential to meet the critical needs of biomedical professionals in navigating and synthesizing vast amounts of scientific literature.

Asunto(s)

Algoritmos , Almacenamiento y Recuperación de la Información , PubMed , Almacenamiento y Recuperación de la Información/métodos , Procesamiento de Lenguaje Natural

6.

Text summarization with ChatGPT for drug labeling documents.

Ying, Lan; Liu, Zhichao; Fang, Hong; Kusko, Rebecca; Wu, Leihong; Harris, Stephen; Tong, Weida.

Drug Discov Today ; 29(6): 104018, 2024 Jun.

Artículo en Inglés | MEDLINE | ID: mdl-38723763

RESUMEN

Text summarization is crucial in scientific research, drug discovery and development, regulatory review, and more. This task demands domain expertise, language proficiency, semantic prowess, and conceptual skill. The recent advent of large language models (LLMs), such as ChatGPT, offers unprecedented opportunities to automate this process. We compared ChatGPT-generated summaries with those produced by human experts using FDA drug labeling documents. The labeling contains summaries of key labeling sections, making them an ideal human benchmark to evaluate ChatGPT's summarization capabilities. Analyzing >14000 summaries, we observed that ChatGPT-generated summaries closely resembled those generated by human experts. Importantly, ChatGPT exhibited even greater similarity when summarizing drug safety information. These findings highlight ChatGPT's potential to accelerate work in critical areas, including drug safety.

Asunto(s)

Etiquetado de Medicamentos , United States Food and Drug Administration , Humanos , Estados Unidos , Procesamiento de Lenguaje Natural , Efectos Colaterales y Reacciones Adversas Relacionados con Medicamentos

7.

Using large language models for safety-related table summarization in clinical study reports.

Landman, Rogier; Healey, Sean P; Loprinzo, Vittorio; Kochendoerfer, Ulrike; Winnier, Angela Russell; Henstock, Peter V; Lin, Wenyi; Chen, Aqiu; Rajendran, Arthi; Penshanwar, Sushant; Khan, Sheraz; Madhavan, Subha.

JAMIA Open ; 7(2): ooae043, 2024 Jul.

Artículo en Inglés | MEDLINE | ID: mdl-38818116

RESUMEN

Objectives: The generation of structured documents for clinical trials is a promising application of large language models (LLMs). We share opportunities, insights, and challenges from a competitive challenge that used LLMs for automating clinical trial documentation. Materials and Methods: As part of a challenge initiated by Pfizer (organizer), several teams (participant) created a pilot for generating summaries of safety tables for clinical study reports (CSRs). Our evaluation framework used automated metrics and expert reviews to assess the quality of AI-generated documents. Results: The comparative analysis revealed differences in performance across solutions, particularly in factual accuracy and lean writing. Most participants employed prompt engineering with generative pre-trained transformer (GPT) models. Discussion: We discuss areas for improvement, including better ingestion of tables, addition of context and fine-tuning. Conclusion: The challenge results demonstrate the potential of LLMs in automating table summarization in CSRs while also revealing the importance of human involvement and continued research to optimize this technology.

8.

Automated Generation of Clinical Reports Using Sensing Technologies with Deep Learning Techniques.

Cabello-Collado, Celia; Rodriguez-Juan, Javier; Ortiz-Perez, David; Garcia-Rodriguez, Jose; Tomás, David; Vizcaya-Moreno, Maria Flores.

Sensors (Basel) ; 24(9)2024 Apr 25.

Artículo en Inglés | MEDLINE | ID: mdl-38732857

RESUMEN

This study presents a pioneering approach that leverages advanced sensing technologies and data processing techniques to enhance the process of clinical documentation generation during medical consultations. By employing sophisticated sensors to capture and interpret various cues such as speech patterns, intonations, or pauses, the system aims to accurately perceive and understand patient-doctor interactions in real time. This sensing capability allows for the automation of transcription and summarization tasks, facilitating the creation of concise and informative clinical documents. Through the integration of automatic speech recognition sensors, spoken dialogue is seamlessly converted into text, enabling efficient data capture. Additionally, deep models such as Transformer models are utilized to extract and analyze crucial information from the dialogue, ensuring that the generated summaries encapsulate the essence of the consultations accurately. Despite encountering challenges during development, experimentation with these sensing technologies has yielded promising results. The system achieved a maximum ROUGE-1 metric score of 0.57, demonstrating its effectiveness in summarizing complex medical discussions. This sensor-based approach aims to alleviate the administrative burden on healthcare professionals by automating documentation tasks and safeguarding important patient information. Ultimately, by enhancing the efficiency and reliability of clinical documentation, this innovative method contributes to improving overall healthcare outcomes.

Asunto(s)

Aprendizaje Profundo , Humanos , Software de Reconocimiento del Habla

9.

Biomedical semantic text summarizer.

Kirmani, Mahira; Kour, Gagandeep; Mohd, Mudasir; Sheikh, Nasrullah; Khan, Dawood Ashraf; Maqbool, Zahid; Wani, Mohsin Altaf; Wani, Abid Hussain.

BMC Bioinformatics ; 25(1): 152, 2024 Apr 16.

Artículo en Inglés | MEDLINE | ID: mdl-38627652

RESUMEN

BACKGROUND: Text summarization is a challenging problem in Natural Language Processing, which involves condensing the content of textual documents without losing their overall meaning and information content, In the domain of bio-medical research, summaries are critical for efficient data analysis and information retrieval. While several bio-medical text summarizers exist in the literature, they often miss out on an essential text aspect: text semantics. RESULTS: This paper proposes a novel extractive summarizer that preserves text semantics by utilizing bio-semantic models. We evaluate our approach using ROUGE on a standard dataset and compare it with three state-of-the-art summarizers. Our results show that our approach outperforms existing summarizers. CONCLUSION: The usage of semantics can improve summarizer performance and lead to better summaries. Our summarizer has the potential to aid in efficient data analysis and information retrieval in the field of biomedical research.

Asunto(s)

Algoritmos , Investigación Biomédica , Semántica , Almacenamiento y Recuperación de la Información , Procesamiento de Lenguaje Natural

10.

Automatic generation of conclusions from neuroradiology MRI reports through natural language processing.

López-Úbeda, Pilar; Martín-Noguerol, Teodoro; Escartín, Jorge; Luna, Antonio.

Neuroradiology ; 66(4): 477-485, 2024 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-38381144

RESUMEN

PURPOSE: The conclusion section of a radiology report is crucial for summarizing the primary radiological findings in natural language and essential for communicating results to clinicians. However, creating these summaries is time-consuming, repetitive, and prone to variability and errors among different radiologists. To address these issues, we evaluated a fine-tuned Text-To-Text Transfer Transformer (T5) model for abstractive summarization to automatically generate conclusions for neuroradiology MRI reports in a low-resource language. METHODS: We retrospectively applied our method to a dataset of 232,425 neuroradiology MRI reports in Spanish. We compared various pre-trained T5 models, including multilingual T5 and those newly adapted for Spanish. For precise evaluation, we employed BLEU, METEOR, ROUGE-L, CIDEr, and cosine similarity metrics alongside expert radiologist assessments. RESULTS: The findings are promising, with the models specifically fine-tuned for neuroradiology MRI achieving scores of 0.46, 0.28, 0.52, 2.45, and 0.87 in the BLEU-1, METEOR, ROUGE-L, CIDEr, and cosine similarity metrics, respectively. In the radiological experts' evaluation, they found that in 75% of the cases evaluated, the conclusions generated by the system were as good as or even better than the manually generated conclusions. CONCLUSION: The methods demonstrate the potential and effectiveness of customizing state-of-the-art pre-trained models for neuroradiology, yielding automatic MRI report conclusions that nearly match expert quality. Furthermore, these results underscore the importance of designing and pre-training a dedicated language model for radiology report summarization.

Asunto(s)

Procesamiento de Lenguaje Natural , Radiología , Humanos , Estudios Retrospectivos , Lenguaje , Imagen por Resonancia Magnética

11.

Development and Evaluation of a Digital Scribe: Conversation Summarization Pipeline for Emergency Department Counseling Sessions towards Reducing Documentation Burden.

Sezgin, Emre; Sirrianni, Joseph; Kranz, Kelly.

medRxiv ; 2023 Dec 07.

Artículo en Inglés | MEDLINE | ID: mdl-38106162

RESUMEN

Objective: We present a proof-of-concept digital scribe system as an ED clinical conversation summarization pipeline and report its performance. Materials and Methods: We use four pre-trained large language models to establish the digital scribe system: T5-small, T5-base, PEGASUS-PubMed, and BART-Large-CNN via zero-shot and fine-tuning approaches. Our dataset includes 100 referral conversations among ED clinicians and medical records. We report the ROUGE-1, ROUGE-2, and ROUGE-L to compare model performance. In addition, we annotated transcriptions to assess the quality of generated summaries. Results: The fine-tuned BART-Large-CNN model demonstrates greater performance in summarization tasks with the highest ROUGE scores (F1ROUGE-1=0.49, F1ROUGE-2=0.23, F1ROUGE-L=0.35) scores. In contrast, PEGASUS-PubMed lags notably (F1ROUGE-1=0.28, F1ROUGE-2=0.11, F1ROUGE-L=0.22). BART-Large-CNN's performance decreases by more than 50% with the zero-shot approach. Annotations show that BART-Large-CNN performs 71.4% recall in identifying key information and a 67.7% accuracy rate. Discussion: The BART-Large-CNN model demonstrates a high level of understanding of clinical dialogue structure, indicated by its performance with and without fine-tuning. Despite some instances of high recall, there is variability in the model's performance, particularly in achieving consistent correctness, suggesting room for refinement. The model's recall ability varies across different information categories. Conclusion: The study provides evidence towards the potential of AI-assisted tools in reducing clinical documentation burden. Future work is suggested on expanding the research scope with larger language models, and comparative analysis to measure documentation efforts and time.

12.

Leveraging GPT-4 for food effect summarization to enhance product-specific guidance development via iterative prompting.

Shi, Yiwen; Ren, Ping; Wang, Jing; Han, Biao; ValizadehAslani, Taha; Agbavor, Felix; Zhang, Yi; Hu, Meng; Zhao, Liang; Liang, Hualou.

J Biomed Inform ; 148: 104533, 2023 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-37918623

RESUMEN

Food effect summarization from New Drug Application (NDA) is an essential component of product-specific guidance (PSG) development and assessment, which provides the basis of recommendations for fasting and fed bioequivalence studies to guide the pharmaceutical industry for developing generic drug products. However, manual summarization of food effect from extensive drug application review documents is time-consuming. Therefore, there is a need to develop automated methods to generate food effect summary. Recent advances in natural language processing (NLP), particularly large language models (LLMs) such as ChatGPT and GPT-4, have demonstrated great potential in improving the effectiveness of automated text summarization, but its ability with regard to the accuracy in summarizing food effect for PSG assessment remains unclear. In this study, we introduce a simple yet effective approach,iterative prompting, which allows one to interact with ChatGPT or GPT-4 more effectively and efficiently through multi-turn interaction. Specifically, we propose a three-turn iterative prompting approach to food effect summarization in which the keyword-focused and length-controlled prompts are respectively provided in consecutive turns to refine the quality of the generated summary. We conduct a series of extensive evaluations, ranging from automated metrics to FDA professionals and even evaluation by GPT-4, on 100 NDA review documents selected over the past five years. We observe that the summary quality is progressively improved throughout the iterative prompting process. Moreover, we find that GPT-4 performs better than ChatGPT, as evaluated by FDA professionals (43% vs. 12%) and GPT-4 (64% vs. 35%). Importantly, all the FDA professionals unanimously rated that 85% of the summaries generated by GPT-4 are factually consistent with the golden reference summary, a finding further supported by GPT-4 rating of 72% consistency. Taken together, these results strongly suggest a great potential for GPT-4 to draft food effect summaries that could be reviewed by FDA professionals, thereby improving the efficiency of the PSG assessment cycle and promoting generic drug product development.

Asunto(s)

Benchmarking , Medicamentos Genéricos , Lenguaje , Procesamiento de Lenguaje Natural

13.

Adapting Static and Contextual Representations for Policy Gradient-Based Summarization.

Lin, Ching-Sheng; Jwo, Jung-Sing; Lee, Cheng-Hsiung.

Sensors (Basel) ; 23(9)2023 May 05.

Artículo en Inglés | MEDLINE | ID: mdl-37177717

RESUMEN

Considering the ever-growing volume of electronic documents made available in our daily lives, the need for an efficient tool to capture their gist increases as well. Automatic text summarization, which is a process of shortening long text and extracting valuable information, has been of great interest for decades. Due to the difficulties of semantic understanding and the requirement of large training data, the development of this research field is still challenging and worth investigating. In this paper, we propose an automated text summarization approach with the adaptation of static and contextual representations based on an extractive approach to address the research gaps. To better obtain the semantic expression of the given text, we explore the combination of static embeddings from GloVe (Global Vectors) and the contextual embeddings from BERT (Bidirectional Encoder Representations from Transformer) and GPT (Generative Pre-trained Transformer) based models. In order to reduce human annotation costs, we employ policy gradient reinforcement learning to perform unsupervised training. We conduct empirical studies on the public dataset, Gigaword. The experimental results show that our approach achieves promising performance and is competitive with various state-of-the-art approaches.

14.

A novel centroid based sentence classification approach for extractive summarization of COVID-19 news reports.

Banerjee, Sumanta; Mukherjee, Shyamapada; Bandyopadhyay, Sivaji.

Int J Inf Technol ; 15(4): 1789-1801, 2023.

Artículo en Inglés | MEDLINE | ID: mdl-37256024

RESUMEN

A COVID-19 news covers subtopics like infections, deaths, the economy, jobs, and more. The proposed method generates a news summary based on the subtopics of a reader's interest. It extracts a centroid having the lexical pattern of the sentences on those subtopics by the frequently used words in them. The centroid is then used as a query in the vector space model (VSM) for sentence classification and extraction, producing a query focused summarization (QFS) of the documents. Three approaches, TF-IDF, word vector averaging, and auto-encoder are experimented to generate sentence embedding that are used in VSM. These embeddings are ranked depending on their similarities with the query embedding. A Novel approach has been introduced to find the value for the similarity parameter using a supervised technique to classify the sentences. Finally, the performance of the method has been assessed in two different ways. All the sentences of the dataset are considered together in the first assessment and in the second, each document wise group of sentences is considered separately using fivefold cross-validation. The proposed method has achieved a minimum of 0.60 to a maximum of 0.63 mean F1 scores with the three sentence encoding approaches on the test dataset.

15.

Single document text summarization addressed with a cat swarm optimization approach.

Debnath, Dipanwita; Das, Ranjita; Pakray, Partha.

Appl Intell (Dordr) ; 53(10): 12268-12287, 2023.

Artículo en Inglés | MEDLINE | ID: mdl-36187330

RESUMEN

The availability of a tremendous amount of online information bringing about a broad interest in extracting relevant information in a compact and meaningful way, prompted the need for automatic text summarization. Hence, in the proposed system, the automated text summarization has been considered as an extractive single-document summarization problem, and a Cat Swarm Optimization (CSO) algorithm-based approach is proposed to solve it, whose objective is to generate good summaries in terms of content coverage, informative, anti-redundancy, and readability. In this work, input documents are pre-processed first. Then the cat population is initialized, where each individual (cat) in a binary vector is randomly initialized in the search space, considering the constraint. The objective function is then formulated considering different sentence quality measures. The Best Cat Memory Pool (BCMP) is initialized based on the objective function score. After that, individuals are randomly distributed for position updating to perform seeking/tracing mode operations based on the mixture ratio in each iteration. BCMP is also updated accordingly. Finally, an optimal individual is chosen to generate the summary after the last iteration. DUC-2001 and DUC-2002 data sets and ROUGE measures are used for system evaluation, and the obtained results are compared with the various state-of-the-art methods. We have achieved approximately 25% and 5% improvement on ROUGE-1 and ROUGE-2 scores on the datasets over the best existing method mentioned in this paper, revealing the proposed method's superiority. The proposed system is also evaluated considering the generational distance, CPU processing time, cohesion, and readability factor, reflecting that the system-generated summaries are readable, concise, relevant, and fast. We have also conducted a two-sample t-test, and one-way ANOVA test showing the proposed approach is statistically significant.

16.

Reaching for upper bound ROUGE score of extractive summarization methods.

Akhmetov, Iskander; Mussabayev, Rustam; Gelbukh, Alexander.

PeerJ Comput Sci ; 8: e1103, 2022.

Artículo en Inglés | MEDLINE | ID: mdl-36262160

RESUMEN

The extractive text summarization (ETS) method for finding the salient information from a text automatically uses the exact sentences from the source text. In this article, we answer the question of what quality of a summary we can achieve with ETS methods? To maximize the ROUGE-1 score, we used five approaches: (1) adapted reduced variable neighborhood search (RVNS), (2) Greedy algorithm, (3) VNS initialized by Greedy algorithm results, (4) genetic algorithm, and (5) genetic algorithm initialized by the Greedy algorithm results. Furthermore, we ran experiments on articles from the arXive dataset. As a result, we found 0.59 and 0.25 scores for ROUGE-1 and ROUGE-2, respectively achievable by the approach, where the genetic algorithm initialized by the Greedy algorithm results, which happens to yield the best results out of the tested approaches. Moreover, those scores appear to be higher than scores obtained by the current state-of-the-art text summarization models: the best score in the literature for ROUGE-1 on the same data set is 0.46. Therefore, we have room for the development of ETS methods, which are now undeservedly forgotten.

17.

Increasing Women's Knowledge about HPV Using BERT Text Summarization: An Online Randomized Study.

Bitar, Hind; Babour, Amal; Nafa, Fatema; Alzamzami, Ohoud; Alismail, Sarah.

Int J Environ Res Public Health ; 19(13)2022 07 01.

Artículo en Inglés | MEDLINE | ID: mdl-35805761

RESUMEN

Despite the availability of online educational resources about human papillomavirus (HPV), many women around the world may be prevented from obtaining the necessary knowledge about HPV. One way to mitigate the lack of HPV knowledge is the use of auto-generated text summarization tools. This study compares the level of HPV knowledge between women who read an auto-generated summary of HPV made using the BERT deep learning model and women who read a long-form text of HPV. We randomly assigned 386 women to two conditions: half read an auto-generated summary text about HPV (n = 193) and half read an original text about HPV (n = 193). We administrated measures of HPV knowledge that consisted of 29 questions. As a result, women who read the original text were more likely to correctly answer two questions on the general HPV knowledge subscale than women who read the summarized text. For the HPV testing knowledge subscale, there was a statistically significant difference in favor of women who read the original text for only one question. The final subscale, HPV vaccination knowledge questions, did not significantly differ across groups. Using BERT for text summarization has shown promising effectiveness in increasing women's knowledge and awareness about HPV while saving their time.

Asunto(s)

Alphapapillomavirus , Infecciones por Papillomavirus , Vacunas contra Papillomavirus , Neoplasias del Cuello Uterino , Femenino , Conocimientos, Actitudes y Práctica en Salud , Humanos , Papillomaviridae , Infecciones por Papillomavirus/prevención & control , Vacunas contra Papillomavirus/uso terapéutico , Aceptación de la Atención de Salud , Neoplasias del Cuello Uterino/prevención & control

18.

Graph-based abstractive biomedical text summarization.

Givchi, Azadeh; Ramezani, Reza; Baraani-Dastjerdi, Ahmad.

J Biomed Inform ; 132: 104099, 2022 08.

Artículo en Inglés | MEDLINE | ID: mdl-35700914

RESUMEN

Summarization is the process of compressing a text to obtain its important informative parts. In recent years, various methods have been presented to extract important parts of textual documents to present them in a summarized form. The first challenge of these methods is to detect the concepts that well convey the main topic of the text and extract sentences that better describe these essential concepts. The second challenge is the correct interpretation of the essential concepts to generate new paraphrased sentences such that they are not exactly the same as the sentences in the main text. The first challenge has been addressed by many researchers. However, the second one is still in progress. In this study, we focus on the abstractive summarization of biomedical documents. In this regard, for the first challenge, a new method is presented based on the graph generation and frequent itemset mining for generating extractive summaries by considering the concepts within the biomedical documents. Then, to address the second challenge, a transfer learning-based method is used to generate abstractive summarizations from extractive summaries. The efficiency of the proposed solution has been evaluated by conducting several experiments over BioMed Central and NLM's PubMed datasets. The obtained results show that the proposed approach admits a better interpretation of the main concepts and sentences of biomedical documents for the abstractive summarization by obtaining the overall ROUGE of 59.60%, which, on average, is 17% better than state-of-the-art summarization techniques. The source code, datasets, and results are available in GitHub1.

Asunto(s)

Algoritmos , Semántica , Formación de Concepto , Lenguaje , Programas Informáticos

19.

A systematic review of automatic text summarization for biomedical literature and EHRs.

Wang, Mengqian; Wang, Manhua; Yu, Fei; Yang, Yue; Walker, Jennifer; Mostafa, Javed.

J Am Med Inform Assoc ; 28(10): 2287-2297, 2021 09 18.

Artículo en Inglés | MEDLINE | ID: mdl-34338801

RESUMEN

OBJECTIVE: Biomedical text summarization helps biomedical information seekers avoid information overload by reducing the length of a document while preserving the contents' essence. Our systematic review investigates the most recent biomedical text summarization researches on biomedical literature and electronic health records by analyzing their techniques, areas of application, and evaluation methods. We identify gaps and propose potential directions for future research. MATERIALS AND METHODS: This review followed the PRISMA methodology and replicated the approaches adopted by the previous systematic review published on the same topic. We searched 4 databases (PubMed, ACM Digital Library, Scopus, and Web of Science) from January 1, 2013 to April 8, 2021. Two reviewers independently screened title, abstract, and full-text for all retrieved articles. The conflicts were resolved by the third reviewer. The data extraction of the included articles was in 5 dimensions: input, purpose, output, method, and evaluation. RESULTS: Fifty-eight out of 7235 retrieved articles met the inclusion criteria. Thirty-nine systems used single-document biomedical research literature as their input, 17 systems were explicitly designed for clinical support, 47 systems generated extractive summaries, and 53 systems adopted hybrid methods combining computational linguistics, machine learning, and statistical approaches. As for the assessment, 51 studies conducted an intrinsic evaluation using predefined metrics. DISCUSSION AND CONCLUSION: This study found that current biomedical text summarization systems have achieved good performance using hybrid methods. Studies on electronic health records summarization have been increasing compared to a previous survey. However, the majority of the works still focus on summarizing literature.

Asunto(s)

Investigación Biomédica , Publicaciones , Registros Electrónicos de Salud , Aprendizaje Automático

20.

MultiGBS: A multi-layer graph approach to biomedical summarization.

Davoodijam, Ensieh; Ghadiri, Nasser; Lotfi Shahreza, Maryam; Rinaldi, Fabio.

J Biomed Inform ; 116: 103706, 2021 04.

Artículo en Inglés | MEDLINE | ID: mdl-33610879

RESUMEN

Automatic text summarization methods generate a shorter version of the input text to assist the reader in gaining a quick yet informative gist. Existing text summarization methods generally focus on a single aspect of text when selecting sentences, causing the potential loss of essential information. In this study, we propose a domain-specific method that models a document as a multi-layer graph to enable multiple features of the text to be processed at the same time. The features we used in this paper are word similarity, semantic similarity, and co-reference similarity, which are modelled as three different layers. The unsupervised method selects sentences from the multi-layer graph based on the MultiRank algorithm and the number of concepts. The proposed MultiGBS algorithm employs UMLS and extracts the concepts and relationships using different tools such as SemRep, MetaMap, and OGER. Extensive evaluation by ROUGE and BERTScore shows increased F-measure values.

Asunto(s)

Minería de Datos , Semántica , Algoritmos , Lenguaje , Procesamiento de Lenguaje Natural

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA