Search | VHL Regional Portal

1.

The principal components of meaning, revisited.

Westbury, Chris; Yang, Michelle; Anderson, Kris.

Psychon Bull Rev ; 2024 Aug 22.

Article in English | MEDLINE | ID: mdl-39174751

ABSTRACT

Osgood, Suci, and Tannenbaum were the first to attempt to identify the principal components of semantics using dimensional reduction of a high-dimensional model of semantics constructed from human judgments of word relatedness. Modern word-embedding models analyze patterns of words to construct high-dimensional models of semantics that can be similarly subjected to dimensional reduction. Hollis and Westbury characterized the first eight principal components (PCs) of a word-embedding model by correlating them with several well-known lexical measures, such as logged word frequency, age of acquisition, valence, arousal, dominance, and concreteness. The results show some clear differentiation of interpretation between the PCs. Here, we extend this work by analyzing a larger word-embedding matrix using semantic measures initially derived from subjective inspection of the PCs. We then use quantitative analysis to confirm the utility of these subjective measures for predicting PC values and cross-validate them on two word-embedding matrices developed on distinct corpora. Several semantic and word class measures are strongly predictive of early PC values, including first-person and second-person verbs, personal relevance of abstract and concrete words, affect terms, and names of places and people. The predictors of the lowest magnitude PCs generalized well to word-embedding matrices constructed from separate corpora, including matrices constructed using different word-embedding methods. The predictive categories we describe are consistent with Wittgenstein's argument that an autonomous level of social interaction grounds linguistic meaning.
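As a rough illustration of the kind of analysis the abstract describes, the sketch below extracts the first principal component of a toy embedding matrix and correlates the word scores on it with a lexical measure. The matrix, the `log_frequency` values, and all function names are invented for illustration; the authors' actual corpora, predictors, and pipeline are not given in the abstract.

```python
import math

def center(rows):
    """Subtract each column's mean so the data are centered, as PCA requires."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def first_principal_axis(centered, iters=200):
    """Approximate the leading eigenvector of the covariance matrix by power iteration."""
    n, d = len(centered), len(centered[0])
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

# Hypothetical data: five words embedded in three dimensions.
embeddings = [
    [2.0, 0.1, 0.0],
    [1.0, -0.1, 0.1],
    [0.0, 0.0, -0.1],
    [-1.0, 0.1, 0.0],
    [-2.0, -0.1, 0.0],
]
# Hypothetical lexical measure for the same five words.
log_frequency = [5.0, 4.0, 3.0, 2.0, 1.0]

centered = center(embeddings)
axis = first_principal_axis(centered)
# Score of each word on the first principal component.
pc1_scores = [sum(x * w for x, w in zip(row, axis)) for row in centered]
r = pearson(pc1_scores, log_frequency)
print(f"correlation of PC1 with log frequency: {r:.3f}")
```

With real embeddings one would normally use a linear-algebra library rather than hand-written power iteration; the plain-Python version is used here only to keep the sketch dependency-free.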

2.

Optimizing word embeddings for small dataset: a case study on patient portal messages from breast cancer patients.

Song, Qingyuan; Ni, Congning; Warner, Jeremy L; Chen, Qingxia; Song, Lijun; Rosenbloom, S Trent; Malin, Bradley A; Yin, Zhijun.

Sci Rep ; 14(1): 16117, 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-38997332

ABSTRACT

Patient portal messages often relate to specific clinical phenomena (e.g., patients undergoing treatment for breast cancer) and, as a result, have received increasing attention in biomedical research. These messages require natural language processing and, while word embedding models, such as word2vec, have the potential to extract meaningful signals from text, they are not readily applicable to patient portal messages. This is because embedding models typically require millions of training samples to sufficiently represent semantics, while the volume of patient portal messages associated with a particular clinical phenomenon is often relatively small. We introduce a novel adaptation of the word2vec model, PK-word2vec (where PK stands for prior knowledge), for small-scale messages. PK-word2vec incorporates the most similar terms for medical words (including problems, treatments, and tests) and non-medical words from two pre-trained embedding models as prior knowledge to improve the training process. We applied PK-word2vec in a case study of patient portal messages in the Vanderbilt University Medical Center electronic health record system sent by patients diagnosed with breast cancer from December 2004 to November 2017. We evaluated the model through a set of 1000 tasks, each of which compared the relevance of a given word to a group of the five most similar words generated by PK-word2vec and a group of the five most similar words generated by the standard word2vec model. We recruited 200 Amazon Mechanical Turk (AMT) workers and 7 medical students to perform the tasks. The dataset was composed of 1389 patient records and included 137,554 messages with 10,683 unique words. Prior knowledge was available for 7981 non-medical and 1116 medical words. In over 90% of the tasks, both groups of reviewers indicated that PK-word2vec generated more similar words than standard word2vec (p = 0.01). The difference in the evaluation by AMT workers versus medical students was negligible for all comparisons of tasks' choices between the two groups of reviewers (p = 0.774 under a paired t-test). PK-word2vec can effectively learn word representations from a small message corpus, marking a significant advancement in processing patient portal messages.

Subject(s)

Breast Neoplasms; Natural Language Processing; Patient Portals; Humans; Female; Semantics; Electronic Health Records

3.

Using a hybrid neural network architecture for DNA sequence representation: A study on N⁴-methylcytosine sites.

Nguyen, Van-Nui; Ho, Trang-Thi; Doan, Thu-Dung; Le, Nguyen Quoc Khanh.

Comput Biol Med ; 178: 108664, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38875905

ABSTRACT

N4-methylcytosine (4mC) is a modified form of cytosine found in DNA, contributing to epigenetic regulation. It exists in various genomes, including the Rosaceae family encompassing significant fruit crops like apples, cherries, and roses. Previous investigations have examined the distribution and functional implications of 4mC sites within the Rosaceae genome, focusing on their potential roles in gene expression regulation, environmental adaptation, and evolution. This research aims to improve the accuracy of predicting 4mC sites within the genome of Fragaria vesca, a Rosaceae plant species. Building upon the original 4mc-w2vec method, which combines word embedding processing and a convolutional neural network (CNN), we have incorporated additional feature encoding techniques and leveraged pre-trained natural language processing (NLP) models with different deep learning architectures including different forms of CNN, recurrent neural networks (RNN) and long short-term memory (LSTM). Our assessments have shown that the best model is derived from a CNN model using fastText encoding. This model demonstrates enhanced performance, achieving a sensitivity of 0.909, specificity of 0.77, and accuracy of 0.879 on an independent dataset. Furthermore, our model surpasses previously published works on the same dataset, thus showcasing its superior predictive capabilities.
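For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and accuracy follow from a binary confusion matrix. The labels are toy values and `binary_metrics` is a hypothetical helper name; it is not the authors' evaluation code, and it does not reproduce their reported figures.

```python
def binary_metrics(y_true, y_pred):
    """Compute sensitivity, specificity and accuracy for 0/1 labels.

    Here 1 stands for a positive site (e.g. a 4mC site) and 0 for a negative one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),  # share of real positives that were found
        "specificity": tn / (tn + fp),  # share of real negatives that were rejected
        "accuracy": (tp + tn) / len(pairs),
    }

# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when the two classes are unbalanced, since accuracy alone can hide weak performance on the smaller class.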

Subject(s)

Neural Networks, Computer; DNA, Plant/genetics; Cytosine/metabolism; Cytosine/chemistry; Genome, Plant; Sequence Analysis, DNA/methods; DNA Methylation/genetics; Fragaria/genetics

4.

[A medical visual question answering approach based on co-attention networks].

Cui, Wencheng; Shi, Wentao; Shao, Hong.

Sheng Wu Yi Xue Gong Cheng Xue Za Zhi ; 41(3): 560-568, 2024 Jun 25.

Article in Chinese | MEDLINE | ID: mdl-38932543

ABSTRACT

Recent studies have introduced attention models for medical visual question answering (MVQA). In medical research, modeling "question attention" is as important as modeling "visual attention". To facilitate bidirectional reasoning in the attention processes involving medical images and questions, this study proposes a new MVQA architecture named MCAN. The architecture incorporates a cross-modal co-attention network, FCAF, which identifies key words in questions and principal parts in images. Through a meta-learning channel attention module (MLCA), weights are adaptively assigned to each word and region, reflecting the model's focus on specific words and regions during reasoning. Additionally, this study designed and developed a medical domain-specific word embedding model, Med-GloVe, to further enhance the model's accuracy and practical value. Experimental results indicate that MCAN improves accuracy by 7.7% on free-form questions in the Path-VQA dataset and by 4.4% on closed-form questions in the VQA-RAD dataset, effectively improving the accuracy of medical visual question answering.

Subject(s)

Neural Networks, Computer; Humans; Attention; Algorithms

5.

RB-GAT: A Text Classification Model Based on RoBERTa-BiGRU with Graph ATtention Network.

Lv, Shaoqing; Dong, Jungang; Wang, Chichi; Wang, Xuanhong; Bao, Zhiqiang.

Sensors (Basel) ; 24(11), 2024 May 24.

Article in English | MEDLINE | ID: mdl-38894157

ABSTRACT

With the development of deep learning, several graph neural network (GNN)-based approaches have been utilized for text classification. However, GNNs encounter challenges when capturing contextual text information within a document sequence. To address this, a novel text classification model, RB-GAT, is proposed by combining RoBERTa-BiGRU embedding and a multi-head Graph ATtention Network (GAT). First, the pre-trained RoBERTa model is exploited to learn word and text embeddings in different contexts. Second, the Bidirectional Gated Recurrent Unit (BiGRU) is employed to capture long-term dependencies and bidirectional sentence information from the text context. Next, the multi-head graph attention network is applied to analyze this information, which serves as a node feature for the document. Finally, the classification results are generated through a Softmax layer. Experimental results on five benchmark datasets demonstrate that our method can achieve an accuracy of 71.48%, 98.45%, 80.32%, 90.84%, and 95.67% on Ohsumed, R8, MR, 20NG and R52, respectively, which is superior to the existing nine text classification approaches.
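The final step of the pipeline, a Softmax layer that turns per-class scores into class probabilities, can be sketched as follows. The score values and the function name are illustrative assumptions; the abstract does not describe how RB-GAT implements this layer.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1.

    Subtracting the maximum first keeps exp() from overflowing on large scores.
    """
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three document classes.
class_scores = [1.0, 2.0, 3.0]
probs = softmax(class_scores)
# The predicted class is the one with the highest probability.
predicted = max(range(len(probs)), key=lambda i: probs[i])
print(probs, predicted)
```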

6.

Scoring alignments by embedding vector similarity.

Ashrafzadeh, Sepehr; Golding, G Brian; Ilie, Silvana; Ilie, Lucian.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38695119

ABSTRACT

Sequence similarity is of paramount importance in biology, as similar sequences tend to have similar function and share common ancestry. Scoring matrices, such as PAM or BLOSUM, play a crucial role in all bioinformatics algorithms for identifying similarities, but have the drawback that they are fixed, independent of context. We propose a new scoring method for amino acid similarity that remedies this weakness, being contextually dependent. It relies on recent advances in deep learning architectures that employ self-supervised learning in order to leverage the power of enormous amounts of unlabelled data to generate contextual embeddings, which are vector representations for words. These ideas have been applied to protein sequences, producing embedding vectors for protein residues. We propose the E-score between two residues as the cosine similarity between their embedding vector representations. Thorough testing on a wide variety of reference multiple sequence alignments indicates that the alignments produced using the new E-score method, especially ProtT5-score, are significantly better than those obtained using BLOSUM matrices. The new method proposes to change the way alignments are computed, with far-reaching implications in all areas of textual data that use sequence similarity. The program to compute alignments based on various E-scores is available as a web server at e-score.csd.uwo.ca. The source code is freely available for download from github.com/lucian-ilie/E-score.
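The E-score definition above (cosine similarity between the embedding vectors of two residues) can be sketched in a few lines. The vectors below are made-up, low-dimensional stand-ins; real contextual embeddings would come from a protein language model such as ProtT5, and this is not the code from the authors' repository.

```python
import math

def e_score(u, v):
    """Cosine similarity between two residue embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def e_score_matrix(seq_a, seq_b):
    """Score every residue of one sequence against every residue of the other.

    Each sequence is a list of per-residue embedding vectors, so the same amino
    acid can receive different scores in different contexts.
    """
    return [[e_score(u, v) for v in seq_b] for u in seq_a]

# Hypothetical 2-dimensional embeddings for two very short sequences.
seq_a = [[1.0, 0.0], [0.0, 1.0]]
seq_b = [[1.0, 0.0], [-1.0, 0.0]]
scores = e_score_matrix(seq_a, seq_b)
print(scores)
```

A matrix like `scores` plays the role that a fixed BLOSUM lookup plays in a conventional alignment algorithm, except that its entries depend on the positions being compared rather than only on the amino acid types.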

Subject(s)

Algorithms; Computational Biology; Sequence Alignment; Sequence Alignment/methods; Computational Biology/methods; Software; Sequence Analysis, Protein/methods; Amino Acid Sequence; Proteins/chemistry; Proteins/genetics; Deep Learning; Databases, Protein

7.

Optimizing Word Embeddings for Patient Portal Message Datasets with a Small Number of Samples.

Song, Qingyuan; Ni, Congning; Warner, Jeremy L; Chen, Qingxia; Song, Lijun; Rosenbloom, S Trent; Malin, Bradley A; Yin, Zhijun.

Res Sq ; 2024 May 15.

Article in English | MEDLINE | ID: mdl-38798621

ABSTRACT

Background: Patient portal messages often relate to specific clinical phenomena (e.g., patients undergoing treatment for breast cancer) and, as a result, have received increasing attention in biomedical research. These messages require natural language processing and, while word embedding models, such as word2vec, have the potential to extract meaningful signals from text, they are not readily applicable to patient portal messages. This is because embedding models typically require millions of training samples to sufficiently represent semantics, while the volume of patient portal messages associated with a particular clinical phenomenon is often relatively small. Objective: We introduce a novel adaptation of the word2vec model, PK-word2vec, for small-scale messages. Methods: PK-word2vec incorporates the most similar terms for medical words (including problems, treatments, and tests) and non-medical words from two pre-trained embedding models as prior knowledge to improve the training process. We applied PK-word2vec on patient portal messages in the Vanderbilt University Medical Center electronic health record system sent by patients diagnosed with breast cancer from December 2004 to November 2017. We evaluated the model through a set of 1000 tasks, each of which compared the relevance of a given word to a group of the five most similar words generated by PK-word2vec and a group of the five most similar words generated by the standard word2vec model. We recruited 200 Amazon Mechanical Turk (AMT) workers and 7 medical students to perform the tasks. Results: The dataset was composed of 1,389 patient records and included 137,554 messages with 10,683 unique words. Prior knowledge was available for 7,981 non-medical and 1,116 medical words. In over 90% of the tasks, both groups of reviewers indicated that PK-word2vec generated more similar words than standard word2vec (p = 0.01). The difference in the evaluation by AMT workers versus medical students was negligible for all comparisons of tasks' choices between the two groups of reviewers (p = 0.774 under a paired t-test). Conclusions: PK-word2vec can effectively learn word representations from a small message corpus, marking a significant advancement in processing patient portal messages.

8.

How different are offline and online diplomacy? A comparative analysis of public statements and SNS posts by delegates to the United Nations.

Sakamoto, Takuto; Araki, Momoko; Ito, Hiroto; Matsuoka, Tomoyuki.

Front Big Data ; 7: 1304806, 2024.

Article in English | MEDLINE | ID: mdl-38680474

ABSTRACT

Introduction: This article investigates the evolving landscape of diplomacy in the digital age, focusing on diplomats at the United Nations (UN) Headquarters in New York. The central inquiry revolves around how diplomatic actors use digital tools to complement or augment traditional face-to-face diplomacy. Methods: We systematically compare a substantial corpus of X posts (tweets) from UN diplomats with their public statements at the United Nations Security Council (UNSC), employing advanced computational social science techniques. This study applies a range of large-scale text analysis methods, including word embedding, topic modeling, and sentiment analysis, to investigate systematic differences between offline and online communication. Results: Our analysis reveals that, while the essence of diplomacy remains consistent across both domains, there is strategic selectivity in the use of online platforms by diplomats. Online communication emphasizes non-security topics, ceremonial matters, and prominent policy stances, in contrast to the operational issues common in UNSC deliberations. Additionally, online discourse adopts a less confrontational, more public diplomacy-oriented tone, with variations among countries. Discussion: This study offers one of the first systematic comparisons between offline and online diplomatic messages. It illuminates how diplomats navigate the digital realm to complement traditional roles. The findings indicate that some elements of public diplomacy and nation branding, directed toward a wider audience far beyond the council chamber, have become an integral part of multilateral diplomacy unfolding at the UNSC.

9.

Deepstacked-AVPs: predicting antiviral peptides using tri-segment evolutionary profile and word embedding based multi-perspective features with deep stacking model.

Akbar, Shahid; Raza, Ali; Zou, Quan.

BMC Bioinformatics ; 25(1): 102, 2024 Mar 07.

Article in English | MEDLINE | ID: mdl-38454333

ABSTRACT

BACKGROUND: Viral infections have been the main health issue in the last decade. Antiviral peptides (AVPs) are a subclass of antimicrobial peptides (AMPs) with substantial potential to protect the human body against various viral diseases. However, there has been significant production of antiviral vaccines and medications. Recently, the development of AVPs as an antiviral agent suggests an effective way to treat virus-affected cells. The use of intelligent machine learning techniques for developing peptide-based therapeutic agents is also attracting increasing interest due to its significant outcomes. The existing wet-laboratory-based drugs are expensive, time-consuming, and cannot effectively perform in screening and predicting the targeted motif of antiviral peptides. METHODS: In this paper, we propose a novel computational model called Deepstacked-AVPs to discriminate AVPs accurately. The training sequences are numerically encoded using a novel Tri-segmentation-based position-specific scoring matrix (PSSM-TS) and word2vec-based semantic features. Composition/Transition/Distribution-Transition (CTDT) is also employed to represent the physicochemical properties based on structural features. Apart from these, the fused vector is formed using PSSM-TS features, semantic information, and CTDT descriptors to compensate for the limitations of single encoding methods. Information gain (IG) is applied to choose the optimal feature set. The selected features are trained using a stacked-ensemble classifier. RESULTS: The proposed Deepstacked-AVPs model achieved a predictive accuracy of 96.60%, an area under the curve (AUC) of 0.98, and a precision-recall (PR) value of 0.97 using training samples. In the case of the independent samples, our model obtained an accuracy of 95.15%, an AUC of 0.97, and a PR value of 0.97. CONCLUSION: Our Deepstacked-AVPs model outperformed existing models with a ~4% and ~2% higher accuracy using training and independent samples, respectively. The reliability and efficacy of the proposed Deepstacked-AVPs model make it a valuable tool for scientists and may perform a beneficial role in pharmaceutical design and research academia.

Subject(s)

Biological Evolution; Peptides; Humans; Reproducibility of Results; Peptides/chemistry; Antiviral Agents/pharmacology

10.

iAFPs-Mv-BiTCN: Predicting antifungal peptides using self-attention transformer embedding and transform evolutionary based multi-view features with bidirectional temporal convolutional networks.

Akbar, Shahid; Zou, Quan; Raza, Ali; Alarfaj, Fawaz Khaled.

Artif Intell Med ; 151: 102860, 2024 May.

Article in English | MEDLINE | ID: mdl-38552379

ABSTRACT

Globally, fungal infections have become a major health concern in humans. Fungal diseases generally occur due to the invading fungus appearing on a specific portion of the body and becoming hard for the human immune system to resist. The recent emergence of COVID-19 has intensely increased different nosocomial fungal infections. The existing wet-laboratory-based medications are expensive, time-consuming, and may have adverse side effects on normal cells. In the last decade, peptide therapeutics have gained significant attention due to their high specificity in targeting affected cells without affecting healthy cells. Motivated by the significance of peptide-based therapies, we developed a highly discriminative prediction scheme called iAFPs-Mv-BiTCN to predict antifungal peptides correctly. The training peptides are encoded using word embedding methods such as skip-gram and attention mechanism-based bidirectional encoder representation using transformer. Additionally, transform-based evolutionary features are generated using the Pseudo position-specific scoring matrix using discrete wavelet transform (PsePSSM-DWT). The fused vector of word embedding and evolutionary descriptors is formed to compensate for the limitations of single encoding methods. A Shapley Additive exPlanations (SHAP) based global interpolation approach is applied to reduce training costs by choosing the optimal feature set. The selected feature set is trained using a bi-directional temporal convolutional network (BiTCN). The proposed iAFPs-Mv-BiTCN model achieved a predictive accuracy of 98.15% and an AUC of 0.99 using training samples. In the case of the independent samples, our model obtained an accuracy of 94.11% and an AUC of 0.98. Our iAFPs-Mv-BiTCN model outperformed existing models with a ~4% and ~5% higher accuracy using training and independent samples, respectively. The reliability and efficacy of the proposed iAFPs-Mv-BiTCN model make it a valuable tool for scientists and may perform a beneficial role in pharmaceutical design and research academia.

Subject(s)

Antifungal Agents; Neural Networks, Computer; Antifungal Agents/therapeutic use; Humans; Peptides/chemistry; COVID-19; Mycoses/microbiology; Wavelet Analysis; Algorithms

11.

A Richer Vocabulary of Chinese Personality Traits: Leveraging Word Embedding Technology for Mining Personality Descriptors.

Ding, Yigang; Zheng, Feijun; Xu, Linjie; Yang, Xinru; Jia, Yiyun.

J Psycholinguist Res ; 53(3): 33, 2024 Mar 25.

Article in English | MEDLINE | ID: mdl-38526606

ABSTRACT

This study uses a data-driven approach to mine the distribution of personality traits among Chinese people in the Chinese social context. Based on the hypothesis of personality lexicology, word embedding technology was employed in machine learning to mine personality vocabulary from Tencent's word embedding database. More than 10,000 Chinese personality descriptors were extracted and analyzed using Gaussian Mixture Model clustering and hierarchical clustering analysis. Data were collected through an online questionnaire from 658 Chinese people sampled randomly from all parts of China. The results reveal six personality traits in the Chinese context, expanding the personality thesaurus and providing examples to illustrate each trait. The findings coincide with previous research on the five-factor model, which partially describes the personality traits of Chinese people, but does not offer a complete explanation of their typical social behavior patterns. Additionally, the study supports the notion of cultural particularity in personality traits. The approach used in this study offers a richer personality vocabulary than traditional personality mining methods, and word embedding technology captures richer semantic information in Chinese. The six Chinese personality traits identified in this study will also be used to explore how to quantify and evaluate personality traits based on word embedding and personality descriptors.

Subject(s)

East Asian People; Personality; Vocabulary; Humans; Semantics; Technology

12.

A new word embedding model integrated with medical knowledge for deep learning-based sentiment classification.

Khine, Aye Hninn; Wettayaprasit, Wiphada; Duangsuwan, Jarunee.

Artif Intell Med ; 148: 102758, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38325934

ABSTRACT

The development of intelligent systems that use social media data for decision-making processes in numerous domains such as politics, business, marketing, and finance, has been made possible by the popularity of social media platforms. However, the utilization of textual data from social media in the healthcare management industry is still somewhat limited when it is compared to other industries. Investigating how current machine learning and natural language processing technologies can be used in the healthcare industry to gauge public sentiment is an important study. Earlier works on healthcare sentiment analysis have utilized traditional word embedding models trained on the general and medical corpus. However, integration of medical knowledge into pre-trained word embedding models has not been considered yet. Word embedding models trained on the general corpus lack medical knowledge, and models trained on a small medical corpus have limitations in capturing semantic and syntactic properties. This research proposes a new word embedding model named Word Embedding Integrated with Medical Knowledge Vector (WE-iMKVec). The proposed model integrates sentiment lexicons and medical knowledgebases into the pre-trained word embedding to enrich the properties of word embedding. A new medical-aware sentiment polarity score is proposed for use in neural-network sentiment learning, and these vectors are incorporated with the original pre-trained word vectors. The resulting vectors are enriched with lexicon vectors and medical knowledge vectors; an Adverse Drug Reaction (ADR) vector and a Unified Medical Language System (UMLS) vector are used to build the proposed WE-iMKVec model. WE-iMKVec is validated on five different social media healthcare review datasets, and the empirical results showed its superiority over traditional word embedding models in medical sentiment analysis. The highest improvement can be found in the patients.info medical condition dataset, where the proposed model outperforms three conventional word2vec models (Google-News, PubMed-PMC, and Drug Reviews) by 12.7%, 31.4%, and 25.4%, respectively, in terms of F1 score.

Subject(s)

Deep Learning; Sentiment Analysis; Humans; Neural Networks, Computer; Machine Learning; Natural Language Processing

13.

Identifying the Risk Factors of Allergic Rhinitis Based on Zhihu Comment Data Using a Topic-Enhanced Word-Embedding Model: Mixed Method Study and Cluster Analysis.

Gu, Dongxiao; Wang, Qin; Chai, Yidong; Yang, Xuejie; Zhao, Wang; Li, Min; Zolotarev, Oleg; Xu, Zhengfei; Zhang, Gongrang.

J Med Internet Res ; 26: e48324, 2024 Feb 22.

Article in English | MEDLINE | ID: mdl-38386404

ABSTRACT

BACKGROUND: Allergic rhinitis (AR) is a chronic disease, and several risk factors predispose individuals to the condition in their daily lives, including exposure to allergens and inhalation irritants. Analyzing the potential risk factors that can trigger AR can provide reference material for individuals to use to reduce its occurrence in their daily lives. Nowadays, social media is a part of daily life, with an increasing number of people using at least 1 platform regularly. Social media enables users to share experiences among large groups of people who share the same interests and experience the same afflictions. Notably, these channels promote the ability to share health information. OBJECTIVE: This study aims to construct an intelligent method (TopicS-ClusterREV) for identifying the risk factors of AR based on these social media comments. The main questions were as follows: How many comments contained AR risk factor information? How many categories can these risk factors be summarized into? How do these risk factors trigger AR? METHODS: This study crawled all the data from May 2012 to May 2022 under the topic of allergic rhinitis on Zhihu, obtaining a total of 9628 posts and 33,747 comments. We improved the Skip-gram model to train topic-enhanced word vector representations (TopicS) and then vectorized annotated text items for training the risk factor classifier. Furthermore, cluster analysis enabled a closer look into the opinions expressed in the category, namely gaining insight into how risk factors trigger AR. RESULTS: Our classifier identified more comments containing risk factors than the other classification models, with an accuracy rate of 96.1% and a recall rate of 96.3%. In general, we clustered texts containing risk factors into 28 categories, with season, region, and mites being the most common risk factors. 
We gained insight into the risk factors expressed in each category; for example, seasonal changes and increased temperature differences between day and night can disrupt the body's immune system and lead to the development of allergies. CONCLUSIONS: Our approach can handle the amount of data and extract risk factors effectively. Moreover, the summary of risk factors can serve as a reference for individuals to reduce AR in their daily lives. The experimental data also provide a potential pathway that triggers AR. This finding can guide the development of management plans and interventions for AR.

Subject(s)

Rhinitis, Allergic; Humans; Cluster Analysis; Intelligence; Mental Recall; Risk Factors

14.

EnAMP: A novel deep learning ensemble antibacterial peptide recognition algorithm based on multi-features.

Zhuang, Jujuan; Gao, Wanquan; Su, Rui.

J Bioinform Comput Biol ; 22(1): 2450001, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38406833

ABSTRACT

Antimicrobial peptides (AMPs), as the preferred alternatives to antibiotics, have wide application with good prospects. Identifying AMPs through wet lab experiments remains expensive, time-consuming and challenging. Many machine learning methods have been proposed to predict AMPs and have achieved good results. In this work, we combine two kinds of word embedding features with the statistical features of peptide sequences to develop an ensemble classifier named EnAMP. Two deep neural networks are trained on the Word2vec and GloVe word embedding features of peptide sequences, respectively, while the statistical features of peptide sequences are used to train random forest and support vector machine classifiers. The final prediction is the average of the four classifiers' outputs. Compared with other state-of-the-art algorithms on six datasets, EnAMP outperforms most existing models with similar computational costs, and its performance is comparable even to that of high computational cost algorithms based on Bidirectional Encoder Representation from Transformers (BERT). EnAMP source code and the data are available at https://github.com/ruisue/EnAMP.
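The averaging step of such an ensemble can be sketched as below. The four probability values, the dictionary keys, and the 0.5 decision threshold are assumptions made for illustration; the abstract states only that the outputs of the four classifiers are averaged, and the authors' repository should be consulted for the actual implementation.

```python
def ensemble_average(probabilities):
    """Average the positive-class probabilities produced by several classifiers."""
    return sum(probabilities) / len(probabilities)

# Hypothetical outputs for one peptide from the four base classifiers.
base_outputs = {
    "word2vec_dnn": 0.9,   # deep network on Word2vec features
    "glove_dnn": 0.8,      # deep network on GloVe features
    "random_forest": 0.4,  # random forest on statistical features
    "svm": 0.7,            # support vector machine on statistical features
}

amp_probability = ensemble_average(list(base_outputs.values()))
# The threshold is an assumption; it is not stated in the abstract.
is_amp = amp_probability >= 0.5
print(f"ensemble probability: {amp_probability:.2f}, predicted AMP: {is_amp}")
```

Plain averaging gives each classifier an equal vote, so a single dissenting model (here the random forest) lowers the ensemble score without overturning the decision.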

Subject(s)

Deep Learning; Algorithms; Neural Networks, Computer; Anti-Bacterial Agents/pharmacology; Peptides

15.

Contextual Word Embedding for Biomedical Knowledge Extraction: a Rapid Review and Case Study.

Vithanage, Dinithi; Yu, Ping; Wang, Lei; Deng, Chao.

J Healthc Inform Res ; 8(1): 158-179, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38273979

ABSTRACT

Recent advancements in natural language processing (NLP), particularly contextual word embedding models, have improved knowledge extraction from biomedical and healthcare texts. However, limited comprehensive research compares these models. This study conducts a scoping review and compares the performance of the major contextual word embedding models for biomedical knowledge extraction. From 26 articles identified from Scopus, PubMed, PubMed Central, and Google Scholar between 2017 and 2021, 18 notable contextual word embedding models were identified. These include ELMo, BERT, BioBERT, BlueBERT, CancerBERT, DDS-BERT, RuBERT, LABSE, EhrBERT, MedBERT, Clinical BERT, Clinical BioBERT, Discharge Summary BERT, Discharge Summary BioBERT, GPT, GPT-2, GPT-3, and GPT2-Bio-Pt. A case study compared the performance of six representative models (ELMo, BERT, BioBERT, BlueBERT, Clinical BioBERT, and GPT-3) across text classification, named entity recognition, and question answering. The evaluation utilized datasets comprising biomedical text from tweets, NCBI, PubMed, and clinical notes sourced from two electronic health record datasets. Performance metrics, including accuracy and F1 score, were used. The results of this case study reveal that BioBERT performs the best in analyzing biomedical text, while Clinical BioBERT excels in analyzing clinical notes. These findings offer crucial insights into word embedding models for researchers, practitioners, and stakeholders utilizing NLP in biomedical and clinical document analysis. Supplementary Information: The online version contains supplementary material available at 10.1007/s41666-023-00157-y.

16.

Surfing the OCEAN: The machine learning psycholexical approach 2.0 to detect personality traits in texts.

Giannini, Federico; Marelli, Marco; Stella, Fabio; Monzani, Dario; Pancani, Luca.

J Pers ; 2024 Jan 13.

Article in English | MEDLINE | ID: mdl-38217359

ABSTRACT

OBJECTIVE: We aimed to develop a machine learning model to infer OCEAN traits from text. BACKGROUND: The psycholexical approach allows retrieving information about personality traits from human language. However, it has rarely been applied because of methodological and practical issues that current computational advancements could overcome. METHOD: Classical taxonomies and a large Yelp corpus were leveraged to learn an embedding for each personality trait. These embeddings were used to train a feedforward neural network for predicting trait values. Their generalization performances have been evaluated through two external validation studies involving experts (N = 11) and laypeople (N = 100) in a discrimination task about the best markers of each trait and polarity. RESULTS: Intrinsic validation of the model yielded excellent results, with R² values greater than 0.78. The validation studies showed a high proportion of matches between participants' choices and model predictions, confirming its efficacy in identifying new terms related to the OCEAN traits. The best performance was observed for agreeableness and extraversion, especially for their positive polarities. The model was less efficient in identifying the negative polarity of openness and conscientiousness. CONCLUSIONS: This innovative methodology can be considered a "psycholexical approach 2.0," contributing to research in personality and its practical applications in many fields.

17.

TMSC-m7G: A transformer architecture based on multi-sense-scaled embedding features and convolutional neural network to identify RNA N7-methylguanosine sites.

Zhang, Shengli; Xu, Yujie; Liang, Yunyun.

Comput Struct Biotechnol J ; 23: 129-139, 2024 Dec.

Article in English | MEDLINE | ID: mdl-38089465

ABSTRACT

RNA N7-methylguanosine (m7G) is a crucial chemical modification of RNA molecules, whose principal role is to maintain RNA function and protein translation. Studying and predicting RNA N7-methylguanosine sites aids in understanding the biological function of RNA and in developing new drug therapy regimens. Deep learning and machine learning techniques currently stand out in the prediction of RNA N7-methylguanosine sites, leading to improved accuracy and identification efficiency. In this study, we propose a model leveraging the transformer framework that integrates natural language processing and deep learning to predict m7G sites, called TMSC-m7G. In TMSC-m7G, a combination of multi-sense-scaled token embedding and fixed-position embedding is used to replace traditional word embedding for the extraction of contextual information from sequences. Moreover, a convolutional layer is added in the encoder to compensate for the transformer's limited acquisition of local information. The model's robustness and generalization are validated through 10-fold cross-validation and an independent dataset test. Results demonstrate outstanding performance in comparison to the most advanced models available. The accuracy of TMSC-m7G reaches 98.70% and 92.92% on the benchmark dataset and independent dataset, respectively. To facilitate the popularization and use of the model, we have developed an intuitive online prediction tool, which is easily accessible for free at http://39.105.212.81/.

18.

Improving sentiment classification using a RoBERTa-based hybrid model.

Semary, Noura A; Ahmed, Wesam; Amin, Khalid; Plawiak, Pawel; Hammad, Mohamed.

Front Hum Neurosci ; 17: 1292010, 2023.

Article in English | MEDLINE | ID: mdl-38130432

ABSTRACT

Introduction: Several attempts have been made to enhance the performance of text-based sentiment analysis. Classifiers and word embedding models have been among the most prominent attempts. This work aims to develop a hybrid deep learning approach that combines the advantages of transformer models and sequence models while eliminating the shortcomings of sequence models. Methods: In this paper, we present a hybrid model based on the transformer model and deep learning models to enhance the sentiment classification process. Robustly optimized BERT (RoBERTa) was selected for the representative vectors of the input sentences, and the Long Short-Term Memory (LSTM) model in conjunction with the Convolutional Neural Networks (CNN) model was used to improve the suggested model's ability to comprehend the semantics and context of each input sentence. We tested the proposed model with two datasets with different topics. The first dataset consists of Twitter reviews of US airlines and the second is the IMDb movie reviews dataset. We propose using word embeddings in conjunction with the SMOTE technique to overcome the challenge of imbalanced classes in the Twitter dataset. Results: With an accuracy of 96.28% on the IMDb reviews dataset and 94.2% on the Twitter reviews dataset, the proposed hybrid model outperforms the standard methods. Discussion: It is clear from these results that the proposed hybrid RoBERTa-(CNN+LSTM) method is an effective model in sentiment classification.

19.

EnILs: A General Ensemble Computational Approach for Predicting Inducing Peptides of Multiple Interleukins.

Su, Rui; Zhuang, Jujuan; Liu, Shuhan; Liu, Di; Feng, Kexin.

J Comput Biol ; 30(12): 1289-1304, 2023 Dec.

Article in English | MEDLINE | ID: mdl-38010531

ABSTRACT

Interleukins (ILs) are a group of multifunctional cytokines, which play important roles in immune regulation and inflammatory responses. Recently, IL-6 has been found to affect the development of COVID-19, and significantly elevated levels of IL-6 cytokines have been reported in patients with severe COVID-19. IL-10 and IL-17 are anti-inflammatory and proinflammatory cytokines, respectively, which play multiple protective roles in host defense against pathogens. At present, a number of machine learning methods have been proposed to predict IL-inducing peptides, but their predictive performance needs to be further improved, and the inducing peptides of different ILs are predicted separately, rather than using a general approach. In our work, we combine the statistical features of peptide sequences with word embedding to design a general ensemble model named EnILs to predict inducing peptides of different ILs, in which the predictive probabilities of random forest, eXtreme Gradient Boosting, and neural network are integrated by averaging. Compared with the state-of-the-art machine learning methods, EnILs shows considerable performance in the prediction of IL-6, IL-10, and IL-17 inducing peptides. In addition, we predict the most promising IL-6 inducing peptides in Severe Acute Respiratory Syndrome Coronavirus 2 spike protein in the case study for further experimental verification.

Subject(s)

COVID-19; Interleukin-17; Humans; Interleukin-10; Interleukin-6; Interleukins/metabolism; Peptides; Cytokines
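
The EnILs abstract describes integrating the predictive probabilities of three classifiers by averaging. A minimal sketch of that averaging step (often called soft voting) follows. The probability values, the 0.5 decision threshold, and the function names are assumptions for illustration; the abstract does not specify them, and no trained models are involved here.

```python
def average_probabilities(model_probs):
    """Average per-sample positive-class probabilities across models.

    model_probs: list of equal-length lists, one list per model.
    """
    if not model_probs:
        raise ValueError("at least one model is required")
    n_samples = len(model_probs[0])
    if any(len(p) != n_samples for p in model_probs):
        raise ValueError("all models must score the same samples")
    return [sum(column) / len(model_probs) for column in zip(*model_probs)]


def to_labels(probs, threshold=0.5):
    """Turn averaged probabilities into 0/1 labels (threshold is assumed)."""
    return [int(p >= threshold) for p in probs]


# Hypothetical outputs of three models for three peptides.
rf_probs = [0.9, 0.2, 0.6]   # stands in for a random forest
xgb_probs = [0.8, 0.4, 0.3]  # stands in for gradient boosting
nn_probs = [0.7, 0.3, 0.3]   # stands in for a neural network

ensemble = average_probabilities([rf_probs, xgb_probs, nn_probs])
labels = to_labels(ensemble)
print(ensemble, labels)
```

Averaging gives each model equal weight, so one confident model (the third peptide's 0.6) can be outvoted by two less confident ones.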

20.

Evaluating the use of Instagram images color histograms and hashtags sets for automatic image annotation.

Giannoulakis, Stamatios; Tsapatsoulis, Nicolas; Djouvas, Constantinos.

Front Big Data ; 6: 1149523, 2023.

Article in English | MEDLINE | ID: mdl-37469440

ABSTRACT

Color similarity has been a key feature for content-based image retrieval by contemporary search engines, such as Google. In this study, we compare the visual content information of images, obtained through color histograms, with their corresponding hashtag sets in the case of Instagram posts. In previous studies, we had concluded that less than 25% of Instagram hashtags are related to the actual visual content of the image they accompany. Thus, the use of Instagram images' corresponding hashtags for automatic image annotation is questionable. In this study, we are answering this question through the computational comparison of images' low-level characteristics with the semantic and syntactic information of their corresponding hashtags. The main conclusion of our study on 26 different subjects (concepts) is that color histograms and filtered hashtag sets, although related, should be better seen as a complementary source for image retrieval and automatic image annotation.
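
The abstract above relies on color histograms as the low-level description of an image. The sketch below shows one common way to build a normalized RGB histogram and to compare two of them with histogram intersection. The bin count, the intersection measure, and the toy pixel data are assumptions for illustration; the abstract does not state which histogram settings or similarity measure the authors used.

```python
from collections import Counter


def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    Each channel is quantized into `bins` levels.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    counts = Counter(
        (r * bins // 256, g * bins // 256, b * bins // 256)
        for r, g, b in pixels
    )
    return {bucket: n / len(pixels) for bucket, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of bin-wise minima of two histograms."""
    return sum(min(value, h2.get(bucket, 0.0)) for bucket, value in h1.items())


# Toy "images" given as flat pixel lists, invented for illustration.
red_image = [(250, 10, 10)] * 4
mixed_image = [(250, 10, 10)] * 2 + [(10, 10, 250)] * 2

h_red = color_histogram(red_image)
h_mixed = color_histogram(mixed_image)
similarity = histogram_intersection(h_red, h_mixed)
print(similarity)
```

A histogram discards where colors appear in the image, which is one reason such features capture only part of what a hashtag might describe.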
