Search | VHL Regional Portal

Analyzing hope speech from psycholinguistic and emotional perspectives.

Arif, Muhammad; Shahiki Tash, Moein; Jamshidi, Ainaz; Ullah, Fida; Ameer, Iqra; Kalita, Jugal; Gelbukh, Alexander; Balouchzahi, Fazlourrahman.

Sci Rep ; 14(1): 23548, 2024 10 09.

Article in English | MEDLINE | ID: mdl-39384851

ABSTRACT

Hope is a vital coping mechanism, enabling individuals to effectively confront life's challenges. This study proposes a technique employing Natural Language Processing (NLP) tools like Linguistic Inquiry and Word Count (LIWC), NRC-emotion-lexicon, and vaderSentiment to analyze social media posts, extracting psycholinguistic, emotional, and sentimental features from a hope speech dataset. The findings of this study reveal distinct cognitive, emotional, and communicative characteristics and psycholinguistic dimensions, emotions, and sentiments associated with different types of hope shared in social media. Furthermore, the study investigates the potential of leveraging this data to classify different types of hope using machine learning algorithms. Notably, models such as LightGBM and CatBoost demonstrate impressive performance, surpassing traditional methods and competing effectively with deep learning techniques. We employed hyperparameter tuning to optimize the models' parameters and compared their performance using both default and tuned settings. The results highlight the enhanced efficiency achieved through hyperparameter tuning for these models.

Subjects

Emotions, Natural Language Processing, Psycholinguistics, Social Media, Speech, Humans, Emotions/physiology, Psycholinguistics/methods, Hope, Machine Learning, Algorithms, Deep Learning
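The lexicon-based feature extraction mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: `TOY_LEXICON` is a made-up five-word stand-in for resources such as LIWC or the NRC emotion lexicon, and `lexicon_features` is a hypothetical helper name. It only shows the general idea of turning a post into per-category rates that a classifier could consume.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> categories). A real study would load
# LIWC, the NRC emotion lexicon, or a similar resource instead.
TOY_LEXICON = {
    "hope": {"anticipation", "positive"},
    "wish": {"anticipation"},
    "afraid": {"fear", "negative"},
    "happy": {"joy", "positive"},
    "lost": {"sadness", "negative"},
}
CATEGORIES = sorted({c for cats in TOY_LEXICON.values() for c in cats})


def lexicon_features(text):
    """Return the share of tokens that fall into each lexicon category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for cat in TOY_LEXICON.get(tok, ()):
            counts[cat] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {cat: counts[cat] / total for cat in CATEGORIES}


features = lexicon_features("I hope to be happy")
```

Normalizing by token count keeps long and short posts comparable; the resulting dictionary can be flattened into a fixed-order feature vector using `CATEGORIES`.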

Machine learning-based guilt detection in text.

Meque, Abdul Gafar Manuel; Hussain, Nisar; Sidorov, Grigori; Gelbukh, Alexander.

Sci Rep ; 13(1): 11441, 2023 07 15.

Article in English | MEDLINE | ID: mdl-37454207

ABSTRACT

We introduce a novel Natural Language Processing (NLP) task called guilt detection, which focuses on detecting guilt in text. We identify guilt as a complex and vital emotion that has not been previously studied in NLP, and we aim to provide a more fine-grained analysis of it. To address the lack of publicly available corpora for guilt detection, we created VIC, a dataset containing 4622 texts from three existing emotion detection datasets that we binarized into guilt and no-guilt classes. We experimented with traditional machine learning methods using bag-of-words and term frequency-inverse document frequency features, achieving a 72% F1 score with the highest-performing model. Our study provides a first step towards understanding guilt in text and opens the door for future research in this area.

Subjects

Machine Learning, Natural Language Processing
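As a rough illustration of the two ingredients named in the abstract above, the sketch below computes term frequency-inverse document frequency weights and a binary F1 score in plain Python. The whitespace tokenizer, the smoothed IDF variant, and the function names `tfidf` and `f1_score` are assumptions made for this example; the abstract does not specify which weighting variant or toolkit was used.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (raw tf * smoothed idf)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # One common smoothed variant: idf = ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


vectors = tfidf(["i feel guilty", "i feel fine"])
score = f1_score([1, 1, 0, 0], [1, 0, 0, 1])
```

Terms that occur in every document (here "i" and "feel") get the minimum weight, while a term unique to one document (here "guilty") is weighted higher, which is the property that makes these features useful for separating the guilt and no-guilt classes.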

Multi-label emotion classification of Urdu tweets.

Ashraf, Noman; Khan, Lal; Butt, Sabur; Chang, Hsien-Tsung; Sidorov, Grigori; Gelbukh, Alexander.

PeerJ Comput Sci ; 8: e896, 2022.

Article in English | MEDLINE | ID: mdl-35494831

ABSTRACT

Urdu is a widely used language in South Asia and worldwide. While there are similar datasets available in English, we created the first multi-label emotion dataset consisting of 6,043 tweets and six basic emotions in the Urdu Nastalíq script. A multi-label (ML) classification approach was adopted to detect emotions in Urdu text. The morphological and syntactic structure of Urdu makes multi-label emotion detection a challenging problem. In this paper, we build a set of baseline classifiers: machine learning algorithms (Random forest (RF), Decision tree (J48), Sequential minimal optimization (SMO), AdaBoostM1, and Bagging), deep-learning algorithms (Convolutional Neural Networks (1D-CNN), Long short-term memory (LSTM), and LSTM with CNN features), and a transformer-based baseline (BERT). We used a combination of text representations: stylometric-based features, pre-trained word embeddings, word-based n-grams, and character-based n-grams. The paper highlights the annotation guidelines, dataset characteristics, and insights into different methodologies used for Urdu-based emotion classification. We present our best results using micro-averaged F1, macro-averaged F1, accuracy, Hamming loss (HL), and exact match (EM) for all tested methods.
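For readers unfamiliar with the multi-label metrics listed at the end of the abstract above, here is a minimal sketch of how micro-averaged F1, macro-averaged F1, Hamming loss, and exact match are commonly defined. The function name `multilabel_metrics` and the toy labels are assumptions for illustration and are not taken from the paper; multi-label accuracy is omitted because its definition varies between studies.

```python
def multilabel_metrics(y_true, y_pred):
    """Compute common multi-label metrics from equal-length 0/1 label vectors."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    pairs = list(zip(y_true, y_pred))

    def f1(tp, fp, fn):
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    # Per-label confusion counts (true positives, false positives, false negatives).
    per_label = []
    for j in range(n_labels):
        tp = sum(t[j] == 1 and p[j] == 1 for t, p in pairs)
        fp = sum(t[j] == 0 and p[j] == 1 for t, p in pairs)
        fn = sum(t[j] == 1 and p[j] == 0 for t, p in pairs)
        per_label.append((tp, fp, fn))

    # Macro: average the per-label F1 values. Micro: pool the counts first.
    macro_f1 = sum(f1(*counts) for counts in per_label) / n_labels
    micro_f1 = f1(*(sum(col) for col in zip(*per_label)))

    # Hamming loss: fraction of individual label decisions that are wrong.
    wrong = sum(t[j] != p[j] for t, p in pairs for j in range(n_labels))
    hamming_loss = wrong / (n_samples * n_labels)

    # Exact match: fraction of samples whose whole label vector is correct.
    exact_match = sum(list(t) == list(p) for t, p in pairs) / n_samples

    return {
        "micro_f1": micro_f1,
        "macro_f1": macro_f1,
        "hamming_loss": hamming_loss,
        "exact_match": exact_match,
    }


metrics = multilabel_metrics([[1, 0, 1], [0, 1, 0]], [[1, 0, 0], [0, 1, 0]])
```

In the toy call, one missed label out of six decisions gives a low Hamming loss but halves the exact-match score, which shows why both are usually reported together.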

Abusive language detection in YouTube comments leveraging replies as conversational context.

Ashraf, Noman; Zubiaga, Arkaitz; Gelbukh, Alexander.

PeerJ Comput Sci ; 7: e742, 2021.

Article in English | MEDLINE | ID: mdl-34712802

ABSTRACT

Nowadays, social media experience an increase in hostility, which leads to many people suffering from online abusive behavior and harassment. We introduce a new publicly available annotated dataset for abusive language detection in short texts. The dataset includes comments from YouTube, along with contextual information: replies, video, video title, and the original description. The comments in the dataset are labeled as abusive or not and are classified by topic: politics, religion, and other. In particular, we discuss our refined annotation guidelines for such classification. We report a number of strong baselines on this dataset for the tasks of abusive language detection and topic classification, using a number of classifiers and text representations. We show that taking into account the conversational context, namely, replies, greatly improves the classification results as compared with using only linguistic features of the comments. We also study how the classification accuracy depends on the topic of the comment.

Automatic Authorship Detection Using Textual Patterns Extracted from Integrated Syntactic Graphs.

Gómez-Adorno, Helena; Sidorov, Grigori; Pinto, David; Vilariño, Darnes; Gelbukh, Alexander.

Sensors (Basel) ; 16(9), 2016 Aug 29.

Article in English | MEDLINE | ID: mdl-27589740

ABSTRACT

We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation allows integrating different levels of language description into a single structure. We extract textual patterns based on features obtained from shortest path walks over integrated syntactic graphs and apply them to determine the authors of documents. On average, our method outperforms the state-of-the-art approaches and, unlike existing methods, gives consistently high results across different corpora. Our results show that our textual patterns are useful for the task of authorship attribution.
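The idea of reading features off a shortest path walk can be sketched on a toy graph. This is only an illustration under stated assumptions: `TOY_GRAPH`, its `word_TAG` node naming, and the helper `path_pattern` are hypothetical, and the sketch does not reproduce how the authors build integrated syntactic graphs or which features they collect along the walk.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search over an unweighted graph given as {node: [neighbors]}."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None  # goal is not reachable from start


# Hypothetical toy graph for "she wrote a long letter"; nodes are "word_TAG".
TOY_GRAPH = {
    "ROOT": ["wrote_VBD"],
    "wrote_VBD": ["she_PRP", "letter_NN"],
    "letter_NN": ["a_DT", "long_JJ"],
}


def path_pattern(graph, start, goal):
    """Turn the shortest path into a pattern string built from the tag of each node."""
    path = shortest_path(graph, start, goal)
    if path is None:
        return None
    return "/".join(node.rsplit("_", 1)[-1] for node in path)


pattern = path_pattern(TOY_GRAPH, "ROOT", "long_JJ")
```

Counting how often each such pattern occurs in a document would yield a feature vector that an authorship classifier could be trained on.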
