TClustVID: A Novel Machine Learning Classification Model to Investigate Topics and Sentiment in COVID-19 Tweets
Md. Shahriare Satu; Md. Imran Khan; Mufti Mahmud; Shahadat Uddin; Matthew A Summers; Julian M. W. Quinn; Mohammad Ali Moni.
Affiliation
  • Md. Shahriare Satu; Faculty Member, Department of Management Information Systems, Noakhali Science and Technology University
  • Md. Imran Khan; Gono Bishwabidyalay
  • Mufti Mahmud; Dept. of Computing & Technology, Nottingham Trent University
  • Shahadat Uddin; The University of Sydney
  • Matthew A Summers; Garvan Institute of Medical Research
  • Julian M. W. Quinn; Garvan Institute of Medical Research
  • Mohammad Ali Moni; University of New South Wales
Preprint in English | PREPRINT-MEDRXIV | ID: ppmedrxiv-20167973
Journal article
An article published in a scientific journal is available and is probably based on this preprint, as identified by machine similarity matching. Human confirmation is still pending.
ABSTRACT
COVID-19, caused by SARS-CoV-2, varies greatly in severity but can present with serious respiratory symptoms and vascular and other complications, particularly in older adults. The disease can be spread by both symptomatic and asymptomatic infected individuals. Uncertainty remains over key aspects of its infectivity, no effective remedy yet exists, and the disease has severe economic effects globally. For these reasons, COVID-19 is the subject of intense and widespread discussion on social media platforms, including Facebook and Twitter. These public forums substantially influence public opinion in some cases and can exacerbate widespread panic and the spread of misinformation during the crisis. Thus, this work aimed to design an intelligent clustering-based classification and topic extraction model (named TClustVID) that analyzes COVID-19-related public tweets to extract significant sentiments with high accuracy. We gathered COVID-19 Twitter datasets from the IEEE Dataport repository and employed a range of data preprocessing methods to clean the raw data, then applied tokenization and produced a word-to-index dictionary. Thereafter, different classifiers were applied to the Twitter datasets, which enabled comparison of the performance of traditional and TClustVID classification methods. TClustVID showed higher performance than the traditional classifiers, as determined by clustering criteria. Finally, we extracted significant topic clusters from TClustVID, split them into positive, neutral and negative clusters, and applied latent Dirichlet allocation to extract popular COVID-19 topics. This approach identified prevailing public opinions and concerns related to COVID-19, as well as attitudes to infection prevention strategies held by people from different countries during the current pandemic.
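The abstract's preprocessing step (cleaning raw tweets, tokenizing, and building a word-to-index dictionary) can be illustrated with a minimal sketch. This is not the authors' implementation: the cleaning rules, the function names (`clean_tweet`, `tokenize`, `build_word_index`, `encode`), the frequency-based index ordering, and the use of index 0 for unknown words are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed cleaning rules; the preprint abstract does not list the exact steps.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")


def clean_tweet(text):
    """Drop URLs and @mentions, keep hashtag words, and lowercase."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return text.lower()


def tokenize(text):
    """Split a cleaned tweet into lowercase word tokens."""
    return TOKEN_RE.findall(clean_tweet(text))


def build_word_index(tweets, min_count=1):
    """Map each word to an integer, most frequent first (ties alphabetical).

    Index 0 is reserved for unknown words (an assumption of this sketch).
    """
    counts = Counter(tok for tweet in tweets for tok in tokenize(tweet))
    vocab = sorted(
        (w for w, c in counts.items() if c >= min_count),
        key=lambda w: (-counts[w], w),
    )
    return {word: idx for idx, word in enumerate(vocab, start=1)}


def encode(text, word_index):
    """Convert a tweet into a list of word indices; unknown words map to 0."""
    return [word_index.get(tok, 0) for tok in tokenize(text)]


# Hypothetical example tweets, not taken from the IEEE Dataport datasets.
tweets = [
    "Stay home, stay safe! #COVID19 https://example.com",
    "@who says wash hands; stay safe",
]
word_index = build_word_index(tweets)
print(encode("stay safe lockdown", word_index))
```

Integer sequences like these are a common input format for downstream classifiers; the clustering and topic-extraction stages described in the abstract would operate on features derived from such encoded tweets.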
License
cc_by_nc_nd
Full text: 1 Collection: 09-preprints Database: PREPRINT-MEDRXIV Study type: Prognostic_studies Language: En Year: 2020 Document type: Preprint