Related papers: Detecting the Presence of COVID-19 Vaccination Hesitancy from South African Twitter Data Using Machine Learning

Detecting the Presence of COVID-19 Vaccination Hesitancy from South African Twitter Data Using Machine Learning

URL: http://arxiv.org/abs/2307.15072v1
Date: Wed, 12 Jul 2023 13:28:37 GMT
Title: Detecting the Presence of COVID-19 Vaccination Hesitancy from South African Twitter Data Using Machine Learning
Authors: Nicholas Perikli, Srimoy Bhattacharya, Blessing Ogbuokiri, Zahra Movahedi Nia, Benjamin Lieberman, Nidhi Tripathi, Salah-Eddine Dahbi, Finn Stevenson, Nicola Bragazzi, Jude Kong, Bruce Mellado
Abstract summary: Vaccination is a major tool in the fight against the pandemic, but vaccine hesitancy jeopardizes any public health effort. In this study, sentiment analysis on South African tweets related to vaccine hesitancy was performed, with the aim of training AI-mediated classification models.
Score: 0.9830751917335564
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Very few social media studies have been done on South African user-generated content during the COVID-19 pandemic and even fewer using hand-labelling over automated methods. Vaccination is a major tool in the fight against the pandemic, but vaccine hesitancy jeopardizes any public health effort. In this study, sentiment analysis on South African tweets related to vaccine hesitancy was performed, with the aim of training AI-mediated classification models and assessing their reliability in categorizing UGC. A dataset of 30000 tweets from South Africa were extracted and hand-labelled into one of three sentiment classes: positive, negative, neutral. The machine learning models used were LSTM, bi-LSTM, SVM, BERT-base-cased and the RoBERTa-base models, whereby their hyperparameters were carefully chosen and tuned using the WandB platform. We used two different approaches when we pre-processed our data for comparison: one was semantics-based, while the other was corpus-based. The pre-processing of the tweets in our dataset was performed using both methods, respectively. All models were found to have low F1-scores within a range of 45$\%$-55$\%$, except for BERT and RoBERTa which both achieved significantly better measures with overall F1-scores of 60$\%$ and 61$\%$, respectively. Topic modelling using an LDA was performed on the miss-classified tweets of the RoBERTa model to gain insight on how to further improve model accuracy.

Related papers

Self-Paced Learning for Images of Antinuclear Antibodies [17.410595974529457]
Antinuclear antibody (ANA) testing is a crucial method for diagnosing autoimmune disorders, including lupus, Sjgren's syndrome, and scleroderma.<n>ANA detection is complicated by over 100 coexisting antibody types, resulting in vast fluorescent pattern combinations.<n>In this paper, a novel framework for ANA detection is proposed that handles the complexities of MIML tasks using microscope images without manual preprocessing.
arXiv Detail & Related papers (2025-11-26T15:50:03Z)
A Comparative Study of Hybrid Models in Health Misinformation Text Classification [0.43695508295565777]
This study evaluates the effectiveness of machine learning (ML) and deep learning (DL) models in detecting COVID-19-related misinformation on online social networks (OSNs) Our study concludes that DL and hybrid DL models are more effective than conventional ML algorithms for detecting COVID-19 misinformation on OSNs.
arXiv Detail & Related papers (2024-10-08T19:43:37Z)
Brain Tumor Classification on MRI in Light of Molecular Markers [61.77272414423481]
Co-deletion of the 1p/19q gene is associated with clinical outcomes in low-grade gliomas. This study aims to utilize a specially MRI-based convolutional neural network for brain cancer detection.
arXiv Detail & Related papers (2024-09-29T07:04:26Z)
LT4SG@SMM4H24: Tweets Classification for Digital Epidemiology of Childhood Health Outcomes Using Pre-Trained Language Models [1.0312118123538199]
This paper presents our approaches for the SMM4H24 Shared Task 5 on the binary classification of English tweets reporting children's medical disorders. Our best-performing system achieves an F1-score of 0.938 on test data, outperforming the benchmark by 1.18%.
arXiv Detail & Related papers (2024-06-11T22:48:18Z)
ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples. In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
Negation detection in Dutch clinical texts: an evaluation of rule-based and machine learning methods [0.21079694661943607]
We compare three methods for negation detection in Dutch clinical notes. We found that both the biLSTM and RoBERTa models consistently outperform the rule-based model in terms of F1 score, precision and recall.
arXiv Detail & Related papers (2022-09-01T14:00:13Z)
Building Brains: Subvolume Recombination for Data Augmentation in Large Vessel Occlusion Detection [56.67577446132946]
A large training data set is required for a standard deep learning-based model to learn this strategy from data. We propose an augmentation method that generates artificial training samples by recombining vessel tree segmentations of the hemispheres from different patients. In line with the augmentation scheme, we use a 3D-DenseNet fed with task-specific input, fostering a side-by-side comparison between the hemispheres.
arXiv Detail & Related papers (2022-05-05T10:31:57Z)
Misleading the Covid-19 vaccination discourse on Twitter: An exploratory study of infodemic around the pandemic [0.45593531937154413]
We collect a moderate-sized representative corpus of tweets (200,000 approx.) pertaining to Covid-19 vaccination over a period of seven months (September 2020 - March 2021) Following a Transfer Learning approach, we utilize the pre-trained Transformer-based XLNet model to classify tweets as Misleading or Non-Misleading. We build on this to study and contrast the characteristics of tweets in the corpus that are misleading in nature against non-misleading ones. Several ML models are employed for prediction, with up to 90% accuracy, and the importance of each feature is explained using SHAP Explainable AI (X
arXiv Detail & Related papers (2021-08-16T17:02:18Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries. We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split. The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
arXiv Detail & Related papers (2021-02-18T21:14:52Z)
CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic. The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands. We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z)
Utilizing Deep Learning to Identify Drug Use on Twitter Data [0.0]
The classification power of multiple methods was compared including support vector machines (SVM), XGBoost, and convolutional neural network (CNN) based classifiers. The accuracy scores were 76.35% and 82.31%, with an AUC of 0.90 and 0.91. The synthetically generated set provided increased scores, improving the classification capability and proving the worth of this methodology.
arXiv Detail & Related papers (2020-03-08T07:52:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.