Detecting the Presence of COVID-19 Vaccination Hesitancy from South
African Twitter Data Using Machine Learning
- URL: http://arxiv.org/abs/2307.15072v1
- Date: Wed, 12 Jul 2023 13:28:37 GMT
- Title: Detecting the Presence of COVID-19 Vaccination Hesitancy from South
African Twitter Data Using Machine Learning
- Authors: Nicholas Perikli, Srimoy Bhattacharya, Blessing Ogbuokiri, Zahra
Movahedi Nia, Benjamin Lieberman, Nidhi Tripathi, Salah-Eddine Dahbi, Finn
Stevenson, Nicola Bragazzi, Jude Kong, Bruce Mellado
- Abstract summary: Vaccination is a major tool in the fight against the pandemic, but vaccine hesitancy jeopardizes any public health effort.
In this study, sentiment analysis on South African tweets related to vaccine hesitancy was performed, with the aim of training AI-mediated classification models.
- Score: 0.9830751917335564
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Very few social media studies have been done on South African user-generated
content during the COVID-19 pandemic and even fewer using hand-labelling over
automated methods. Vaccination is a major tool in the fight against the
pandemic, but vaccine hesitancy jeopardizes any public health effort. In this
study, sentiment analysis on South African tweets related to vaccine hesitancy
was performed, with the aim of training AI-mediated classification models and
assessing their reliability in categorizing user-generated content (UGC). A
dataset of 30,000 tweets from South Africa was extracted and hand-labelled into
one of three sentiment classes: positive, negative, or neutral. The machine
learning models used were LSTM, bi-LSTM, SVM, BERT-base-cased, and
RoBERTa-base, whose hyperparameters were chosen and tuned using the WandB
platform. For comparison, the tweets in our dataset were pre-processed using
two different approaches: one semantics-based and the other corpus-based. All
models were found to have low F1-scores in the range of 45%-55%, except for
BERT and RoBERTa, which both achieved significantly better overall F1-scores of
60% and 61%, respectively. Topic modelling using LDA was performed on the
tweets misclassified by the RoBERTa model to gain insight into how to further
improve model accuracy.
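As a rough illustration of the metric reported in the abstract, the sketch below computes per-class and macro-averaged F1 for the three sentiment classes. It is a minimal sketch under stated assumptions: the abstract does not say whether the "overall" F1 is macro- or weighted-averaged, so macro averaging is assumed here, and the function names and toy labels are hypothetical, not taken from the paper.

```python
from collections import Counter

# The three sentiment classes named in the abstract.
LABELS = ("positive", "negative", "neutral")


def per_class_f1(y_true, y_pred, labels=LABELS):
    """Return {label: F1} for parallel lists of true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[true] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores (an assumed averaging choice)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Toy, made-up labels for illustration only; not data from the study.
y_true = ["positive", "negative", "neutral", "negative", "positive", "neutral"]
y_pred = ["positive", "neutral", "neutral", "negative", "negative", "neutral"]

if __name__ == "__main__":
    print(per_class_f1(y_true, y_pred))
    print(round(macro_f1(y_true, y_pred), 4))
```

Macro averaging weights each class equally regardless of how many tweets it contains, which matters when one sentiment class dominates a hand-labelled dataset; a weighted average would instead track the majority class more closely.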
Related papers
- A Comparative Study of Hybrid Models in Health Misinformation Text Classification [0.43695508295565777]
This study evaluates the effectiveness of machine learning (ML) and deep learning (DL) models in detecting COVID-19-related misinformation on online social networks (OSNs).
Our study concludes that DL and hybrid DL models are more effective than conventional ML algorithms for detecting COVID-19 misinformation on OSNs.
arXiv Detail & Related papers (2024-10-08T19:43:37Z)
- Brain Tumor Classification on MRI in Light of Molecular Markers [61.77272414423481]
Co-deletion of the 1p/19q gene is associated with clinical outcomes in low-grade gliomas.
This study aims to utilize a specialized MRI-based convolutional neural network for brain cancer detection.
arXiv Detail & Related papers (2024-09-29T07:04:26Z)
- LT4SG@SMM4H24: Tweets Classification for Digital Epidemiology of Childhood Health Outcomes Using Pre-Trained Language Models [1.0312118123538199]
This paper presents our approaches for the SMM4H24 Shared Task 5 on the binary classification of English tweets reporting children's medical disorders.
Our best-performing system achieves an F1-score of 0.938 on test data, outperforming the benchmark by 1.18%.
arXiv Detail & Related papers (2024-06-11T22:48:18Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- Negation detection in Dutch clinical texts: an evaluation of rule-based and machine learning methods [0.21079694661943607]
We compare three methods for negation detection in Dutch clinical notes.
We found that both the biLSTM and RoBERTa models consistently outperform the rule-based model in terms of F1 score, precision and recall.
arXiv Detail & Related papers (2022-09-01T14:00:13Z)
- Building Brains: Subvolume Recombination for Data Augmentation in Large Vessel Occlusion Detection [56.67577446132946]
A large training data set is required for a standard deep learning-based model to learn this strategy from data.
We propose an augmentation method that generates artificial training samples by recombining vessel tree segmentations of the hemispheres from different patients.
In line with the augmentation scheme, we use a 3D-DenseNet fed with task-specific input, fostering a side-by-side comparison between the hemispheres.
arXiv Detail & Related papers (2022-05-05T10:31:57Z)
- Misleading the Covid-19 vaccination discourse on Twitter: An exploratory study of infodemic around the pandemic [0.45593531937154413]
We collect a moderate-sized representative corpus of tweets (approx. 200,000) pertaining to Covid-19 vaccination over a period of seven months (September 2020 - March 2021).
Following a Transfer Learning approach, we utilize the pre-trained Transformer-based XLNet model to classify tweets as Misleading or Non-Misleading.
We build on this to study and contrast the characteristics of tweets in the corpus that are misleading in nature against non-misleading ones.
Several ML models are employed for prediction, with up to 90% accuracy, and the importance of each feature is explained using SHAP Explainable AI (X
arXiv Detail & Related papers (2021-08-16T17:02:18Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries.
We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split.
The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
arXiv Detail & Related papers (2021-02-18T21:14:52Z)
- CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic.
The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands.
We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z)
- Utilizing Deep Learning to Identify Drug Use on Twitter Data [0.0]
The classification power of multiple methods was compared including support vector machines (SVM), XGBoost, and convolutional neural network (CNN) based classifiers.
The accuracy scores were 76.35% and 82.31%, with an AUC of 0.90 and 0.91.
The synthetically generated set provided increased scores, improving the classification capability and proving the worth of this methodology.
arXiv Detail & Related papers (2020-03-08T07:52:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.