COVIDHealth: A Benchmark Twitter Dataset and Machine Learning based Web Application for Classifying COVID-19 Discussions
- URL: http://arxiv.org/abs/2402.09897v1
- Date: Thu, 15 Feb 2024 11:45:34 GMT
- Authors: Mahathir Mohammad Bishal, Md. Rakibul Hassan Chowdory, Anik Das, Muhammad Ashad Kabir
- Abstract summary: We label COVID-19-related Twitter data, provide benchmark classification results, and develop a web application.
We extracted features using various feature extraction methods and applied them to seven different traditional machine learning algorithms.
The Linear SVC algorithm exhibited the highest F1 score at 86.13%, surpassing other traditional machine learning approaches.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The COVID-19 pandemic has had adverse effects on both physical and mental
health. During this pandemic, numerous studies have focused on gaining insights
into health-related perspectives from social media. In this study, our primary
objective is to develop a machine learning-based web application for
automatically classifying COVID-19-related discussions on social media. To
achieve this, we label COVID-19-related Twitter data, provide benchmark
classification results, and develop a web application. We collected data using
the Twitter API and labeled a total of 6,667 tweets into five different
classes: health risks, prevention, symptoms, transmission, and treatment. We
extracted features using various feature extraction methods and applied them to
seven different traditional machine learning algorithms, including Decision
Tree, Random Forest, Stochastic Gradient Descent, AdaBoost, K-Nearest
Neighbour, Logistic Regression, and Linear SVC. Additionally, we applied four
deep learning models (LSTM, CNN, RNN, and BERT) for classification. Overall,
the CNN achieved the highest F1 score, 90.43%. Among the traditional machine
learning algorithms, Linear SVC achieved the highest F1 score, 86.13%.
Our study not only contributes
to the field of health-related data analysis but also provides a valuable
resource in the form of a web-based tool for efficient data classification,
which can aid in addressing public health challenges and increasing awareness
during pandemics. The dataset and application are publicly available at
https://github.com/Bishal16/COVID19-Health-Related-Data-Classification-Website.
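The abstract does not name the feature extraction methods used with the seven traditional classifiers. TF-IDF is a common choice for tweet classification, so the following is a minimal standard-library sketch that assumes TF-IDF with smoothed IDF and L2 normalisation. The regex tokenizer, the function names `tokenize`, `fit_idf`, and `tfidf_vector`, and the example tweets are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Assumption: TF-IDF is used here only as an illustrative feature extractor;
# the abstract does not name the methods the authors actually used.

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())

def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}

def tfidf_vector(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Weight raw term counts by IDF, then L2-normalise; terms unseen when fitting are dropped."""
    counts = Counter(term for term in doc if term in idf)
    vec = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {term: w / norm for term, w in vec.items()} if norm else vec

# Invented example tweets, for illustration only.
tweets = [
    "Wear a mask to prevent covid transmission",
    "Fever and cough are common covid symptoms",
    "Covid vaccine trials for treatment",
]
docs = [tokenize(t) for t in tweets]
idf = fit_idf(docs)
vectors = [tfidf_vector(d, idf) for d in docs]

# "covid" occurs in every tweet, so it is weighted lower than "mask".
print(vectors[0]["covid"] < vectors[0]["mask"])  # True
```

The resulting sparse vectors would then be passed to one of the classifiers listed in the abstract. scikit-learn's `TfidfVectorizer` uses the same IDF smoothing and L2 normalisation by default, although its default tokenizer ignores single-character tokens such as "a".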
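The abstract reports a single F1 score per model (90.43% for the CNN, 86.13% for Linear SVC) on a five-class task, but it does not say how the per-class scores were averaged. The sketch below shows the two usual options, macro and support-weighted averaging, in plain Python. Which one the authors used is left open. The function names are hypothetical. The labels are the five classes named in the abstract, while the predictions are invented for illustration.

```python
from collections import Counter

def per_class_f1(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """One-vs-rest F1 = 2PR / (P + R) for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    scores = {}
    for label in labels:
        # Precision or recall with a zero denominator is taken as 0.
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / true_count[label] if true_count[label] else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores

def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Per-class F1 weighted by each class's number of true instances."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)

# Invented labels and predictions over the five classes from the abstract.
y_true = ["symptoms", "symptoms", "prevention", "treatment", "transmission", "health risks"]
y_pred = ["symptoms", "prevention", "prevention", "treatment", "transmission", "symptoms"]

print(round(macro_f1(y_true, y_pred), 3))     # 0.633
print(round(weighted_f1(y_true, y_pred), 3))  # 0.611
```

The two averages differ here because "symptoms" has twice the support of the other classes. On an imbalanced five-class dataset, the choice of average can change the headline F1 score.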
Related papers
- Deep Feature Learning for Medical Acoustics (2022-08-05T10:39:37Z)
The purpose of this paper is to compare different learnables in medical acoustics tasks.
A framework has been implemented to classify human respiratory sounds and heartbeats into two categories, i.e., healthy or affected by pathologies.
- When Accuracy Meets Privacy: Two-Stage Federated Transfer Learning Framework in Classification of Medical Images on Limited Data: A COVID-19 Case Study (2022-03-24T02:09:41Z)
The COVID-19 pandemic has spread rapidly and caused a shortage of global medical resources.
CNNs have been widely used and validated for analyzing medical images.
- CvS: Classification via Segmentation For Small Datasets (2021-10-29T18:41:15Z)
This paper presents CvS, a cost-effective classifier for small datasets that derives classification labels from predicted segmentation maps.
We evaluate the effectiveness of our framework on diverse problems, showing that CvS achieves much better classification results than previous methods when given only a handful of examples.
- Checkovid: A COVID-19 misinformation detection system on Twitter using network and content mining perspectives (2021-07-20T20:58:23Z)
During the COVID-19 pandemic, social media platforms were ideal for communicating due to social isolation and quarantine.
To tackle COVID-19 misinformation on Twitter, we present two COVID-19 related misinformation datasets.
We propose a misinformation detection system comprising network-based and content-based processes based on machine learning algorithms and NLP techniques.
- Explainable Multi-class Classification of the CAMH COVID-19 Mental Health Data (2021-05-27T20:08:58Z)
We present explainable multi-class classification of the COVID-19 mental health data.
In the machine learning study, we aim to find the potential factors that influence a person's mental health during the COVID-19 pandemic.
- Designing ECG Monitoring Healthcare System with Federated Transfer Learning and Explainable AI (2021-05-26T11:59:44Z)
We design a new explainable artificial intelligence (XAI) based deep learning framework in a federated setting for ECG-based healthcare applications.
The proposed framework was trained and tested using the MIT-BIH Arrhythmia database.
- FLOP: Federated Learning on Medical Datasets using Partial Networks (2021-02-10T01:56:58Z)
COVID-19, caused by the novel coronavirus, has led to a shortage of medical resources.
Different data-driven deep learning models have been developed to aid the diagnosis of COVID-19.
However, the data itself is still scarce due to patient privacy concerns.
We propose a simple yet effective algorithm, named Federated Learning on Medical datasets using Partial Networks (FLOP).
- Explainable Multi-class Classification of Medical Data (2020-12-26T18:56:07Z)
We present explainable multi-class classification of a large medical data set.
Six algorithms are used in this study: Support Vector Machine (SVM), Naïve Bayes, Gradient Boosting, Decision Trees, Random Forest, and Logistic Regression.
Our results show that using 23 medication features in learning experiments improves the recall of five of the six learning algorithms.
- Classification of COVID-19 in CT Scans using Multi-Source Transfer Learning (2020-09-22T11:53:06Z)
We propose the use of multi-source transfer learning to improve upon traditional transfer learning for the classification of COVID-19 from CT scans.
With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet.
Our best-performing model achieved an accuracy of 0.893 and a recall of 0.897, outperforming its baseline recall by 9.3%.
- Opportunities and Challenges of Deep Learning Methods for Electrocardiogram Data: A Systematic Review (2019-12-28T02:44:29Z)
The electrocardiogram (ECG) is one of the most commonly used diagnostic tools in medicine and healthcare.
Deep learning methods have achieved promising results on predictive healthcare tasks using ECG signals.
This paper presents a systematic review of deep learning methods for ECG data from both modeling and application perspectives.
This list is automatically generated from the titles and abstracts of the papers in this site.