Related papers: COVIDHealth: A Benchmark Twitter Dataset and Machine Learning based Web Application for Classifying COVID-19 Discussions

URL: http://arxiv.org/abs/2402.09897v1
Date: Thu, 15 Feb 2024 11:45:34 GMT
Title: COVIDHealth: A Benchmark Twitter Dataset and Machine Learning based Web Application for Classifying COVID-19 Discussions
Authors: Mahathir Mohammad Bishal, Md. Rakibul Hassan Chowdory, Anik Das, Muhammad Ashad Kabir
Abstract summary: We label COVID-19-related Twitter data, provide benchmark classification results, and develop a web application. We extracted features using various feature extraction methods and applied them to seven different traditional machine learning algorithms. The Linear SVC algorithm exhibited the highest F1 score at 86.13%, surpassing other traditional machine learning approaches.
Score: 1.4018975578160688
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The COVID-19 pandemic has had adverse effects on both physical and mental health. During this pandemic, numerous studies have focused on gaining insights into health-related perspectives from social media. In this study, our primary objective is to develop a machine learning-based web application for automatically classifying COVID-19-related discussions on social media. To achieve this, we label COVID-19-related Twitter data, provide benchmark classification results, and develop a web application. We collected data using the Twitter API and labeled a total of 6,667 tweets into five different classes: health risks, prevention, symptoms, transmission, and treatment. We extracted features using various feature extraction methods and applied them to seven different traditional machine learning algorithms, including Decision Tree, Random Forest, Stochastic Gradient Descent, Adaboost, K-Nearest Neighbour, Logistic Regression, and Linear SVC. Additionally, we used four deep learning algorithms: LSTM, CNN, RNN, and BERT, for classification. Overall, we achieved a maximum F1 score of 90.43% with the CNN algorithm in deep learning. The Linear SVC algorithm exhibited the highest F1 score at 86.13%, surpassing other traditional machine learning approaches. Our study not only contributes to the field of health-related data analysis but also provides a valuable resource in the form of a web-based tool for efficient data classification, which can aid in addressing public health challenges and increasing awareness during pandemics. We made the dataset and application publicly available, which can be downloaded from this link https://github.com/Bishal16/COVID19-Health-Related-Data-Classification-Website.
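The F1 scores quoted in the abstract can be made concrete with a small worked example. The sketch below computes a macro-averaged F1 over the five classes named in the abstract. The abstract does not state which averaging scheme the authors used, so macro averaging is an assumption here, and the toy labels are invented purely for illustration; they are not taken from the dataset.

```python
# The five class labels named in the abstract.
CLASSES = ["health risks", "prevention", "symptoms", "transmission", "treatment"]


def per_class_f1(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so the class F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=CLASSES):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    return sum(per_class_f1(y_true, y_pred, c) for c in labels) / len(labels)


# Hypothetical toy labels, not real tweets or real model output.
y_true = ["symptoms", "symptoms", "prevention", "treatment",
          "transmission", "health risks"]
y_pred = ["symptoms", "prevention", "prevention", "treatment",
          "transmission", "health risks"]

score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

Macro averaging weights every class equally regardless of how many tweets it contains. A support-weighted average would give a different number on an imbalanced dataset, so reported scores are only comparable when the averaging scheme matches.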

Related papers

Deep Feature Learning for Medical Acoustics [78.56998585396421]
The purpose of this paper is to compare different learnables in medical acoustics tasks. A framework has been implemented to classify human respiratory sounds and heartbeats in two categories, i.e. healthy or affected by pathologies.
arXiv Detail & Related papers (2022-08-05T10:39:37Z)
When Accuracy Meets Privacy: Two-Stage Federated Transfer Learning Framework in Classification of Medical Images on Limited Data: A COVID-19 Case Study [77.34726150561087]
The COVID-19 pandemic has spread rapidly and caused a shortage of global medical resources. CNNs have been widely utilized and verified in analyzing medical images.
arXiv Detail & Related papers (2022-03-24T02:09:41Z)
CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps. We evaluate the effectiveness of our framework on diverse problems showing that CvS is able to achieve much higher classification results compared to previous methods when given only a handful of examples.
arXiv Detail & Related papers (2021-10-29T18:41:15Z)
Checkovid: A COVID-19 misinformation detection system on Twitter using network and content mining perspectives [9.69596041242667]
During the COVID-19 pandemic, social media platforms were ideal for communicating due to social isolation and quarantine. To tackle misinformation on these platforms, we present two COVID-19-related misinformation datasets on Twitter. We propose a misinformation detection system comprising network-based and content-based processes based on machine learning algorithms and NLP techniques.
arXiv Detail & Related papers (2021-07-20T20:58:23Z)
Explainable Multi-class Classification of the CAMH COVID-19 Mental Health Data [0.9137554315375922]
We present explainable multi-class classification of the COVID-19 mental health data. In this machine learning study, we aim to find the potential factors that influence a person's mental health during the COVID-19 pandemic.
arXiv Detail & Related papers (2021-05-27T20:08:58Z)
Designing ECG Monitoring Healthcare System with Federated Transfer Learning and Explainable AI [4.694126527114577]
We design a new explainable artificial intelligence (XAI) based deep learning framework in a federated setting for ECG-based healthcare applications. The proposed framework was trained and tested using the MIT-BIH Arrhythmia database.
arXiv Detail & Related papers (2021-05-26T11:59:44Z)
FLOP: Federated Learning on Medical Datasets using Partial Networks [84.54663831520853]
COVID-19 Disease due to the novel coronavirus has caused a shortage of medical resources. Different data-driven deep learning models have been developed to mitigate the diagnosis of COVID-19. The data itself is still scarce due to patient privacy concerns. We propose a simple yet effective algorithm, named Federated Learning on Medical Datasets using Partial Networks (FLOP).
arXiv Detail & Related papers (2021-02-10T01:56:58Z)
Explainable Multi-class Classification of Medical Data [0.9137554315375922]
We present explainable multi-class classification of a large medical data set. Six algorithms are used in this study: Support Vector Machine (SVM), Naïve Bayes, Gradient Boosting, Decision Trees, Random Forest, and Logistic Regression. Our results show that using 23 medication features in learning experiments improves Recall of five out of the six applied learning algorithms.
arXiv Detail & Related papers (2020-12-26T18:56:07Z)
Classification of COVID-19 in CT Scans using Multi-Source Transfer Learning [91.3755431537592]
We propose the use of Multi-Source Transfer Learning to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans. With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet. Our best performing model was able to achieve an accuracy of 0.893 and a Recall score of 0.897, outperforming its baseline Recall score by 9.3%.
arXiv Detail & Related papers (2020-09-22T11:53:06Z)
Opportunities and Challenges of Deep Learning Methods for Electrocardiogram Data: A Systematic Review [62.490310870300746]
The electrocardiogram (ECG) is one of the most commonly used diagnostic tools in medicine and healthcare. Deep learning methods have achieved promising results on predictive healthcare tasks using ECG signals. This paper presents a systematic review of deep learning methods for ECG data from both modeling and application perspectives.
arXiv Detail & Related papers (2019-12-28T02:44:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
