Related papers: CAVES: A Dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines

URL: http://arxiv.org/abs/2204.13746v2
Date: Fri, 11 Nov 2022 14:16:46 GMT
Title: CAVES: A Dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines
Authors: Soham Poddar, Azlaan Mustafa Samad, Rajdeep Mukherjee, Niloy Ganguly, Saptarshi Ghosh
Abstract summary: We have curated CAVES, the first large-scale dataset containing about 10k COVID-19 anti-vaccine tweets labelled into various specific anti-vaccine concerns. This is also the first multi-label classification dataset that provides explanations for each of the labels.
Score: 18.617543658780367
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Convincing people to get vaccinated against COVID-19 is a key societal challenge in the present times. As a first step towards this goal, many prior works have relied on social media analysis to understand the specific concerns that people have towards these vaccines, such as potential side-effects, ineffectiveness, political factors, and so on. Though there are datasets that broadly classify social media posts into Anti-vax and Pro-Vax labels, there is no dataset (to our knowledge) that labels social media posts according to the specific anti-vaccine concerns mentioned in the posts. In this paper, we have curated CAVES, the first large-scale dataset containing about 10k COVID-19 anti-vaccine tweets labelled into various specific anti-vaccine concerns in a multi-label setting. This is also the first multi-label classification dataset that provides explanations for each of the labels. Additionally, the dataset also provides class-wise summaries of all the tweets. We also perform preliminary experiments on the dataset and show that this is a very challenging dataset for multi-label explainable classification and tweet summarization, as is evident by the moderate scores achieved by some state-of-the-art models. Our dataset and codes are available at: https://github.com/sohampoddar26/caves-data

Related papers

Labeled Datasets for Research on Information Operations [71.34999856621306]
We present new labeled datasets about 26 campaigns, which contain both IO posts verified by a social media platform and over 13M posts by 303k accounts that discussed similar topics in the same time frames (control data). The datasets will facilitate the study of narratives, network interactions, and engagement strategies employed by coordinated accounts across various campaigns and countries.
arXiv Detail & Related papers (2024-11-15T22:15:01Z)
Analyzing COVID-19 Vaccination Sentiments in Nigerian Cyberspace: Insights from a Manually Annotated Twitter Dataset [2.820717448579396]
We explore the use of transformer-based language models to study people's acceptance of vaccines in Nigeria. We developed a novel dataset by crawling multi-lingual tweets using relevant hashtags and keywords. Our analysis and visualizations revealed that most tweets expressed neutral sentiments about COVID-19 vaccines, with some individuals expressing positive views.
arXiv Detail & Related papers (2024-01-23T22:49:19Z)
Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B. We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively. We also found that filtering dataset contents based on Not Safe For Work (NSFW) values calculated based on images alone does not exclude all the harmful content in alt-text.
arXiv Detail & Related papers (2023-11-06T19:00:05Z)
Vax-Culture: A Dataset for Studying Vaccine Discourse on Twitter [3.768191396638854]
Vaccine hesitancy continues to be a main challenge for public health officials during the COVID-19 pandemic. We present Vax-Culture, a novel Twitter COVID-19 dataset consisting of 6373 vaccine-related tweets. We hope this can lead to effective and targeted public health communication strategies for reaching individuals with anti-vaccine beliefs.
arXiv Detail & Related papers (2023-04-13T23:04:30Z)
Doctors vs. Nurses: Understanding the Great Divide in Vaccine Hesitancy among Healthcare Workers [64.1526243118151]
We find that doctors are overall more positive toward the COVID-19 vaccines. Doctors are more concerned with the effectiveness of the vaccines over newer variants. Nurses pay more attention to the potential side effects on children.
arXiv Detail & Related papers (2022-09-11T14:22:16Z)
CoVaxNet: An Online-Offline Data Repository for COVID-19 Vaccine Hesitancy Research [39.82073461647643]
A substantial proportion of the population is still hesitant to be vaccinated against the COVID-19 virus. Existing datasets fail to cover all these aspects, making it difficult to form a complete picture in inferencing about the problem of vaccine hesitancy. In this paper, we construct a multi-source, multi-modal, and multi-feature online-offline data repository CoVaxNet.
arXiv Detail & Related papers (2022-06-30T05:58:35Z)
Disentangled Learning of Stance and Aspect Topics for Vaccine Attitude Detection in Social Media [40.61499595293957]
We propose a novel semi-supervised approach for vaccine attitude detection, called VADet. VADet is able to learn disentangled stance and aspect topics, and outperforms existing aspect-based sentiment analysis models on both stance detection and tweet clustering.
arXiv Detail & Related papers (2022-05-06T15:24:33Z)
ArCovidVac: Analyzing Arabic Tweets About COVID-19 Vaccination [7.594204373985492]
We release the first and largest manually annotated Arabic tweet dataset, ArCovidVac, for the COVID-19 vaccination campaign. The dataset is enriched with different layers of annotation, including (i) informativeness (more vs. less important tweets); (ii) fine-grained tweet content types (e.g., advice, rumors, restriction, authentic news/information); and (iii) stance towards vaccination.
arXiv Detail & Related papers (2022-01-17T16:19:21Z)
Classifying vaccine sentiment tweets by modelling domain-specific representation and commonsense knowledge into context-aware attentive GRU [9.8215089151757]
Vaccine hesitancy and refusal can create clusters of low vaccine coverage and reduce the effectiveness of vaccination programs. Social media provides an opportunity to estimate emerging risks to vaccine acceptance by including geographical location and detailing vaccine-related concerns. Methods for classifying social media posts, such as vaccine-related tweets, use language models (LMs) trained on general domain text. We present a novel end-to-end framework consisting of interconnected components that use domain-specific LM trained on vaccine-related tweets and models commonsense knowledge into a bidirectional gated recurrent network (CK-BiGRU) with context-aware attention.
arXiv Detail & Related papers (2021-06-17T15:16:08Z)
COVID-19 Vaccine Hesitancy on Social Media: Building a Public Twitter Dataset of Anti-vaccine Content, Vaccine Misinformation and Conspiracies [10.505633521103018]
False claims about COVID-19 vaccines can undermine public trust in ongoing vaccination campaigns. We present a dataset of Twitter posts that exhibit a strong anti-vaccine stance.
arXiv Detail & Related papers (2021-05-11T15:43:41Z)
Falling into the Echo Chamber: the Italian Vaccination Debate on Twitter [65.7192861893042]
We examine the extent to which the vaccination debate on Twitter is conducive to potential outreach to the vaccination hesitant. We discover that the vaccination skeptics, as well as the advocates, reside in their own distinct "echo chambers". At the center of these echo chambers we find the ardent supporters, for which we build highly accurate network- and content-based classifiers.
arXiv Detail & Related papers (2020-03-26T13:55:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.