COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions
Attributes
- URL: http://arxiv.org/abs/2007.06954v8
- Date: Sat, 25 Jun 2022 06:35:40 GMT
- Title: COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions
- Authors: Raj Kumar Gupta, Ajay Vishwanath, Yinping Yang
- Abstract summary: This paper describes a large global dataset on people's discourse and responses to the COVID-19 pandemic over the Twitter platform.
We collected and processed over 252 million Twitter posts from more than 29 million unique users using four keywords: "corona", "wuhan", "nCov" and "covid".
The paper concludes with a discussion of the dataset's usage in communication, psychology, public health, economics, and epidemiology.
- Score: 4.254099382808598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes a large global dataset on people's discourse and
responses to the COVID-19 pandemic over the Twitter platform. From 28 January
2020 to 1 June 2022, we collected and processed over 252 million Twitter posts
from more than 29 million unique users using four keywords: "corona", "wuhan",
"nCov" and "covid". Leveraging probabilistic topic modelling and pre-trained
machine learning-based emotion recognition algorithms, we labelled each tweet
with seventeen attributes, including a) ten binary attributes indicating the
tweet's relevance (1) or irrelevance (0) to the top ten detected topics, b)
five quantitative emotion attributes indicating the degree of intensity of the
valence or sentiment (from 0: extremely negative to 1: extremely positive) and
the degree of intensity of fear, anger, sadness and happiness emotions (from 0:
not at all to 1: extremely intense), and c) two categorical attributes
indicating the sentiment (very negative, negative, neutral or mixed, positive,
very positive) and the dominant emotion (fear, anger, sadness, happiness, no
specific emotion) the tweet is mainly expressing. We discuss the technical
validity and report the descriptive statistics of these attributes, their
temporal distribution, and geographic representation. The paper concludes with
a discussion of the dataset's usage in communication, psychology, public
health, economics, and epidemiology.
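The seventeen per-tweet attributes described in the abstract can be pictured as a single record. The following is a minimal sketch only: the class name `TweetLabels`, the field names, and the validation logic are hypothetical and are not taken from the dataset's actual file format. Only the attribute counts, value ranges, and category labels come from the abstract.

```python
from dataclasses import dataclass
from typing import Tuple

# Category labels as listed in the abstract.
SENTIMENT_CATEGORIES = (
    "very negative", "negative", "neutral or mixed", "positive", "very positive",
)
EMOTION_CATEGORIES = (
    "fear", "anger", "sadness", "happiness", "no specific emotion",
)

# Hypothetical names for the five intensity fields (not the dataset's column names).
INTENSITY_FIELDS = ("valence", "fear", "anger", "sadness", "happiness")


@dataclass(frozen=True)
class TweetLabels:
    """Hypothetical container for the 17 attributes attached to one tweet."""

    topics: Tuple[int, ...]   # ten binary flags: 1 = relevant, 0 = irrelevant
    valence: float            # 0 = extremely negative, 1 = extremely positive
    fear: float               # 0 = not at all, 1 = extremely intense
    anger: float
    sadness: float
    happiness: float
    sentiment: str            # one of SENTIMENT_CATEGORIES
    dominant_emotion: str     # one of EMOTION_CATEGORIES

    def __post_init__(self) -> None:
        # Ten topic flags, each either 0 or 1.
        if len(self.topics) != 10 or any(t not in (0, 1) for t in self.topics):
            raise ValueError("topics must be ten binary (0/1) flags")
        # Five intensity scores, each within [0, 1].
        for name in INTENSITY_FIELDS:
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")
        # Two categorical labels drawn from the fixed label sets.
        if self.sentiment not in SENTIMENT_CATEGORIES:
            raise ValueError(f"unknown sentiment category: {self.sentiment}")
        if self.dominant_emotion not in EMOTION_CATEGORIES:
            raise ValueError(f"unknown emotion category: {self.dominant_emotion}")

    def attribute_count(self) -> int:
        # 10 topic flags + 5 intensity scores + 2 categorical labels.
        return len(self.topics) + len(INTENSITY_FIELDS) + 2


# Illustrative values only; they are not drawn from the dataset.
example = TweetLabels(
    topics=(1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    valence=0.2, fear=0.8, anger=0.1, sadness=0.3, happiness=0.0,
    sentiment="negative", dominant_emotion="fear",
)
print(example.attribute_count())
```

Grouping the ten topic flags into one tuple is a design choice made here for brevity; a flat layout with one column per topic would carry the same information.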
Related papers
- Lexicon-Based Sentiment Analysis on Text Polarities with Evaluation of Classification Models [1.342834401139078]
This work uses a lexicon-based method to perform sentiment analysis and shows an evaluation of classification models trained over textual data.
The lexicon-based methods identify the intensity of emotion and subjectivity at word levels.
This work is based on a multi-class problem of text being labeled as positive, negative, or neutral.
arXiv Detail & Related papers (2024-09-19T15:31:12Z)
- EMO-KNOW: A Large Scale Dataset on Emotion and Emotion-cause [8.616061735005314]
We introduce a large-scale dataset of emotion causes, derived from 9.8 million cleaned tweets over 15 years.
The novelty of our dataset stems from its broad spectrum of emotion classes and the abstractive emotion cause.
Our dataset will enable the design of emotion-aware systems that account for the diverse emotional responses of different people.
arXiv Detail & Related papers (2024-06-18T08:26:33Z)
- DepressionEmo: A novel dataset for multilabel classification of depression emotions [6.26397257917403]
DepressionEmo is a dataset designed to detect 8 emotions associated with depression, using 6037 examples of long Reddit user posts.
This dataset was created through a majority vote over inputs by zero-shot classifications from pre-trained models.
We provide several text classification methods in two groups: machine learning methods such as SVM, XGBoost, and LightGBM; and deep learning methods such as BERT, GAN-BERT, and BART.
arXiv Detail & Related papers (2024-01-09T16:25:31Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Exploring a Hybrid Deep Learning Framework to Automatically Discover Topic and Sentiment in COVID-19 Tweets [2.3940819037450987]
COVID-19 has created a major public health problem worldwide and other problems such as economic crisis, unemployment, mental distress, etc.
The pandemic is deadly worldwide and affects many people not only through infection but also through problems, stress, wonder, fear, resentment, and hatred.
Twitter is a highly influential social media platform and a significant source of health-related information, news, opinion and public sentiment.
arXiv Detail & Related papers (2023-12-02T16:58:17Z)
- EmoSet: A Large-scale Visual Emotion Dataset with Rich Attributes [53.95428298229396]
We introduce EmoSet, the first large-scale visual emotion dataset annotated with rich attributes.
EmoSet comprises 3.3 million images in total, with 118,102 of these images carefully labeled by human annotators.
Motivated by psychological studies, in addition to emotion category, each image is also annotated with a set of describable emotion attributes.
arXiv Detail & Related papers (2023-07-16T06:42:46Z)
- ArPanEmo: An Open-Source Dataset for Fine-Grained Emotion Recognition in Arabic Online Content during COVID-19 Pandemic [0.0]
This paper presents the ArPanEmo dataset, a novel dataset for fine-grained emotion recognition of online posts in Arabic.
The dataset comprises 11,128 online posts manually labeled for ten emotion categories or neutral, with Fleiss' kappa of 0.71.
It targets a specific Arabic dialect and addresses topics related to the COVID-19 pandemic, making it the first and largest of its kind.
arXiv Detail & Related papers (2023-05-27T21:04:26Z)
- Why Do You Feel This Way? Summarizing Triggers of Emotions in Social Media Posts [61.723046082145416]
We introduce CovidET (Emotions and their Triggers during Covid-19), a dataset of 1,900 English Reddit posts related to COVID-19.
We develop strong baselines to jointly detect emotions and summarize emotion triggers.
Our analyses show that CovidET presents new challenges in emotion-specific summarization, as well as multi-emotion detection in long social media posts.
arXiv Detail & Related papers (2022-10-22T19:10:26Z)
- Affection: Learning Affective Explanations for Real-World Visual Data [50.28825017427716]
We introduce and share with the research community a large-scale dataset that contains emotional reactions and free-form textual explanations for 85,007 publicly available images.
We show that there is significant common ground to capture potentially plausible emotional responses with a large support in the subject population.
Our work paves the way for richer, more human-centric, and emotionally-aware image analysis systems.
arXiv Detail & Related papers (2022-10-04T22:44:17Z)
- Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning [70.30713251031052]
We propose a data-driven deep learning model, i.e. StrengthNet, to improve the generalization of emotion strength assessment for seen and unseen speech.
Experiments show that the predicted emotion strength of the proposed StrengthNet is highly correlated with ground truth scores for both seen and unseen speech.
arXiv Detail & Related papers (2022-06-15T01:25:32Z)
- Detecting Perceived Emotions in Hurricane Disasters [62.760131661847986]
We introduce HurricaneEmo, an emotion dataset of 15,000 English tweets spanning three hurricanes: Harvey, Irma, and Maria.
We present a comprehensive study of fine-grained emotions and propose classification tasks to discriminate between coarse-grained emotion groups.
arXiv Detail & Related papers (2020-04-29T16:17:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.