TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity,
Geo, and Gender Labels
- URL: http://arxiv.org/abs/2110.03664v1
- Date: Mon, 4 Oct 2021 06:17:12 GMT
- Title: TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity,
Geo, and Gender Labels
- Authors: Muhammad Imran, Umair Qazi, Ferda Ofli
- Abstract summary: This work presents TBCOV, a large-scale Twitter dataset comprising more than two billion multilingual tweets related to the COVID-19 pandemic collected worldwide over a continuous period of more than one year.
Several state-of-the-art deep learning models are used to enrich the data with important attributes, including sentiment labels, named-entities (e.g., mentions of persons, organizations, locations), user types, and gender information.
Our sentiment and trend analyses reveal interesting insights and confirm TBCOV's broad coverage of important topics.
- Score: 5.267993069044648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The widespread usage of social networks during mass convergence events, such
as health emergencies and disease outbreaks, provides instant access to
citizen-generated data that carry rich information about public opinions,
sentiments, urgent needs, and situational reports. Such information can help
authorities understand the emergent situation and react accordingly. Moreover,
social media plays a vital role in tackling misinformation and disinformation.
This work presents TBCOV, a large-scale Twitter dataset comprising more than
two billion multilingual tweets related to the COVID-19 pandemic collected
worldwide over a continuous period of more than one year. More importantly,
several state-of-the-art deep learning models are used to enrich the data with
important attributes, including sentiment labels, named-entities (e.g.,
mentions of persons, organizations, locations), user types, and gender
information. Last but not least, a geotagging method is proposed to assign
country, state, county, and city information to tweets, enabling a myriad of
data analysis tasks to understand real-world issues. Our sentiment and trend
analyses reveal interesting insights and confirm TBCOV's broad coverage of
important topics.
Related papers
- News and Misinformation Consumption in Europe: A Longitudinal Cross-Country Perspective [49.1574468325115]
This study investigated information consumption in four European countries.
It analyzed three years of Twitter activity from news outlet accounts in France, Germany, Italy, and the UK.
Results indicate that reliable sources dominate the information landscape, although unreliable content is still present across all countries.
arXiv Detail & Related papers (2023-11-09T16:22:10Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- EDSA-Ensemble: an Event Detection Sentiment Analysis Ensemble Architecture [63.85863519876587]
Using Sentiment Analysis to understand the polarity of each message belonging to an event, as well as the entire event, can help to better understand the general and individual feelings of significant trends and the dynamics on online social networks.
We propose a new ensemble architecture, EDSA-Ensemble, that uses Event Detection and Sentiment Analysis to improve the detection of the polarity for current events from Social Media.
arXiv Detail & Related papers (2023-01-30T11:56:08Z)
- Extracting Feelings of People Regarding COVID-19 by Social Network Mining [0.0]
A dataset of COVID-related tweets in the English language is collected.
More than two million tweets from March 23 to June 23 of 2020 are analyzed.
arXiv Detail & Related papers (2021-10-12T16:45:33Z)
- COVID-19 and Big Data: Multi-faceted Analysis for Spatio-temporal Understanding of the Pandemic with Social Media Conversations [4.07452542897703]
Social media platforms have served as a vehicle for the global conversation about COVID-19.
We present a framework for analysis, mining, and tracking the critical content and characteristics of social media conversations around the pandemic.
arXiv Detail & Related papers (2021-04-22T00:45:50Z)
- I-AID: Identifying Actionable Information from Disaster-related Tweets [0.0]
Social media plays a significant role in disaster management by providing valuable data about affected people, donations and help requests.
We propose I-AID, a multimodel approach to automatically categorize tweets into multi-label information types.
Our results indicate that I-AID outperforms state-of-the-art approaches in terms of weighted average F1 score by +6% and +4% on the TREC-IS dataset and COVID-19 Tweets, respectively.
arXiv Detail & Related papers (2020-08-04T19:07:50Z)
- Fighting the COVID-19 Infodemic in Social Media: A Holistic Perspective and a Call to Arms [42.7332883578842]
With the outbreak of the COVID-19 pandemic, people turned to social media to read and to share timely information.
There was also a new blending of medical and political misinformation and disinformation, which gave rise to the first global infodemic.
This is a complex problem that needs a holistic approach combining the perspectives of journalists, fact-checkers, policymakers, government entities, social media platforms, and society as a whole.
arXiv Detail & Related papers (2020-07-15T21:18:30Z)
- GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information [4.541389211258011]
We present GeoCoV19, a large-scale Twitter dataset containing more than 524 million multilingual tweets posted over a period of 90 days since February 1, 2020.
We postulate that this large-scale, multilingual, geolocated social media data can empower the research communities to evaluate how societies are collectively coping with this unprecedented global crisis.
arXiv Detail & Related papers (2020-05-22T13:30:42Z)
- Critical Impact of Social Networks Infodemic on Defeating Coronavirus COVID-19 Pandemic: Twitter-Based Study and Research Directions [1.6571886312953874]
An estimated 2.95 billion people in 2019 used social media worldwide.
The wide spread of the Coronavirus COVID-19 resulted in a tsunami of social media activity.
This paper presents a large-scale study based on data mined from Twitter.
arXiv Detail & Related papers (2020-05-18T15:53:13Z)
- Fighting the COVID-19 Infodemic: Modeling the Perspective of Journalists, Fact-Checkers, Social Media Platforms, Policy Makers, and the Society [37.9389191670008]
Fighting the COVID-19 infodemic has been declared one of the most important focus areas of the World Health Organization.
We release a large dataset of 16K manually annotated tweets for fine-grained disinformation analysis.
arXiv Detail & Related papers (2020-04-30T18:04:20Z)
- Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements [55.33496599723126]
Disinformation, including fake news, has become a global phenomenon due to its explosive growth.
Despite the recent progress in detecting disinformation and fake news, it is still non-trivial due to its complexity, diversity, multi-modality, and costs of fact-checking or annotation.
arXiv Detail & Related papers (2020-01-02T21:01:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.