GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19
Tweets with Location Information
- URL: http://arxiv.org/abs/2005.11177v1
- Date: Fri, 22 May 2020 13:30:42 GMT
- Title: GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19
Tweets with Location Information
- Authors: Umair Qazi, Muhammad Imran, Ferda Ofli
- Abstract summary: We present GeoCoV19, a large-scale Twitter dataset containing more than 524 million multilingual tweets posted over a period of 90 days since February 1, 2020.
We postulate that this large-scale, multilingual, geolocated social media data can empower the research communities to evaluate how societies are collectively coping with this unprecedented global crisis.
- Score: 4.541389211258011
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The past several years have witnessed a huge surge in the use of social media
platforms during mass convergence events such as health emergencies, natural or
human-induced disasters. These non-traditional data sources are becoming vital
for disease forecasts and surveillance when preparing for epidemic and pandemic
outbreaks. In this paper, we present GeoCoV19, a large-scale Twitter dataset
containing more than 524 million multilingual tweets posted over a period of 90
days since February 1, 2020. Moreover, we employ a gazetteer-based approach to
infer the geolocation of tweets. We postulate that this large-scale,
multilingual, geolocated social media data can empower the research communities
to evaluate how societies are collectively coping with this unprecedented
global crisis as well as to develop computational methods to address challenges
such as identifying fake news, understanding communities' knowledge gaps,
building disease forecast and surveillance models, among others.
Related papers
- SPEED++: A Multilingual Event Extraction Framework for Epidemic Prediction and Preparedness [73.73883111570458]
We introduce the first multilingual Event Extraction framework for extracting epidemic event information for a wide range of diseases and languages.
Annotating data in every language is infeasible; thus we develop zero-shot cross-lingual cross-disease models.
Our framework can provide epidemic warnings for COVID-19 in its earliest stages in Dec 2019 from Chinese Weibo posts without any training in Chinese.
arXiv Detail & Related papers (2024-10-24T03:03:54Z)
- Event Detection from Social Media for Epidemic Prediction [76.90779562626541]
We develop a framework to extract and analyze epidemic-related events from social media posts.
Experimentation reveals how ED models trained on COVID-based SPEED can effectively detect epidemic events for three unseen epidemics.
We show that reporting sharp increases in the extracted events by our framework can provide warnings 4-9 weeks earlier than the WHO epidemic declaration for Monkeypox.
arXiv Detail & Related papers (2024-04-02T06:31:17Z)
- CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster
Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
arXiv Detail & Related papers (2023-10-23T07:01:09Z)
- Twitter conversations predict the daily confirmed COVID-19 cases [0.2320417845168326]
Pandemic-specific discourse has remained on-trend on microblogging platforms such as Twitter and Weibo.
We propose a sentiment-involved topic-based methodology for designing multiple time series from publicly available COVID-19 related Twitter conversations.
We show that the inclusion of social media variables for modeling introduces 48.83–51.38% improvements on RMSE over the baseline models.
arXiv Detail & Related papers (2022-06-21T15:31:06Z)
- TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity,
Geo, and Gender Labels [5.267993069044648]
This work presents TBCOV, a large-scale Twitter dataset comprising more than two billion multilingual tweets related to the COVID-19 pandemic collected worldwide over a continuous period of more than one year.
Several state-of-the-art deep learning models are used to enrich the data with important attributes, including sentiment labels, named-entities, mentions of persons, organizations, locations, user types, and gender information.
Our sentiment and trend analyses reveal interesting insights and confirm TBCOV's broad coverage of important topics.
arXiv Detail & Related papers (2021-10-04T06:17:12Z)
- Changes in European Solidarity Before and During COVID-19: Evidence from
a Large Crowd- and Expert-Annotated Twitter Dataset [77.27709662210363]
We introduce the well-established social scientific concept of social solidarity and its contestation, anti-solidarity, as a new problem setting to supervised machine learning in NLP.
We annotate 2.3k English and German tweets for (anti-)solidarity expressions, utilizing multiple human annotators and two annotation approaches (experts vs. crowds).
Our results show that solidarity became increasingly salient and contested during the COVID-19 crisis.
arXiv Detail & Related papers (2021-08-02T17:03:12Z)
- COVID-19 and Big Data: Multi-faceted Analysis for Spatio-temporal
Understanding of the Pandemic with Social Media Conversations [4.07452542897703]
Social media platforms have served as a vehicle for the global conversation about COVID-19.
We present a framework for analysis, mining, and tracking the critical content and characteristics of social media conversations around the pandemic.
arXiv Detail & Related papers (2021-04-22T00:45:50Z)
- Cross-lingual Transfer Learning for COVID-19 Outbreak Alignment [90.12602012910465]
We train on Italy's early COVID-19 outbreak through Twitter and transfer to several other countries.
Our experiments show strong results with up to 0.85 Spearman correlation in cross-country predictions.
arXiv Detail & Related papers (2020-06-05T02:04:25Z)
- Critical Impact of Social Networks Infodemic on Defeating Coronavirus
COVID-19 Pandemic: Twitter-Based Study and Research Directions [1.6571886312953874]
An estimated 2.95 billion people in 2019 used social media worldwide.
The spread of the coronavirus (COVID-19) resulted in a tsunami of social media activity.
This paper presents a large-scale study based on data mined from Twitter.
arXiv Detail & Related papers (2020-05-18T15:53:13Z)
- COVI White Paper [67.04578448931741]
Contact tracing is an essential tool to change the course of the Covid-19 pandemic.
We present an overview of the rationale, design, ethical considerations and privacy strategy of COVI, a Covid-19 public peer-to-peer contact tracing and risk awareness mobile application developed in Canada.
arXiv Detail & Related papers (2020-05-18T07:40:49Z)
- Large Arabic Twitter Dataset on COVID-19 [0.7734726150561088]
The coronavirus disease (COVID-19), which emerged in late December 2019 in China, is now rapidly spreading across the globe.
The number of global confirmed cases has passed two and a half million, with over 180,000 fatalities.
This work describes the first Arabic tweets dataset on COVID-19 that we have been collecting since January 1st, 2020.
arXiv Detail & Related papers (2020-04-09T01:07:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.