A Weibo Dataset for the 2022 Russo-Ukrainian Crisis
- URL: http://arxiv.org/abs/2203.05967v1
- Date: Wed, 9 Mar 2022 19:06:04 GMT
- Title: A Weibo Dataset for the 2022 Russo-Ukrainian Crisis
- Authors: Yi R. Fung and Heng Ji
- Abstract summary: We present the Russia-Ukraine Crisis Weibo dataset, with over 3.5M user posts and comments in the first release.
Our data is available at https://github.com/yrf1/RussiaUkraine_weibo_dataset.
- Score: 59.258530429699924
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Online social networks such as Twitter and Weibo play an important role in
how people stay informed and exchange reactions. Each crisis encompasses a new
opportunity to study the portability of models for various tasks (e.g.,
information extraction, complex event understanding, misinformation detection,
etc.), due to differences in domain, entities, and event types. We present the
Russia-Ukraine Crisis Weibo (RUW) dataset, with over 3.5M user posts and
comments in the first release. Our data is available at
https://github.com/yrf1/RussiaUkraine_weibo_dataset.
Related papers
- CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using little labeled data and large amounts of unlabeled data.
arXiv Detail & Related papers (2023-10-23T07:01:09Z)
- CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction and Summarization [62.77066949111921]
This paper presents CrisisLTLSum, the largest dataset of local crisis event timelines available to date.
CrisisLTLSum contains 1,000 crisis event timelines across four domains: wildfires, local fires, traffic, and storms.
Our initial experiments indicate a significant gap between the performance of strong baselines and human performance on both tasks.
arXiv Detail & Related papers (2022-10-25T17:32:40Z)
- MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection [4.799822253865053]
We construct a new human-annotated dataset, called MiDe22, containing 5,284 English and 5,064 Turkish tweets with their misinformation labels.
The dataset includes user engagements with the tweets in terms of likes, replies, retweets, and quotes.
arXiv Detail & Related papers (2022-10-11T12:25:26Z)
- VoynaSlov: A Data Set of Russian Social Media Activity during the 2022 Ukraine-Russia War [36.18151945028956]
We describe a new data set called VoynaSlov which contains 21M+ Russian-language social media activities.
We scraped the data from two major platforms that are widely used in Russia: Twitter and VKontakte (VK), a Russian social media platform based in Saint Petersburg commonly referred to as "Russian Facebook".
The main differences that distinguish our data from previously released data related to the ongoing war are its focus on Russian media and consideration of state-affiliation.
arXiv Detail & Related papers (2022-05-24T21:59:10Z)
- Identifying and Characterizing Active Citizens who Refute Misinformation in Social Media [25.986531330843434]
We study the task across different social media platforms (i.e., Twitter and Weibo) and languages (i.e., English and Chinese) for the first time.
We develop and make publicly available a new dataset of Weibo users mapped into one of the two categories (i.e., misinformation posters or active citizens).
We present an extensive analysis of the differences in language use between the two user categories.
arXiv Detail & Related papers (2022-04-21T13:22:48Z)
- BERTuit: Understanding Spanish language in Twitter through a native transformer [70.77033762320572]
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets.
Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used on applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z)
- Twitter Dataset on the Russo-Ukrainian War [68.713984286035]
We have initiated an ongoing dataset acquisition from the Twitter API.
The dataset has reached 57.3 million tweets, originating from 7.7 million users.
We apply an initial volume and sentiment analysis; the dataset can also support further exploratory work on topic analysis, hate speech, and propaganda recognition, or help reveal potentially malicious entities such as botnets.
arXiv Detail & Related papers (2022-04-07T12:33:06Z)
- Twitter Dataset for 2022 Russo-Ukrainian Crisis [16.025531545463142]
We provide a Twitter dataset of the 2022 Russo-Ukrainian conflict.
In the first release, we share over 1.6 million tweets posted during the first week of the crisis.
arXiv Detail & Related papers (2022-03-06T12:49:40Z)
- Event-Related Bias Removal for Real-time Disaster Events [67.2965372987723]
Social media has become an important tool to share information about crisis events such as natural disasters and mass attacks.
Detecting actionable posts that contain useful information requires rapid analysis of huge volumes of data in real time.
We train an adversarial neural model to remove latent event-specific biases and improve the performance on tweet importance classification.
arXiv Detail & Related papers (2020-11-02T02:03:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.