MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection
- URL: http://arxiv.org/abs/2210.05401v2
- Date: Thu, 11 Jul 2024 15:13:14 GMT
- Title: MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection
- Authors: Cagri Toraman, Oguzhan Ozcelik, Furkan Şahinuç, Fazli Can
- Abstract summary: We construct a new human-annotated dataset, called MiDe22, having 5,284 English and 5,064 Turkish tweets with their misinformation labels.
The dataset includes user engagements with the tweets in terms of likes, replies, retweets, and quotes.
- Score: 4.799822253865053
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The rapid dissemination of misinformation through online social networks poses a pressing issue with harmful consequences jeopardizing human health, public safety, democracy, and the economy; therefore, urgent action is required to address this problem. In this study, we construct a new human-annotated dataset, called MiDe22, having 5,284 English and 5,064 Turkish tweets with their misinformation labels for several recent events between 2020 and 2022, including the Russia-Ukraine war, COVID-19 pandemic, and Refugees. The dataset includes user engagements with the tweets in terms of likes, replies, retweets, and quotes. We also provide a detailed data analysis with descriptive statistics and the experimental results of a benchmark evaluation for misinformation detection.
Related papers
- CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
arXiv Detail & Related papers (2023-10-23T07:01:09Z)
- A New Task and Dataset on Detecting Attacks on Human Rights Defenders [68.45906430323156]
We propose a new dataset for detecting Attacks on Human Rights Defenders (HRDsAttack) consisting of crowdsourced annotations on 500 online news articles.
The annotations include fine-grained information about the type and location of the attacks, as well as information about the victim(s).
We demonstrate the usefulness of the dataset by using it to train and evaluate baseline models on several sub-tasks to predict the annotated characteristics.
arXiv Detail & Related papers (2023-06-30T14:20:06Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- Two-Stage Classifier for COVID-19 Misinformation Detection Using BERT: a Study on Indonesian Tweets [0.15229257192293202]
Research on COVID-19 misinformation detection in Indonesia is still scarce.
In this study, we propose a two-stage classifier model using the IndoBERT pre-trained language model for the tweet misinformation detection task.
The experimental results show that the combination of the BERT sequence classifier for relevance prediction and Bi-LSTM for misinformation detection outperformed other machine learning models with an accuracy of 87.02%.
arXiv Detail & Related papers (2022-06-30T15:33:20Z)
- Twitter Dataset on the Russo-Ukrainian War [68.713984286035]
We have initiated an ongoing dataset acquisition from the Twitter API.
The dataset has reached 57.3 million tweets, originating from 7.7 million users.
We apply an initial volume and sentiment analysis; the dataset can also support further exploratory investigation of topic analysis, hate speech, and propaganda recognition, or even reveal potentially malicious entities such as botnets.
arXiv Detail & Related papers (2022-04-07T12:33:06Z)
- A Weibo Dataset for the 2022 Russo-Ukrainian Crisis [59.258530429699924]
We present the Russia-Ukraine Crisis Weibo dataset, with over 3.5M user posts and comments in the first release.
Our data is available at https://github.com/yrf1/Russia-Ukraine_weibo_dataset.
arXiv Detail & Related papers (2022-03-09T19:06:04Z)
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
- TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels [5.267993069044648]
This work presents TBCOV, a large-scale Twitter dataset comprising more than two billion multilingual tweets related to the COVID-19 pandemic collected worldwide over a continuous period of more than one year.
Several state-of-the-art deep learning models are used to enrich the data with important attributes, including sentiment labels, named-entities, mentions of persons, organizations, locations, user types, and gender information.
Our sentiment and trend analyses reveal interesting insights and confirm TBCOV's broad coverage of important topics.
arXiv Detail & Related papers (2021-10-04T06:17:12Z)
- BLM-17m: A Large-Scale Dataset for Black Lives Matter Topic Detection on Twitter [25.881740515679393]
We propose a labeled dataset for topic detection that contains 17 million tweets.
These tweets were collected from 25 May 2020 to 21 August 2020, covering 89 days from the start of the incident.
arXiv Detail & Related papers (2021-05-04T07:27:42Z)
- Predicting Misinformation and Engagement in COVID-19 Twitter Discourse in the First Months of the Outbreak [1.2059055685264957]
We examine nearly 505K COVID-19-related tweets from the initial months of the pandemic to understand misinformation as a function of bot-behavior and engagement.
We found that real users tweet both facts and misinformation, while bots tweet proportionally more misinformation.
arXiv Detail & Related papers (2020-12-03T18:47:34Z)
- ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection [6.688963029270579]
ArCOV19-Rumors is an Arabic COVID-19 Twitter dataset for misinformation detection composed of tweets containing claims from 27th January till the end of April 2020.
We collected 138 verified claims, mostly from popular fact-checking websites, and identified 9.4K tweets relevant to those claims.
Tweets were manually annotated by veracity to support research on misinformation detection, which is one of the major problems faced during a pandemic.
arXiv Detail & Related papers (2020-10-17T11:21:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.