AMMeBa: A Large-Scale Survey and Dataset of Media-Based Misinformation In-The-Wild
- URL: http://arxiv.org/abs/2405.11697v2
- Date: Tue, 21 May 2024 15:35:26 GMT
- Title: AMMeBa: A Large-Scale Survey and Dataset of Media-Based Misinformation In-The-Wild
- Authors: Nicholas Dufour, Arkanath Pathak, Pouya Samangouei, Nikki Hariri, Shashi Deshetti, Andrew Dudfield, Christopher Guess, Pablo Hernández Escayola, Bobby Tran, Mevan Babakar, Christoph Bregler,
- Abstract summary: We show the results of a two-year study using human raters to annotate online media-based misinformation.
We show the rise of generative AI-based content in misinformation claims.
We also show "simple" methods dominated historically, particularly context manipulations.
- Score: 1.4193873432298625
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The prevalence and harms of online misinformation is a perennial concern for internet platforms, institutions and society at large. Over time, information shared online has become more media-heavy and misinformation has readily adapted to these new modalities. The rise of generative AI-based tools, which provide widely-accessible methods for synthesizing realistic audio, images, video and human-like text, have amplified these concerns. Despite intense public interest and significant press coverage, quantitative information on the prevalence and modality of media-based misinformation remains scarce. Here, we present the results of a two-year study using human raters to annotate online media-based misinformation, mostly focusing on images, based on claims assessed in a large sample of publicly-accessible fact checks with the ClaimReview markup. We present an image typology, designed to capture aspects of the image and manipulation relevant to the image's role in the misinformation claim. We visualize the distribution of these types over time. We show the rise of generative AI-based content in misinformation claims, and that its commonality is a relatively recent phenomenon, occurring significantly after heavy press coverage. We also show "simple" methods dominated historically, particularly context manipulations, and continued to hold a majority as of the end of data collection in November 2023. The dataset, Annotated Misinformation, Media-Based (AMMeBa), is publicly-available, and we hope that these data will serve as both a means of evaluating mitigation methods in a realistic setting and as a first-of-its-kind census of the types and modalities of online misinformation.
Related papers
- Characteristics of Political Misinformation Over the Past Decade [0.0]
This paper uses natural language processing to find common characteristics of political misinformation over a twelve year period.
The results show that misinformation has increased dramatically in recent years and that it has increasingly started to be shared from sources with primary information modalities of text and images.
It was discovered that statements expressing misinformation contain more negative sentiment than accurate information.
arXiv Detail & Related papers (2024-11-09T09:12:39Z) - Countering Misinformation via Emotional Response Generation [15.383062216223971]
proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion and democracy.
Previous research has shown how social correction can be an effective way to curb misinformation.
We present VerMouth, the first large-scale dataset comprising roughly 12 thousand claim-response pairs.
arXiv Detail & Related papers (2023-11-17T15:37:18Z) - ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z) - Harnessing the Power of Text-image Contrastive Models for Automatic
Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore the constrastive learning in the domain of misinformation identification.
Our model shows the superior performance of non-matched image-text pair detection when the training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z) - Unveiling the Hidden Agenda: Biases in News Reporting and Consumption [59.55900146668931]
We build a six-year dataset on the Italian vaccine debate and adopt a Bayesian latent space model to identify narrative and selection biases.
We found a nonlinear relationship between biases and engagement, with higher engagement for extreme positions.
Analysis of news consumption on Twitter reveals common audiences among news outlets with similar ideological positions.
arXiv Detail & Related papers (2023-01-14T18:58:42Z) - Fighting Malicious Media Data: A Survey on Tampering Detection and
Deepfake Detection [115.83992775004043]
Recent advances in deep learning, particularly deep generative models, open the doors for producing perceptually convincing images and videos at a low cost.
This paper provides a comprehensive review of the current media tampering detection approaches, and discusses the challenges and trends in this field for future research.
arXiv Detail & Related papers (2022-12-12T02:54:08Z) - GREENER: Graph Neural Networks for News Media Profiling [24.675574340841163]
We study the problem of profiling news media on the Web with respect to their factuality of reporting and bias.
Our main focus is on modeling the similarity between media outlets based on the overlap of their audience.
Prediction accuracy is found to improve by 2.5-27 macro-F1 points for the two tasks.
arXiv Detail & Related papers (2022-11-10T12:46:29Z) - Adherence to Misinformation on Social Media Through Socio-Cognitive and
Group-Based Processes [79.79659145328856]
We argue that when misinformation proliferates, this happens because the social media environment enables adherence to misinformation.
We make the case that polarization and misinformation adherence are closely tied.
arXiv Detail & Related papers (2022-06-30T12:34:24Z) - Misinformation Detection in Social Media Video Posts [0.4724825031148411]
Short-form video by social media platforms has become a critical challenge for social media providers.
We develop methods to detect misinformation in social media posts, exploiting modalities such as video and text.
We collect 160,000 video posts from Twitter, and leverage self-supervised learning to learn expressive representations of joint visual and textual data.
arXiv Detail & Related papers (2022-02-15T20:14:54Z) - Leveraging Multi-Source Weak Social Supervision for Early Detection of
Fake News [67.53424807783414]
Social media has greatly enabled people to participate in online activities at an unprecedented rate.
This unrestricted access also exacerbates the spread of misinformation and fake news online which might cause confusion and chaos unless being detected early for its mitigation.
We jointly leverage the limited amount of clean data along with weak signals from social engagements to train deep neural networks in a meta-learning framework to estimate the quality of different weak instances.
Experiments on realworld datasets demonstrate that the proposed framework outperforms state-of-the-art baselines for early detection of fake news without using any user engagements at prediction time.
arXiv Detail & Related papers (2020-04-03T18:26:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.