MiRAGeNews: Multimodal Realistic AI-Generated News Detection
- URL: http://arxiv.org/abs/2410.09045v1
- Date: Fri, 11 Oct 2024 17:58:02 GMT
- Title: MiRAGeNews: Multimodal Realistic AI-Generated News Detection
- Authors: Runsheng Huang, Liam Dugan, Yue Yang, Chris Callison-Burch
- Abstract summary: We propose the MiRAGeNews dataset to combat the spread of AI-generated fake news.
Our dataset poses a significant challenge to both humans (60% F-1) and state-of-the-art multi-modal LLMs (< 24% F-1).
We train a multi-modal detector that improves by +5.1% F-1 over state-of-the-art baselines.
- Score: 45.067211436589126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of inflammatory or misleading "fake" news content has become increasingly common in recent years. Simultaneously, it has become easier than ever to use AI tools to generate photorealistic images depicting any scene imaginable. Combining these two -- AI-generated fake news content -- is particularly potent and dangerous. To combat the spread of AI-generated fake news, we propose the MiRAGeNews Dataset, a dataset of 12,500 high-quality real and AI-generated image-caption pairs from state-of-the-art generators. We find that our dataset poses a significant challenge to humans (60% F-1) and state-of-the-art multi-modal LLMs (< 24% F-1). Using our dataset we train a multi-modal detector (MiRAGe) that improves by +5.1% F-1 over state-of-the-art baselines on image-caption pairs from out-of-domain image generators and news publishers. We release our code and data to aid future work on detecting AI-generated content.
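The abstract describes MiRAGe only as a multi-modal detector trained on image-caption pairs, without specifying its architecture. The following is an illustrative sketch of that general setup, not the paper's actual method: a binary classifier over frozen CLIP image and caption embeddings. The checkpoint name, head design, and example caption are assumptions.

```python
# Illustrative sketch only: a CLIP-based binary classifier over image-caption
# pairs, in the spirit of the multimodal detector described in the abstract.
# The checkpoint name and head design are assumptions, not the paper's method.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

class PairDetector(nn.Module):
    def __init__(self, clip_name: str = "openai/clip-vit-base-patch32"):
        super().__init__()
        self.clip = CLIPModel.from_pretrained(clip_name)
        self.clip.requires_grad_(False)          # keep the backbone frozen
        dim = self.clip.config.projection_dim    # 512 for this checkpoint
        self.head = nn.Sequential(               # small trainable head
            nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, pixel_values, input_ids, attention_mask):
        img = self.clip.get_image_features(pixel_values=pixel_values)
        txt = self.clip.get_text_features(input_ids=input_ids,
                                          attention_mask=attention_mask)
        # Concatenate image and caption embeddings; output a single logit
        # (AI-generated vs. real) for the pair.
        return self.head(torch.cat([img, txt], dim=-1)).squeeze(-1)

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
detector = PairDetector()
batch = processor(text=["A crowd gathers outside city hall after the announcement."],
                  images=[Image.new("RGB", (224, 224))],  # stand-in image
                  return_tensors="pt", padding=True)
logit = detector(batch["pixel_values"], batch["input_ids"], batch["attention_mask"])
print(torch.sigmoid(logit))  # probability the pair is AI-generated
```

In practice such a head would be trained with a binary cross-entropy loss on labeled real and generated pairs; the frozen backbone is just one common design choice.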
Related papers
- Could AI Trace and Explain the Origins of AI-Generated Images and Text? [53.11173194293537]
AI-generated content is increasingly prevalent in the real world.
Adversaries might exploit large multimodal models to create images that violate ethical or legal standards.
Paper reviewers may misuse large language models to generate reviews without genuine intellectual effort.
arXiv Detail & Related papers (2025-04-05T20:51:54Z)
- CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI [58.35348718345307]
Current efforts to distinguish between real and AI-generated images may lack generalization.
We propose a novel framework, Co-Spy, that first enhances existing semantic features.
We also create Co-Spy-Bench, a comprehensive dataset comprising 5 real image datasets and 22 state-of-the-art generative models.
arXiv Detail & Related papers (2025-03-24T01:59:29Z)
- D-Judge: How Far Are We? Evaluating the Discrepancies Between AI-synthesized Images and Natural Images through Multimodal Guidance [19.760989919485894]
We introduce an AI-Natural Image Discrepancy assessment benchmark (D-Judge).
We construct D-ANI, a dataset with 5,000 natural images and over 440,000 AIGIs generated by nine models using Text-to-Image (T2I), Image-to-Image (I2I), and Text and Image-to-Image (TI2I) prompts.
Our framework evaluates the discrepancy across five dimensions: naive image quality, semantic alignment, aesthetic appeal, downstream applicability, and human validation.
arXiv Detail & Related papers (2024-12-23T15:08:08Z)
- Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors [62.63467652611788]
We introduce SEMI-TRUTHS, featuring 27,600 real images, 223,400 masks, and 1,472,700 AI-augmented images.
Each augmented image is accompanied by metadata for standardized and targeted evaluation of detector robustness.
Our findings suggest that state-of-the-art detectors exhibit varying sensitivities to the types and degrees of perturbations, data distributions, and augmentation methods used.
arXiv Detail & Related papers (2024-11-12T01:17:27Z)
- Zero-Shot Detection of AI-Generated Images [54.01282123570917]
We propose a zero-shot entropy-based detector (ZED) to detect AI-generated images.
Inspired by recent works on machine-generated text detection, our idea is to measure how surprising the image under analysis is compared to a model of real images.
ZED achieves an average improvement of more than 3% over the SoTA in terms of accuracy.
arXiv Detail & Related papers (2024-09-24T08:46:13Z)
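The ZED entry above gives only the core intuition: score an image by how surprising it is under a model of real images and flag high-surprisal images. The sketch below illustrates that surprisal-and-threshold logic with a deliberately crude stand-in density model (a per-pixel Gaussian fitted on a small set of real images); the actual detector uses a learned model of real images, so everything here beyond the thresholding idea is an assumption.

```python
# Illustrative sketch of surprisal-based zero-shot detection: score an image by
# its average negative log-likelihood under a model of real images and flag
# high-surprisal images. The per-pixel Gaussian below is a crude stand-in for
# the learned real-image model used in the actual method.
import numpy as np

def fit_real_image_model(real_images: np.ndarray):
    """real_images: (N, H, W, C) floats in [0, 1]; returns per-pixel mean/std."""
    mu = real_images.mean(axis=0)
    sigma = real_images.std(axis=0) + 1e-3
    return mu, sigma

def surprisal(image: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> float:
    """Average negative log-likelihood (nats per pixel) under the Gaussian model."""
    nll = 0.5 * ((image - mu) / sigma) ** 2 + np.log(sigma) + 0.5 * np.log(2 * np.pi)
    return float(nll.mean())

rng = np.random.default_rng(0)
real = rng.beta(2.0, 2.0, size=(64, 32, 32, 3))     # pretend "real" images
mu, sigma = fit_real_image_model(real)

candidate = rng.uniform(size=(32, 32, 3))           # image under analysis
score = surprisal(candidate, mu, sigma)
threshold = np.quantile([surprisal(x, mu, sigma) for x in real], 0.95)
print("flag as AI-generated" if score > threshold else "consistent with the real-image model")
```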
- Development of a Dual-Input Neural Model for Detecting AI-Generated Imagery [0.0]
It is important to develop tools that are able to detect AI-generated images.
This paper proposes a dual-branch neural network architecture that takes both images and their Fourier frequency decomposition as inputs.
Our proposed model achieves an accuracy of 94% on the CIFAKE dataset, which significantly outperforms classic ML methods and CNNs.
arXiv Detail & Related papers (2024-06-19T16:42:04Z)
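The dual-input entry above describes the architecture only as two branches, one on the image and one on its Fourier decomposition. A minimal PyTorch sketch of that pattern follows; the layer sizes and the use of the log-magnitude FFT spectrum are assumptions, not the paper's exact design.

```python
# Illustrative dual-branch sketch: one CNN branch sees the RGB image, the other
# sees the log-magnitude of its 2D FFT, and their features are fused for a
# binary real-vs-AI decision. Layer sizes here are arbitrary assumptions.
import torch
import torch.nn as nn

def small_cnn(in_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class DualInputDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.pixel_branch = small_cnn(3)   # operates on the RGB image
        self.freq_branch = small_cnn(3)    # operates on its frequency spectrum
        self.classifier = nn.Linear(32 + 32, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Log-magnitude FFT per channel, centered with fftshift.
        spectrum = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
        freq = torch.log1p(spectrum.abs())
        feats = torch.cat([self.pixel_branch(x), self.freq_branch(freq)], dim=-1)
        return self.classifier(feats).squeeze(-1)   # logit: AI-generated vs. real

model = DualInputDetector()
logits = model(torch.rand(4, 3, 32, 32))            # batch of 4 toy images
print(torch.sigmoid(logits))
```

The frequency branch is the usual motivation for this design: generator artifacts that are hard to see in pixel space often show up as periodic patterns in the spectrum.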
- The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking [47.08666835021915]
We present a systematic attempt at understanding and detecting AI-generated images (AI-art) in adversarial scenarios.
The dataset, named ARIA, contains over 140K images in five categories: artworks (painting), social media images, news photos, disaster scenes, and anime pictures.
arXiv Detail & Related papers (2024-04-22T21:00:13Z)
- AI-Generated Faces in the Real World: A Large-Scale Case Study of Twitter Profile Images [26.891299948581782]
We conduct the first large-scale investigation of the prevalence of AI-generated profile pictures on Twitter.
Our analysis of nearly 15 million Twitter profile pictures shows that 0.052% were artificially generated, confirming their notable presence on the platform.
The results also reveal several motives, including spamming and political amplification campaigns.
arXiv Detail & Related papers (2024-04-22T14:57:17Z)
- Raidar: geneRative AI Detection viA Rewriting [42.477151044325595]
Large language models (LLMs) are more likely to modify human-written text than AI-generated text when tasked with rewriting.
We introduce a method to detect AI-generated content by prompting LLMs to rewrite text and calculating the editing distance of the output.
Our results illustrate the unique imprint of machine-generated text through the lens of the machines themselves.
arXiv Detail & Related papers (2024-01-23T18:57:53Z)
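Raidar's mechanism, as summarized above, is to ask an LLM to rewrite a passage and measure how much it changes: heavily edited text looks human-written, lightly edited text looks machine-generated. The sketch below shows that scoring step using a character-level similarity from the standard library; the prompt wording and the threshold are assumptions, and the LLM call is left as a placeholder rather than a real API.

```python
# Illustrative sketch of rewrite-based detection: ask an LLM to rewrite the
# text, then measure how much it changed. Small edits suggest the original was
# machine-generated. The prompt wording and threshold are assumptions, and
# `call_llm` is a placeholder for whatever model client is available.
import difflib

REWRITE_PROMPT = "Rewrite the following text, keeping its meaning:\n\n{text}"

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def rewrite_distance(text: str) -> float:
    """Fraction of the text changed by the rewrite (1 - similarity ratio)."""
    rewritten = call_llm(REWRITE_PROMPT.format(text=text))
    similarity = difflib.SequenceMatcher(None, text, rewritten).ratio()
    return 1.0 - similarity

def looks_ai_generated(text: str, threshold: float = 0.2) -> bool:
    # A below-threshold edit distance means the LLM left the text largely
    # intact, which this heuristic takes as a sign of machine-generated text.
    return rewrite_distance(text) < threshold
```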
- TWIGMA: A dataset of AI-Generated Images with Metadata From Twitter [29.77283532841167]
We introduce TWIGMA, a dataset encompassing over 800,000 gen-AI images collected from Jan 2021 to March 2023 on Twitter.
We find that gen-AI images possess distinctive characteristics and exhibit, on average, lower variability when compared to their non-gen-AI counterparts.
We observe a longitudinal shift in the themes of AI-generated images on Twitter, with users increasingly sharing artistically sophisticated content.
arXiv Detail & Related papers (2023-06-14T07:27:57Z)
- DeepfakeArt Challenge: A Benchmark Dataset for Generative AI Art Forgery and Data Poisoning Detection [57.51313366337142]
There has been growing concern over the use of generative AI for malicious purposes.
In the realm of visual content synthesis using generative AI, key areas of significant concern have been image forgery and data poisoning.
We introduce the DeepfakeArt Challenge, a large-scale challenge benchmark dataset designed specifically to aid in the building of machine learning algorithms for generative AI art forgery and data poisoning detection.
arXiv Detail & Related papers (2023-06-02T05:11:27Z)
- Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images [66.20578637253831]
There is growing concern that advances in artificial intelligence (AI) technology may be used to produce convincing fake photos.
This study comprehensively evaluates how well human and model agents can distinguish state-of-the-art AI-generated visual content from real images.
arXiv Detail & Related papers (2023-04-25T17:51:59Z)