Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation
- URL: http://arxiv.org/abs/2503.14905v1
- Date: Wed, 19 Mar 2025 05:14:44 GMT
- Title: Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation
- Authors: Siwei Wen, Junyan Ye, Peilin Feng, Hengrui Kang, Zichen Wen, Yize Chen, Jiang Wu, Wenjun Wu, Conghui He, Weijia Li
- Abstract summary: We introduce FakeVLM, a specialized large multimodal model for both general synthetic image and DeepFake detection tasks. FakeVLM excels in distinguishing real from fake images and provides clear, natural language explanations for image artifacts. We present FakeClue, a comprehensive dataset containing over 100,000 images across seven categories, annotated with fine-grained artifact clues in natural language.
- Score: 15.442558725312976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid advancement of Artificial Intelligence Generated Content (AIGC) technologies, synthetic images have become increasingly prevalent in everyday life, posing new challenges for authenticity assessment and detection. Despite the effectiveness of existing methods in evaluating image authenticity and locating forgeries, these approaches often lack human interpretability and do not fully address the growing complexity of synthetic data. To tackle these challenges, we introduce FakeVLM, a specialized large multimodal model designed for both general synthetic image and DeepFake detection tasks. FakeVLM not only excels in distinguishing real from fake images but also provides clear, natural language explanations for image artifacts, enhancing interpretability. Additionally, we present FakeClue, a comprehensive dataset containing over 100,000 images across seven categories, annotated with fine-grained artifact clues in natural language. FakeVLM demonstrates performance comparable to expert models while eliminating the need for additional classifiers, making it a robust solution for synthetic data detection. Extensive evaluations across multiple datasets confirm the superiority of FakeVLM in both authenticity classification and artifact explanation tasks, setting a new benchmark for synthetic image detection. The dataset and code will be released at: https://github.com/opendatalab/FakeVLM.
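To make the detect-and-explain workflow described in the abstract concrete, below is a minimal sketch of prompting a general-purpose large vision-language model to judge whether an image is real or AI-generated and to describe visible artifacts, in the spirit of FakeVLM. This is an illustration only: it uses an off-the-shelf LLaVA checkpoint from Hugging Face rather than the FakeVLM weights or interface, and the prompt wording and input path are assumptions.

```python
# Illustrative sketch only: uses an off-the-shelf LLaVA checkpoint, NOT the
# actual FakeVLM weights or API. Prompt wording and file path are assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # generic LVLM used as a stand-in
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Ask for a verdict plus a natural-language artifact explanation.
prompt = (
    "USER: <image>\n"
    "Is this image real or AI-generated? "
    "Explain your answer by pointing out any visible artifacts.\n"
    "ASSISTANT:"
)
image = Image.open("suspect_image.png")  # hypothetical input path

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

Per the abstract, FakeVLM's contribution is that the fine-grained FakeClue annotations supervise exactly this pairing of an authenticity verdict with a free-form artifact description; the sketch above only mimics the interaction pattern.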
Related papers
- FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics [66.14786900470158]
We propose FakeScope, an expert large multimodal model (LMM) tailored for AI-generated image forensics.
FakeScope identifies AI-synthetic images with high accuracy and provides rich, interpretable, and query-driven forensic insights.
FakeScope achieves state-of-the-art performance in both closed-ended and open-ended forensic scenarios.
arXiv Detail & Related papers (2025-03-31T16:12:48Z)
- TruthLens: A Training-Free Paradigm for DeepFake Detection [4.64982780843177]
We introduce TruthLens, a training-free framework that reimagines deepfake detection as a visual question-answering (VQA) task. TruthLens utilizes state-of-the-art large vision-language models (LVLMs) to observe and describe visual artifacts. By adopting a multimodal approach, TruthLens seamlessly integrates visual and semantic reasoning to not only classify images as real or fake but also provide interpretable explanations.
arXiv Detail & Related papers (2025-03-19T15:41:32Z)
- LEGION: Learning to Ground and Explain for Synthetic Image Detection [49.958951540410816]
We introduce SynthScars, a high-quality and diverse dataset consisting of 12,236 fully synthetic images with human-expert annotations. It features 4 distinct image content types, 3 categories of artifacts, and fine-grained annotations covering pixel-level segmentation, detailed textual explanations, and artifact category labels. We propose LEGION, a multimodal large language model (MLLM)-based image forgery analysis framework that integrates artifact detection, segmentation, and explanation.
arXiv Detail & Related papers (2025-03-19T14:37:21Z)
- Detect Fake with Fake: Leveraging Synthetic Data-driven Representation for Synthetic Image Detection [7.730666100347136]
We show the effectiveness of synthetic data-driven representations for synthetic image detection.
We find that vision transformers trained by the latest visual representation learners with synthetic data can effectively distinguish fake from real images.
arXiv Detail & Related papers (2024-09-13T14:50:14Z)
- Perceptual Artifacts Localization for Image Synthesis Tasks [59.638307505334076]
We introduce a novel dataset comprising 10,168 generated images, each annotated with per-pixel perceptual artifact labels.
A segmentation model, trained on our proposed dataset, effectively localizes artifacts across a range of tasks.
We propose an innovative zoom-in inpainting pipeline that seamlessly rectifies perceptual artifacts in the generated images.
arXiv Detail & Related papers (2023-10-09T10:22:08Z)
- Generalizable Synthetic Image Detection via Language-guided Contrastive Learning [22.4158195581231]
The malevolent use of synthetic images, such as the dissemination of fake news or the creation of fake profiles, raises significant concerns regarding the authenticity of images.
We propose a simple yet very effective synthetic image detection method via language-guided contrastive learning and a new formulation of the detection problem (a minimal zero-shot sketch of this language-guided idea appears after the list below).
It is shown that our proposed LanguAge-guided SynThEsis Detection (LASTED) model achieves much improved generalizability to unseen image generation models.
arXiv Detail & Related papers (2023-05-23T08:13:27Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on the detection of deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- Beyond the Spectrum: Detecting Deepfakes via Re-Synthesis [69.09526348527203]
Deep generative models have led to highly realistic media, known as deepfakes, that are often indistinguishable from real images to the human eye.
We propose a novel fake detection approach that re-synthesizes test images and extracts visual cues for detection.
We demonstrate the improved effectiveness, cross-GAN generalization, and robustness against perturbations of our approach in a variety of detection scenarios.
arXiv Detail & Related papers (2021-05-29T21:22:24Z)
- Identifying Invariant Texture Violation for Robust Deepfake Detection [17.306386179823576]
We propose the Invariant Texture Learning framework, which requires access only to published datasets of low visual quality.
Our method is based on the prior that the microscopic facial texture of the source face is inevitably violated by the texture transferred from the target person.
arXiv Detail & Related papers (2020-12-19T03:02:15Z)
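Several entries above, LASTED in particular, frame detection around pairing images with natural-language descriptions of authenticity. As referenced in that entry, the following is a minimal zero-shot sketch of the general language-guided idea using an off-the-shelf CLIP model; it is not LASTED's contrastive training procedure, and the model name, prompts, and file path are placeholders.

```python
# Minimal zero-shot sketch of language-guided authenticity scoring with CLIP.
# This is NOT the LASTED training procedure; prompts and paths are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

texts = ["a real photograph", "a synthetic, AI-generated image"]
image = Image.open("suspect_image.png")  # hypothetical input path

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits_per_image = model(**inputs).logits_per_image  # shape: (1, 2)
probs = logits_per_image.softmax(dim=-1).squeeze(0)

for label, p in zip(texts, probs.tolist()):
    print(f"{label}: {p:.3f}")
```

A trained method such as LASTED would instead fine-tune the image encoder contrastively against such text labels; the zero-shot scoring here is only meant to show how textual supervision can enter the detection problem.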