Related papers: Generalizable Synthetic Image Detection via Language-guided Contrastive Learning

URL: http://arxiv.org/abs/2305.13800v2
Date: Wed, 30 Apr 2025 03:27:31 GMT
Title: Generalizable Synthetic Image Detection via Language-guided Contrastive Learning
Authors: Haiwei Wu, Jiantao Zhou, Shile Zhang
Abstract summary: Malevolent use of synthetic images, such as the dissemination of fake news or the creation of fake profiles, raises significant concerns regarding the authenticity of images. We propose a simple yet very effective synthetic image detection method via language-guided contrastive learning. It is shown that our proposed LanguAge-guided SynThEsis Detection (LASTED) model achieves much improved generalizability to unseen image generation models.
Score: 22.533225521726116
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The heightened realism of AI-generated images can be attributed to the rapid development of synthetic models, including generative adversarial networks (GANs) and diffusion models (DMs). The malevolent use of synthetic images, such as the dissemination of fake news or the creation of fake profiles, however, raises significant concerns regarding the authenticity of images. Though many forensic algorithms have been developed for detecting synthetic images, their performance, especially the generalization capability, is still far from being adequate to cope with the increasing number of synthetic models. In this work, we propose a simple yet very effective synthetic image detection method via a language-guided contrastive learning. Specifically, we augment the training images with carefully-designed textual labels, enabling us to use a joint visual-language contrastive supervision for learning a forensic feature space with better generalization. It is shown that our proposed LanguAge-guided SynThEsis Detection (LASTED) model achieves much improved generalizability to unseen image generation models and delivers promising performance that far exceeds state-of-the-art competitors over four datasets. The code is available at https://github.com/HighwayWu/LASTED.
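The joint visual-language supervision described in the abstract can be illustrated with a minimal sketch. It assumes image embeddings and text-label embeddings have already been produced by some pair of encoders, and it scores each image against every textual label with a temperature-scaled cosine similarity followed by cross-entropy. This is one common way to write such a contrastive objective, not necessarily the loss used in LASTED. The function name, the temperature value, and the two-label setup in the usage lines are assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def language_guided_contrastive_loss(image_emb, text_emb, label_idx, temperature=0.07):
    """Hypothetical label-guided contrastive loss (illustrative only).

    image_emb: (N, D) array of image embeddings.
    text_emb:  (K, D) array, one embedding per textual label.
    label_idx: (N,) integer array, index of the textual label of each image.
    """
    img = l2_normalize(np.asarray(image_emb, dtype=np.float64))
    txt = l2_normalize(np.asarray(text_emb, dtype=np.float64))
    label_idx = np.asarray(label_idx, dtype=np.int64)

    # (N, K) cosine similarities between every image and every label, scaled.
    logits = img @ txt.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy: pull each image toward its own label, away from the others.
    return float(-log_prob[np.arange(len(label_idx)), label_idx].mean())


# Toy usage with two assumed labels (index 0 and index 1) in a 4-d space.
text_emb = np.eye(2, 4)
image_emb = text_emb[[0, 1, 0]]          # images perfectly aligned with labels
aligned = language_guided_contrastive_loss(image_emb, text_emb, [0, 1, 0])
flipped = language_guided_contrastive_loss(image_emb, text_emb, [1, 0, 1])
```

In this toy setup the loss is close to zero when every image embedding coincides with its own label embedding and large when the labels are swapped, which is the behavior a contrastive objective of this kind is meant to have.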

Related papers

FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics [66.14786900470158]
We propose FakeScope, an expert large multimodal model (LMM) tailored for AI-generated image forensics. FakeScope identifies AI-synthetic images with high accuracy and provides rich, interpretable, and query-driven forensic insights. FakeScope achieves state-of-the-art performance in both closed-ended and open-ended forensic scenarios.
arXiv Detail & Related papers (2025-03-31T16:12:48Z)
LEGION: Learning to Ground and Explain for Synthetic Image Detection [49.958951540410816]
We introduce SynthScars, a high-quality and diverse dataset consisting of 12,236 fully synthetic images with human-expert annotations. It features 4 distinct image content types, 3 categories of artifacts, and fine-grained annotations covering pixel-level segmentation, detailed textual explanations, and artifact category labels. We propose LEGION, a multimodal large language model (MLLM)-based image forgery analysis framework that integrates artifact detection, segmentation, and explanation.
arXiv Detail & Related papers (2025-03-19T14:37:21Z)
Time Step Generating: A Universal Synthesized Deepfake Image Detector [0.4488895231267077]
We propose a universal synthetic image detector, Time Step Generating (TSG). TSG does not rely on pre-trained models' reconstructing ability, specific datasets, or sampling algorithms. We test the proposed TSG on the large-scale GenImage benchmark and it achieves significant improvements in both accuracy and generalizability.
arXiv Detail & Related papers (2024-11-17T09:39:50Z)
Harnessing the Power of Large Vision Language Models for Synthetic Image Detection [14.448350657613364]
This study investigates the effectiveness of using advanced vision-language models (VLMs) for synthetic image identification. By harnessing the robust understanding capabilities of large VLMs, the aim is to distinguish authentic images from synthetic images produced by diffusion-based models.
arXiv Detail & Related papers (2024-04-03T13:27:54Z)
Bi-LORA: A Vision-Language Approach for Synthetic Image Detection [14.448350657613364]
Deep image synthesis techniques, such as generative adversarial networks (GANs) and diffusion models (DMs), have ushered in an era of generating highly realistic images. This paper takes inspiration from the potent convergence capabilities between vision and language, coupled with the zero-shot nature of vision-language models (VLMs). We introduce an innovative method called Bi-LORA that leverages VLMs, combined with low-rank adaptation (LORA) tuning techniques, to enhance the precision of synthetic image detection for unseen model-generated images.
arXiv Detail & Related papers (2024-04-02T13:54:22Z)
Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability. We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images. Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods. We present a novel forgery-aware adaptive transformer approach, namely FatFormer. Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z)
Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL). Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images. Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images [35.195284384050325]
We present a new framework leveraging off-the-shelf generative models to generate synthetic training images. We address class name ambiguity, lack of diversity in naive prompts, and domain shifts. Our framework consistently enhances recognition model performance with more synthetic data.
arXiv Detail & Related papers (2023-12-04T18:35:27Z)
Image Captions are Natural Prompts for Text-to-Image Models [70.30915140413383]
We analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts. We propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data. Our method significantly improves the performance of models trained on synthetic training data.
arXiv Detail & Related papers (2023-07-17T14:38:11Z)
Improving Synthetically Generated Image Detection in Cross-Concept Settings [20.21594285488186]
We focus on the challenge of generalizing across different concept classes, e.g., when training a detector on human faces. We propose an approach based on the premise that the robustness of the detector can be enhanced by training it on realistic synthetic images.
arXiv Detail & Related papers (2023-04-24T12:45:00Z)
Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language. We pioneer a systematic study on the detection of deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
Deep Image Fingerprint: Towards Low Budget Synthetic Image Detection and Model Lineage Analysis [8.777277201807351]
We develop a new detection method for images that are indistinguishable from real ones. Our method can detect images from a known generative model and enable us to establish relationships between fine-tuned generative models. Our approach achieves comparable performance to state-of-the-art pre-trained detection methods on images generated by Stable Diffusion and Midjourney.
arXiv Detail & Related papers (2023-03-19T20:31:38Z)
Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks. We showcase the powerfulness and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data for recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
Identity-Aware CycleGAN for Face Photo-Sketch Synthesis and Recognition [61.87842307164351]
We first propose an Identity-Aware CycleGAN (IACycleGAN) model that applies a new perceptual loss to supervise the image generation network. It improves CycleGAN on photo-sketch synthesis by paying more attention to the synthesis of key facial regions, such as eyes and nose. We develop a mutual optimization procedure between the synthesis model and the recognition model, which iteratively synthesizes better images by IACycleGAN.
arXiv Detail & Related papers (2021-03-30T01:30:08Z)
You Only Need Adversarial Supervision for Semantic Image Synthesis [84.83711654797342]
We propose a novel, simplified GAN model, which needs only adversarial supervision to achieve high quality results. We show that images synthesized by our model are more diverse and follow the color and texture of real images more closely.
arXiv Detail & Related papers (2020-12-08T23:00:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.