Breaking Latent Prior Bias in Detectors for Generalizable AIGC Image Detection
- URL: http://arxiv.org/abs/2506.00874v1
- Date: Sun, 01 Jun 2025 07:20:45 GMT
- Title: Breaking Latent Prior Bias in Detectors for Generalizable AIGC Image Detection
- Authors: Yue Zhou, Xinan He, KaiQing Lin, Bin Fan, Feng Ding, Bin Li
- Abstract summary: Current AIGC detectors often achieve near-perfect accuracy on images produced by the same generator used for training but struggle to generalize to outputs from unseen generators. We trace this failure in part to latent prior bias: detectors learn shortcuts tied to patterns stemming from the initial noise vector rather than learning robust generative artifacts. We propose On-Manifold Adversarial Training (OMAT), which generates adversarial examples that remain on the generator's output manifold.
- Score: 11.907536189598577
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current AIGC detectors often achieve near-perfect accuracy on images produced by the same generator used for training but struggle to generalize to outputs from unseen generators. We trace this failure in part to latent prior bias: detectors learn shortcuts tied to patterns stemming from the initial noise vector rather than learning robust generative artifacts. To address this, we propose On-Manifold Adversarial Training (OMAT): by optimizing the initial latent noise of diffusion models under fixed conditioning, we generate on-manifold adversarial examples that remain on the generator's output manifold-unlike pixel-space attacks, which introduce off-manifold perturbations that the generator itself cannot reproduce and that can obscure the true discriminative artifacts. To test against state-of-the-art generative models, we introduce GenImage++, a test-only benchmark of outputs from advanced generators (Flux.1, SD3) with extended prompts and diverse styles. We apply our adversarial-training paradigm to ResNet50 and CLIP baselines and evaluate across existing AIGC forensic benchmarks and recent challenge datasets. Extensive experiments show that adversarially trained detectors significantly improve cross-generator performance without any network redesign. Our findings on latent-prior bias offer valuable insights for future dataset construction and detector evaluation, guiding the development of more robust and generalizable AIGC forensic methodologies.
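The core of OMAT — optimizing the initial latent noise, under fixed conditioning, so the generated image fools the detector — can be illustrated with a minimal numpy sketch. The "generator" and "detector" below are toy stand-ins (the paper uses diffusion models and trained ResNet50/CLIP detectors), and the gradient is estimated by finite differences rather than backpropagation; the key property it demonstrates is that only the latent moves, so every adversarial example remains on the generator's output manifold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical): a frozen "generator" mapping latent -> image,
# and a frozen "detector" scoring P(fake). Real OMAT uses a diffusion model
# under fixed conditioning and a trained CNN/CLIP detector.
W_gen = rng.normal(size=(16, 8))   # generator weights (frozen)
w_det = rng.normal(size=16)        # detector weights (frozen)

def generate(z):
    """Image = G(z); conditioning is fixed, only the initial latent varies."""
    return np.tanh(W_gen @ z)

def detector_fake_score(x):
    """Sigmoid over a linear logit -> P(fake)."""
    return 1.0 / (1.0 + np.exp(-(w_det @ x)))

def on_manifold_attack(z, steps=150, lr=0.5, eps=1e-4):
    """Descend the detector's fake score w.r.t. the *latent* z. Because we
    never touch pixels, G(z_adv) stays on the generator's output manifold."""
    z = z.copy()
    best_z, best = z.copy(), detector_fake_score(generate(z))
    for _ in range(steps):
        base = detector_fake_score(generate(z))
        grad = np.zeros_like(z)
        for i in range(len(z)):     # finite-difference gradient estimate
            zp = z.copy()
            zp[i] += eps
            grad[i] = (detector_fake_score(generate(zp)) - base) / eps
        z -= lr * grad              # make the fake look "real" to the detector
        score = detector_fake_score(generate(z))
        if score < best:            # keep the best latent seen so far
            best, best_z = score, z.copy()
    return best_z

z0 = rng.normal(size=8)
z_adv = on_manifold_attack(z0)
print(detector_fake_score(generate(z0)), detector_fake_score(generate(z_adv)))
```

In the paper's setting the same idea runs through the diffusion sampler with autograd instead of finite differences; the detector is then retrained on these on-manifold adversarial images.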
Related papers
- Bi-Level Optimization for Self-Supervised AI-Generated Face Detection [56.57881725223548]
We introduce a self-supervised method for AI-generated face detectors based on bi-level optimization. Our detectors significantly outperform existing approaches in both one-class and binary classification settings.
arXiv Detail & Related papers (2025-07-30T16:38:29Z) - LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection [11.700935740718675]
LATTE - Latent Trajectory Embedding - is a novel approach that models the evolution of latent embeddings across several denoising timesteps. By modeling the trajectory of such embeddings rather than single-step errors, LATTE captures subtle, discriminative patterns that distinguish real from generated images.
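The trajectory-vs-single-step distinction can be sketched with toy data: each "image" below yields a sequence of latent embeddings across timesteps (an assumption — LATTE extracts these from a real diffusion model), fake trajectories drift between steps, and a nearest-centroid classifier over trajectory features separates the two classes.

```python
import numpy as np

rng = np.random.default_rng(1)
T, D = 6, 4  # denoising timesteps, embedding dim (toy sizes)

def trajectory(is_fake):
    """Hypothetical stand-in for the latent embedding of one image at each of
    T denoising timesteps; here fake trajectories simply drift more per step."""
    drift = 1.0 if is_fake else 0.0
    return rng.normal(size=(T, D)) + drift * np.arange(T)[:, None]

def trajectory_features(traj):
    # model the trajectory (step-to-step deltas), not a single-step error
    deltas = np.diff(traj, axis=0)
    return np.concatenate([traj.mean(axis=0), deltas.mean(axis=0)])

def fit_centroids(n=100):
    """Mean feature vector per class from n toy samples each."""
    real = np.stack([trajectory_features(trajectory(False)) for _ in range(n)])
    fake = np.stack([trajectory_features(trajectory(True)) for _ in range(n)])
    return real.mean(axis=0), fake.mean(axis=0)

def classify(traj, c_real, c_fake):
    f = trajectory_features(traj)
    return np.linalg.norm(f - c_fake) < np.linalg.norm(f - c_real)  # True=fake

c_real, c_fake = fit_centroids()
tests = [(trajectory(lbl), lbl) for lbl in (False, True) for _ in range(50)]
acc = np.mean([classify(t, c_real, c_fake) == lbl for t, lbl in tests])
print(f"toy detection accuracy: {acc:.2f}")
```

The per-step deltas are what a single-timestep feature would miss; LATTE's actual embedding network is far richer, but the feature design follows the same intuition.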
arXiv Detail & Related papers (2025-07-03T12:53:47Z) - Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing, built on generative models, pose serious societal risks. In this paper, we investigate how detection performance varies across model backbones, types, and datasets. We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z) - FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion [18.829659846356765]
We propose a new synthetic image detector that uses features obtained by inverting an open-source pre-trained Stable Diffusion model.
We show that these inversion features enable our detector to generalize well to unseen generators of high visual fidelity.
We introduce a new challenging evaluation protocol that uses reverse image search to mitigate stylistic and thematic biases in the detector evaluation.
arXiv Detail & Related papers (2024-06-12T19:14:58Z) - D$^3$: Scaling Up Deepfake Detection by Learning from Discrepancy [29.919663502808575]
Existing literature emphasizes the generalization capability of deepfake detection on unseen generators. This work seeks a step toward a universal deepfake detection system with better generalization and robustness.
arXiv Detail & Related papers (2024-04-06T10:45:02Z) - GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning [50.7702397913573]
The rapid advancement of photorealistic generators has reached a critical juncture where authentic and manipulated images are increasingly indistinguishable.
Although there have been a number of publicly available face forgery datasets, the forgery faces are mostly generated using GAN-based synthesis technology.
We propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection.
arXiv Detail & Related papers (2024-02-03T03:13:50Z) - Securing Deep Generative Models with Universal Adversarial Signature [69.51685424016055]
Deep generative models pose threats to society due to their potential misuse.
In this paper, we propose to inject a universal adversarial signature into an arbitrary pre-trained generative model.
The proposed method is validated on the FFHQ and ImageNet datasets with various state-of-the-art generative models.
arXiv Detail & Related papers (2023-05-25T17:59:01Z) - FrePGAN: Robust Deepfake Detection Using Frequency-level Perturbations [12.027711542565315]
We design a framework to generalize the deepfake detector for both the known and unseen GAN models.
Our framework generates the frequency-level perturbation maps to make the generated images indistinguishable from the real images.
For experiments, we design new test scenarios varying from the training settings in GAN models, color manipulations, and object categories.
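A frequency-level perturbation of the kind FrePGAN produces can be sketched directly with an FFT. The sketch below is an assumption-laden simplification: FrePGAN *learns* its perturbation maps with a GAN, whereas here we simply inject noise into the high-frequency band while leaving low frequencies untouched, which is the property that makes the perturbed fakes harder to tell from real images.

```python
import numpy as np

rng = np.random.default_rng(2)

def frequency_perturbation(img, cutoff=0.25, strength=0.05):
    """Add noise only to spatial frequencies beyond `cutoff` (normalized
    radius). A toy stand-in for a learned frequency-level perturbation map."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2) / (min(h, w) / 2)
    mask = r > cutoff                  # high-frequency band only
    noise = rng.normal(size=F.shape) + 1j * rng.normal(size=F.shape)
    F = F + mask * (strength * F.std()) * noise
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

def lowpass(x, cutoff=0.25):
    """Keep only the frequencies the perturbation never touches."""
    F = np.fft.fftshift(np.fft.fft2(x))
    h, w = x.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2) / (min(h, w) / 2)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * (r <= cutoff))))

img = np.add.outer(np.linspace(0, 1, 32), np.linspace(0, 1, 32))  # smooth image
pert = frequency_perturbation(img)
```

Because the mask is disjoint from the low-frequency band, the perturbed image's coarse content is preserved exactly while its high-frequency statistics — where many GAN fingerprints live — are altered.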
arXiv Detail & Related papers (2022-02-07T16:45:11Z) - Unsupervised Controllable Generation with Self-Training [90.04287577605723]
Controllable generation with GANs remains a challenging research problem.
We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z) - Old is Gold: Redefining the Adversarially Learned One-Class Classifier Training Paradigm [15.898383112569237]
A popular method for anomaly detection is to use the generator of an adversarial network to formulate anomaly scores.
We propose a framework that effectively generates stable results across a wide range of training steps.
Our model achieves a frame-level AUC of 98.1%, surpassing recent state-of-the-art methods.
arXiv Detail & Related papers (2020-04-16T13:48:58Z) - Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction.
We put forward an alternative measure of anomaly score to replace the reconstruction-based metric.
Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.