Towards Generalizable AI-Generated Image Detection via Image-Adaptive Prompt Learning
- URL: http://arxiv.org/abs/2508.01603v1
- Date: Sun, 03 Aug 2025 05:41:24 GMT
- Title: Towards Generalizable AI-Generated Image Detection via Image-Adaptive Prompt Learning
- Authors: Yiheng Li, Zichang Tan, Zhen Lei, Xu Zhou, Yang Yang
- Abstract summary: We propose a novel framework named Image-Adaptive Prompt Learning (IAPL), which enhances flexibility in processing diverse testing images. It consists of two adaptive modules, i.e., the Conditional Information Learner and the Confidence-Driven Adaptive Prediction. Experiments show that IAPL achieves state-of-the-art performance, with 95.61% and 96.7% mean accuracy on two widely used UniversalFakeDetect and GenImage datasets.
- Score: 30.415427474641813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A major struggle for AI-generated image detection is identifying fake images from unseen generators. Existing cutting-edge methods typically customize pre-trained foundation models to this task via partial-parameter fine-tuning. However, these parameters trained on a narrow range of generators may fail to generalize to unknown sources. In light of this, we propose a novel framework named Image-Adaptive Prompt Learning (IAPL), which enhances flexibility in processing diverse testing images. It consists of two adaptive modules, i.e., the Conditional Information Learner and the Confidence-Driven Adaptive Prediction. The former employs CNN-based feature extractors to learn forgery-specific and image-specific conditions, which are then propagated to learnable tokens via a gated mechanism. The latter optimizes the shallowest learnable tokens based on a single test sample and selects the cropped view with the highest prediction confidence for final detection. These two modules enable the prompts fed into the foundation model to be automatically adjusted based on the input image, rather than being fixed after training, thereby enhancing the model's adaptability to various forged images. Extensive experiments show that IAPL achieves state-of-the-art performance, with 95.61% and 96.7% mean accuracy on two widely used UniversalFakeDetect and GenImage datasets, respectively.
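The view-selection step of the Confidence-Driven Adaptive Prediction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each cropped view has already been scored by some detector that returns a single "fake" logit, and it assumes confidence is defined as the larger of the two class probabilities. The function name `select_most_confident_view` and the example logits are hypothetical, and the test-time optimization of the shallowest learnable tokens is omitted entirely.

```python
import math


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def select_most_confident_view(fake_logits):
    """Pick the cropped view whose prediction is most confident.

    fake_logits: one "fake" logit per cropped view (hypothetical detector output).
    Returns (view_index, p_fake, confidence), where confidence is assumed to be
    max(p_fake, 1 - p_fake), i.e. the probability of the predicted class.
    """
    if not fake_logits:
        raise ValueError("at least one cropped view is required")
    best = None
    for i, logit in enumerate(fake_logits):
        p_fake = sigmoid(logit)
        confidence = max(p_fake, 1.0 - p_fake)
        if best is None or confidence > best[2]:
            best = (i, p_fake, confidence)
    return best


# Hypothetical logits for three crops of one test image.
view_logits = [0.3, -2.5, 1.1]
index, p_fake, confidence = select_most_confident_view(view_logits)
label = "fake" if p_fake >= 0.5 else "real"
print(index, label, round(confidence, 3))
```

Here the second crop is selected because its prediction is furthest from the decision boundary, even though the other two crops lean towards "fake". Whether the paper measures confidence this way is an assumption of the sketch.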
Related papers
- RAVID: Retrieval-Augmented Visual Detection: A Knowledge-Driven Approach for AI-Generated Image Identification [14.448350657613368]
RAVID is the first framework for AI-generated image detection that leverages visual retrieval-augmented generation (RAG). Our approach utilizes a fine-tuned CLIP image encoder, RAVID CLIP, enhanced with category-related prompts to improve representation learning. RAVID achieves an average accuracy of 80.27% under degradation conditions, compared to 63.44% for the state-of-the-art model C2P-CLIP.
arXiv Detail & Related papers (2025-08-05T23:10:56Z)
- Bi-Level Optimization for Self-Supervised AI-Generated Face Detection [56.57881725223548]
We introduce a self-supervised method for AI-generated face detectors based on bi-level optimization. Our detectors significantly outperform existing approaches in both one-class and binary classification settings.
arXiv Detail & Related papers (2025-07-30T16:38:29Z)
- FakeReasoning: Towards Generalizable Forgery Detection and Reasoning [24.8865218866598]
We propose modeling AI-generated image detection and explanation as a Forgery Detection and Reasoning task (FDR-Task). We introduce the Multi-Modal Forgery Reasoning dataset (MMFR-Dataset), a large-scale dataset containing 100K images across 10 generative models. We also propose FakeReasoning, a forgery detection and reasoning framework with two key components.
arXiv Detail & Related papers (2025-03-27T06:54:06Z)
- Origin Identification for Text-Guided Image-to-Image Diffusion Models [39.234894330025114]
We propose origin IDentification for text-guided Image-to-image Diffusion models (ID$^2$). A straightforward solution to ID$^2$ involves training a specialized deep embedding model to extract and compare features from both query and reference images. To solve this challenge of the proposed ID$^2$ task, we contribute the first dataset and a theoretically guaranteed method.
arXiv Detail & Related papers (2025-01-04T20:34:53Z)
- Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation [67.37146712877794]
IT3A is a novel test-time adaptation method that utilizes a pre-trained generative model for multi-modal augmentation of each test sample from unknown new domains. By combining augmented data from pre-trained vision and language models, we enhance the ability of the model to adapt to unknown new test data. In a zero-shot setting, IT3A outperforms state-of-the-art test-time prompt tuning methods with a 5.50% increase in accuracy.
arXiv Detail & Related papers (2024-12-12T20:01:24Z)
- Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning [49.275450836604726]
We present a novel frequency-based Self-Supervised Learning (SSL) approach that significantly enhances its efficacy for pre-training. We employ a two-branch framework empowered by knowledge distillation, enabling the model to take both the filtered and original images as input.
arXiv Detail & Related papers (2024-09-16T15:10:07Z)
- Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z)
- Reinforcement Learning from Diffusion Feedback: Q* for Image Search [2.5835347022640254]
We present two models for image generation using model-agnostic learning.
RLDF is a singular approach for visual imitation through prior-preserving reward function guidance.
It generates high-quality images over varied domains showcasing class-consistency and strong visual diversity.
arXiv Detail & Related papers (2023-11-27T09:20:12Z)
- Randomize to Generalize: Domain Randomization for Runway FOD Detection [1.4249472316161877]
Tiny Object Detection is challenging due to small size, low resolution, occlusion, background clutter, lighting conditions and small object-to-image ratio.
We propose a novel two-stage methodology, Synthetic Randomized Image Augmentation (SRIA), to enhance generalization capabilities of models encountering 2D datasets.
We report that detection accuracy improved from an initial 41% to 92% for OOD test set.
arXiv Detail & Related papers (2023-09-23T05:02:31Z)
- Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network [92.01145655155374]
We present an unsupervised image enhancement generative network (UEGAN)
It learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner.
Results show that the proposed model effectively improves the aesthetic quality of images.
arXiv Detail & Related papers (2020-12-30T03:22:46Z)
- Learning to Learn Parameterized Classification Networks for Scalable Input Images [76.44375136492827]
Convolutional Neural Networks (CNNs) do not have a predictable recognition behavior with respect to the input resolution change.
We employ meta learners to generate convolutional weights of main networks for various input scales.
We further utilize knowledge distillation on the fly over model predictions based on different input resolutions.
arXiv Detail & Related papers (2020-07-13T04:27:25Z)