Towards Generalizable AI-Generated Image Detection via Image-Adaptive Prompt Learning
- URL: http://arxiv.org/abs/2508.01603v2
- Date: Mon, 29 Sep 2025 09:12:33 GMT
- Title: Towards Generalizable AI-Generated Image Detection via Image-Adaptive Prompt Learning
- Authors: Yiheng Li, Zichang Tan, Zhen Lei, Xu Zhou, Yang Yang
- Abstract summary: Image-Adaptive Prompt Learning (IAPL) is a novel paradigm that adjusts the prompts according to each input image, rather than fixing them after training. IAPL achieves state-of-the-art performance, with mean accuracies of 95.61% and 96.7% on the widely used UniversalFakeDetect and GenImage datasets.
- Score: 33.269644831847636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In AI-generated image detection, current cutting-edge methods typically adapt pre-trained foundation models through partial-parameter fine-tuning. However, these approaches often struggle to generalize to forgeries from unseen generators, as the fine-tuned models capture only limited patterns from training data and fail to reflect the evolving traits of new ones. To overcome this limitation, we propose Image-Adaptive Prompt Learning (IAPL), a novel paradigm that dynamically adjusts the prompts fed into the encoder according to each input image, rather than fixing them after training. This design significantly enhances robustness and adaptability to diverse forged images. The dynamic prompts integrate conditional information with test-time adaptive tokens through a lightweight gated mechanism. The conditional information is produced by a Conditional Information Learner, which leverages CNN-based feature extractors to model both forgery-specific and image-specific conditions. The test-time adaptive tokens are optimized during inference on a single sample by enforcing prediction consistency across multiple views, ensuring that the parameters align with the current image. For the final decision, the cropped view with the highest prediction confidence is selected. Extensive experiments show that IAPL achieves state-of-the-art performance, with mean accuracies of 95.61% and 96.7% on the widely used UniversalFakeDetect and GenImage datasets, respectively. Codes and weights will be released on https://github.com/liyih/IAPL.
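The test-time loop described in the abstract (optimizing adaptive tokens on a single sample by enforcing prediction consistency across multiple cropped views, then keeping the most confident view) can be sketched as follows. This is an illustrative toy, not the released IAPL code: the encoder, the fusion step, and the consistency loss are stand-ins, and all names are assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

class ToyEncoder(torch.nn.Module):
    """Stand-in for a prompt-conditioned encoder (IAPL uses a CLIP-style backbone)."""
    def __init__(self, dim=16):
        super().__init__()
        self.proj = torch.nn.Linear(dim, 2)

    def forward(self, feats, prompt):
        # Fuse per-view features with the shared adaptive prompt tokens.
        return self.proj(feats + prompt)

def adapt_and_predict(encoder, views, steps=5, lr=0.1):
    """Optimize adaptive tokens on one sample, then keep the most confident view."""
    prompt = torch.zeros(views.shape[-1], requires_grad=True)
    opt = torch.optim.SGD([prompt], lr=lr)
    for _ in range(steps):
        probs = F.softmax(encoder(views, prompt), dim=-1)
        # Consistency loss: pull each view's prediction toward the mean prediction.
        loss = F.mse_loss(probs, probs.mean(dim=0, keepdim=True).expand_as(probs))
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        probs = F.softmax(encoder(views, prompt), dim=-1)
        conf, pred = probs.max(dim=-1)  # per-view confidence and predicted class
        best = conf.argmax()            # final decision: the most confident crop
    return int(pred[best]), float(conf[best])

encoder = ToyEncoder()
views = torch.randn(4, 16)  # four cropped views as pre-extracted features
label, confidence = adapt_and_predict(encoder, views)
```

In the actual method, the prompt is additionally gated with conditional information from CNN-based forgery-specific and image-specific extractors; that part is omitted here.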
Related papers
- Your AI-Generated Image Detector Can Secretly Achieve SOTA Accuracy, If Calibrated [40.02006384527024]
Despite being trained on balanced datasets, existing AI-generated image detectors often exhibit systematic bias at test time. We propose a theoretically grounded post-hoc calibration framework based on Bayesian decision theory. Our approach significantly improves robustness without retraining, offering a lightweight and principled solution for reliable and adaptive AI-generated image detection.
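The calibration idea summarized above can be illustrated with a minimal sketch: under Bayesian decision theory, correcting for a shifted test-time class prior amounts to moving the decision threshold by the log prior ratio. The function name, priors, and threshold rule here are illustrative assumptions, not the paper's actual framework.

```python
import math

def calibrated_decision(p_fake, prior_fake=0.2, prior_real=0.8):
    """Return True (fake) iff the prior-adjusted log-odds favor the fake class."""
    logit = math.log(p_fake / (1.0 - p_fake))  # detector's likelihood-ratio logit
    # Bayes-optimal rule under 0/1 loss with shifted priors: the naive 0.5
    # threshold moves by the log-ratio of the assumed test-time priors.
    return logit > math.log(prior_real / prior_fake)
```

With a rare-fake prior of 20%, a detector score of 0.7 is no longer enough to call an image fake, while 0.9 still is; with equal priors the rule reduces to the usual 0.5 threshold.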
arXiv Detail & Related papers (2026-02-02T11:26:37Z) - Detecting AI-Generated Images via Distributional Deviations from Real Images [6.615773227400183]
We propose a Masking-based Pre-trained model Fine-Tuning (MPFT) strategy, which introduces a Texture-Aware Masking (TAM) mechanism to mask textured areas containing generative model-specific patterns during fine-tuning. Our method, fine-tuned with only a minimal number of images, significantly outperforms existing approaches, achieving up to 98.2% and 94.6% average accuracy on the two datasets, respectively.
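A rough illustration of the texture-aware masking idea: rank image patches by local variance as a crude texture proxy and zero out the most textured ones. Patch size, masking ratio, and the variance criterion are assumptions for illustration, not the paper's exact TAM design.

```python
import numpy as np

def texture_aware_mask(img, patch=8, mask_ratio=0.5):
    """Zero out the highest-variance (most textured) patches of a 2-D image."""
    h, w = img.shape
    gh, gw = h // patch, w // patch
    blocks = img[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    var = blocks.var(axis=(1, 3)).ravel()   # per-patch variance (texture proxy)
    k = int(mask_ratio * gh * gw)
    out = img.copy()
    for idx in np.argsort(var)[::-1][:k]:   # indices of the k most textured patches
        i, j = divmod(int(idx), gw)
        out[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = 0.0
    return out
```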
arXiv Detail & Related papers (2026-01-07T05:00:13Z) - Edge-Enhanced Vision Transformer Framework for Accurate AI-Generated Image Detection [0.0]
We propose a hybrid detection framework that combines a fine-tuned Vision Transformer (ViT) with a novel edge-based image processing module. The proposed method is highly suitable for real-world applications in automated content verification and digital forensics.
arXiv Detail & Related papers (2025-08-25T10:30:56Z) - RAVID: Retrieval-Augmented Visual Detection: A Knowledge-Driven Approach for AI-Generated Image Identification [14.448350657613368]
RAVID is the first framework for AI-generated image detection that leverages visual retrieval-augmented generation (RAG). Our approach utilizes a fine-tuned CLIP image encoder, RAVID CLIP, enhanced with category-related prompts to improve representation learning. RAVID achieves an average accuracy of 80.27% under degradation conditions, compared to 63.44% for the state-of-the-art model C2P-CLIP.
arXiv Detail & Related papers (2025-08-05T23:10:56Z) - Bi-Level Optimization for Self-Supervised AI-Generated Face Detection [56.57881725223548]
We introduce a self-supervised method for AI-generated face detection based on bi-level optimization. Our detectors significantly outperform existing approaches in both one-class and binary classification settings.
arXiv Detail & Related papers (2025-07-30T16:38:29Z) - Multimodal Conditional Information Bottleneck for Generalizable AI-Generated Image Detection [24.512663807403186]
InfoFD is a text-guided AI-generated image detection framework. We introduce two key components: the Text-Guided Conditional Information Bottleneck (TGCIB) and Dynamic Text Orthogonalization (DTO). Our model achieves exceptional generalization performance on the GenImage dataset and the latest generative models.
arXiv Detail & Related papers (2025-05-21T07:46:26Z) - FakeReasoning: Towards Generalizable Forgery Detection and Reasoning [24.8865218866598]
We propose modeling AI-generated image detection and explanation as a Forgery Detection and Reasoning task (FDR-Task). We introduce the Multi-Modal Forgery Reasoning dataset (MMFR-Dataset), a large-scale dataset containing 100K images across 10 generative models. We also propose FakeReasoning, a forgery detection and reasoning framework with two key components.
arXiv Detail & Related papers (2025-03-27T06:54:06Z) - Origin Identification for Text-Guided Image-to-Image Diffusion Models [39.234894330025114]
We propose origin IDentification for text-guided Image-to-image Diffusion models (ID$^2$). A straightforward solution to ID$^2$ involves training a specialized deep embedding model to extract and compare features from both query and reference images. To solve this challenge of the proposed ID$^2$ task, we contribute the first dataset and a theoretically guaranteed method.
arXiv Detail & Related papers (2025-01-04T20:34:53Z) - Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation [67.37146712877794]
IT3A is a novel test-time adaptation method that utilizes a pre-trained generative model for multi-modal augmentation of each test sample from unknown new domains. By combining augmented data from pre-trained vision and language models, we enhance the ability of the model to adapt to unknown new test data. In a zero-shot setting, IT3A outperforms state-of-the-art test-time prompt tuning methods with a 5.50% increase in accuracy.
arXiv Detail & Related papers (2024-12-12T20:01:24Z) - Time Step Generating: A Universal Synthesized Deepfake Image Detector [0.4488895231267077]
We propose Time Step Generating (TSG), a universal synthetic image detector.
TSG does not rely on pre-trained models' reconstructing ability, specific datasets, or sampling algorithms.
We test the proposed TSG on the large-scale GenImage benchmark and it achieves significant improvements in both accuracy and generalizability.
arXiv Detail & Related papers (2024-11-17T09:39:50Z) - Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning [49.275450836604726]
We present a novel frequency-based Self-Supervised Learning (SSL) approach that significantly enhances its efficacy for pre-training. We employ a two-branch framework empowered by knowledge distillation, enabling the model to take both the filtered and original images as input.
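The filtered branch mentioned above can be sketched with a simple FFT low-pass filter that produces the filtered companion of an input image; the circular mask and the cutoff value are illustrative assumptions, not the paper's actual filtering design.

```python
import numpy as np

def low_pass(img, cutoff=0.25):
    """Keep only spatial frequencies within a radial fraction of the band."""
    f = np.fft.fftshift(np.fft.fft2(img))    # centered 2-D spectrum
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    mask = radius <= cutoff * min(h, w) / 2  # circular low-pass mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
```

A constant image passes through unchanged (only the DC component is kept), while a pixel-level checkerboard, which lives entirely at the highest frequency, is filtered out.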
arXiv Detail & Related papers (2024-09-16T15:10:07Z) - RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection [60.960988614701414]
RIGID is a training-free and model-agnostic method for robust AI-generated image detection.
RIGID significantly outperforms existing training-based and training-free detectors.
arXiv Detail & Related papers (2024-05-30T14:49:54Z) - Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach, tuned on 4-class ProGAN data, attains an average accuracy of 98% on unseen GANs and, surprisingly, generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z) - Reinforcement Learning from Diffusion Feedback: Q* for Image Search [2.5835347022640254]
We present two models for image generation using model-agnostic learning.
RLDF is a singular approach for visual imitation through prior-preserving reward function guidance.
It generates high-quality images over varied domains showcasing class-consistency and strong visual diversity.
arXiv Detail & Related papers (2023-11-27T09:20:12Z) - Randomize to Generalize: Domain Randomization for Runway FOD Detection [1.4249472316161877]
Tiny Object Detection is challenging due to small size, low resolution, occlusion, background clutter, lighting conditions and small object-to-image ratio.
We propose a novel two-stage methodology, Synthetic Image Augmentation (SRIA), to enhance the generalization capabilities of models encountering 2D datasets.
We report that detection accuracy improved from an initial 41% to 92% on the OOD test set.
arXiv Detail & Related papers (2023-09-23T05:02:31Z) - ALIP: Adaptive Language-Image Pre-training with Synthetic Caption [78.93535202851278]
Contrastive Language-Image Pre-training (CLIP) has significantly boosted the performance of various vision-language tasks.
The presence of intrinsic noise and unmatched image-text pairs in web data can potentially affect the performance of representation learning.
We propose Adaptive Language-Image Pre-training (ALIP), a bi-path model that integrates supervision from both raw text and synthetic captions.
arXiv Detail & Related papers (2023-08-16T15:19:52Z) - ClipCrop: Conditioned Cropping Driven by Vision-Language Model [90.95403416150724]
We take advantage of vision-language models as a foundation for creating robust and user-intentional cropping algorithms.
We develop a method to perform cropping with a text or image query that reflects the user's intention as guidance.
Our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small dataset.
arXiv Detail & Related papers (2022-11-21T14:27:07Z) - Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network [92.01145655155374]
We present an unsupervised image enhancement generative adversarial network (UEGAN).
It learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner.
Results show that the proposed model effectively improves the aesthetic quality of images.
arXiv Detail & Related papers (2020-12-30T03:22:46Z) - Learning to Learn Parameterized Classification Networks for Scalable Input Images [76.44375136492827]
Convolutional Neural Networks (CNNs) do not have a predictable recognition behavior with respect to the input resolution change.
We employ meta learners to generate convolutional weights of main networks for various input scales.
We further utilize knowledge distillation on the fly over model predictions based on different input resolutions.
arXiv Detail & Related papers (2020-07-13T04:27:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.