Supervised Contrastive Learning for Few-Shot AI-Generated Image Detection and Attribution
- URL: http://arxiv.org/abs/2511.16541v2
- Date: Fri, 21 Nov 2025 08:44:06 GMT
- Title: Supervised Contrastive Learning for Few-Shot AI-Generated Image Detection and Attribution
- Authors: Jaime Álvarez Urueña, David Camacho, Javier Huertas Tato
- Abstract summary: This work proposes a novel two-stage detection framework to address the generalization challenge inherent in synthetic image detection. The proposed framework achieves an average detection accuracy of 91.3%, representing a 5.2 percentage point improvement over existing approaches.
- Score: 3.103291412074661
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of generative artificial intelligence has enabled the creation of synthetic images that are increasingly indistinguishable from authentic content, posing significant challenges for digital media integrity. This problem is compounded by the accelerated release cycle of novel generative models, which renders traditional detection approaches (reliant on periodic retraining) computationally infeasible and operationally impractical. This work proposes a novel two-stage detection framework designed to address the generalization challenge inherent in synthetic image detection. The first stage employs a vision deep learning model trained via supervised contrastive learning to extract discriminative embeddings from input imagery. Critically, this model was trained on a strategically partitioned subset of available generators, with specific architectures withheld from training to rigorously evaluate cross-generator generalization capabilities. The second stage utilizes a k-nearest neighbors (k-NN) classifier operating on the learned embedding space, trained in a few-shot learning paradigm incorporating limited samples from previously unseen test generators. With merely 150 images per class in the few-shot learning regime, which are easily obtainable from current generation models, the proposed framework achieves an average detection accuracy of 91.3%, representing a 5.2 percentage point improvement over existing approaches. For the source attribution task, the proposed approach obtains improvements of 14.70% and 4.27% in AUC and OSCR respectively in an open-set classification context, marking a significant advancement toward robust, scalable forensic attribution systems capable of adapting to the evolving generative AI landscape without requiring exhaustive retraining protocols.
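The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the loss below is the commonly used form of the supervised contrastive loss, and the temperature (0.1), the neighbor count (k=5), the use of cosine similarity, and the function names `supcon_loss` and `knn_predict` are all assumptions made for illustration. Synthetic Gaussian vectors stand in for the embeddings that a trained vision encoder would produce; only the figure of 150 support images per class comes from the abstract.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (stage 1 training objective).

    Each anchor is pulled toward same-label samples and pushed away from the
    rest. Anchors without any positive in the batch are skipped, so the batch
    must contain at least one same-label pair.
    """
    z = l2_normalize(np.asarray(embeddings, dtype=np.float64))
    labels = np.asarray(labels)
    n = z.shape[0]
    sim = z @ z.T / temperature
    not_self = ~np.eye(n, dtype=bool)
    # Numerically stable log-sum-exp over every sample except the anchor itself.
    masked = np.where(not_self, sim, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0
    pos_sum = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_sum[has_pos] / n_pos[has_pos]).mean())


def knn_predict(support_emb, support_labels, query_emb, k=5):
    """Majority vote among the k most cosine-similar support embeddings (stage 2)."""
    s = l2_normalize(np.asarray(support_emb, dtype=np.float64))
    q = l2_normalize(np.asarray(query_emb, dtype=np.float64))
    support_labels = np.asarray(support_labels)
    nearest = np.argsort(-(q @ s.T), axis=1)[:, :k]
    preds = []
    for idx in nearest:
        values, counts = np.unique(support_labels[idx], return_counts=True)
        preds.append(values[np.argmax(counts)])
    return np.array(preds)


# Toy few-shot run: 150 synthetic "embeddings" per class (0 = real, 1 = generated).
rng = np.random.default_rng(0)
dim, shots = 8, 150
center = np.zeros(dim)
center[0] = 1.0
support = np.vstack([
    center + 0.3 * rng.normal(size=(shots, dim)),
    -center + 0.3 * rng.normal(size=(shots, dim)),
])
support_labels = np.array([0] * shots + [1] * shots)
queries = np.vstack([
    center + 0.3 * rng.normal(size=(50, dim)),
    -center + 0.3 * rng.normal(size=(50, dim)),
])
query_labels = np.array([0] * 50 + [1] * 50)

accuracy = (knn_predict(support, support_labels, queries) == query_labels).mean()
print(f"toy k-NN accuracy: {accuracy:.3f}")
print(f"toy SupCon loss on the support set: {supcon_loss(support, support_labels):.3f}")
```

In this arrangement, adapting to a new generator only requires adding its embeddings to the support set; the encoder itself is left untouched, which is the property the abstract highlights as avoiding exhaustive retraining.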
Related papers
- Image Tokenizer Needs Post-Training [76.91832192778732]
We propose a novel tokenizer training scheme, focusing on improving latent space construction and decoding respectively. Specifically, we propose a plug-and-play tokenizer training scheme, which significantly enhances the robustness of tokenizer. We further optimize the tokenizer decoder regarding a well-trained generative model to mitigate the distribution difference between generated and reconstructed tokens.
arXiv Detail & Related papers (2025-09-15T21:38:03Z) - Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis [57.7367843129838]
Recent image generation schemes typically capture image distribution in a pre-constructed latent space relying on a frozen image tokenizer. We propose a novel plug-and-play tokenizer training scheme to facilitate latent space construction.
arXiv Detail & Related papers (2025-03-11T12:09:11Z) - INDIGO+: A Unified INN-Guided Probabilistic Diffusion Algorithm for Blind and Non-Blind Image Restoration [22.19661915697775]
We propose a novel INN-guided probabilistic diffusion algorithm for non-blind and blind image restoration. INDIGO and BlindINDIGO combine the merits of the perfect reconstruction property of invertible neural networks (INN) with the strong generative capabilities of pre-trained diffusion models.
arXiv Detail & Related papers (2025-01-23T18:51:52Z) - Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models. In this paper, we investigate how detection performance varies across model backbones, types, and datasets. We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z) - Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has been conventionally regarded as a challenging property to encode for neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection [60.960988614701414]
RIGID is a training-free and model-agnostic method for robust AI-generated image detection.
RIGID significantly outperforms existing training-based and training-free detectors.
arXiv Detail & Related papers (2024-05-30T14:49:54Z) - Mixture of Low-rank Experts for Transferable AI-Generated Image Detection [18.631006488565664]
Generative models have shown a giant leap in photo-realistic images with minimal expertise, sparking concerns about the authenticity of online information.
This study aims to develop a universal AI-generated image detector capable of identifying images from diverse sources.
Inspired by the zero-shot transferability of pre-trained vision-language models, we seek to harness the non-trivial visual-world knowledge and descriptive proficiency of CLIP-ViT to generalize over unknown domains.
arXiv Detail & Related papers (2024-04-07T09:01:50Z) - Bi-LORA: A Vision-Language Approach for Synthetic Image Detection [14.448350657613364]
Deep image synthesis techniques, such as generative adversarial networks (GANs) and diffusion models (DMs) have ushered in an era of generating highly realistic images.
This paper takes inspiration from the potent convergence capabilities between vision and language, coupled with the zero-shot nature of vision-language models (VLMs).
We introduce an innovative method called Bi-LORA that leverages VLMs, combined with low-rank adaptation (LORA) tuning techniques, to enhance the precision of synthetic image detection for unseen model-generated images.
arXiv Detail & Related papers (2024-04-02T13:54:22Z) - Active Generation for Image Classification [45.93535669217115]
We propose to address the efficiency of image generation by focusing on the specific needs and characteristics of the model.
With a central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation.
arXiv Detail & Related papers (2024-03-11T08:45:31Z) - Adversarial Masking Contrastive Learning for vein recognition [10.886119051977785]
Vein recognition has received increasing attention due to its high security and privacy.
Deep neural networks such as Convolutional neural networks (CNN) and Transformers have been introduced for vein recognition.
Despite the recent advances, existing solutions for finger-vein feature extraction are still not optimal due to scarce training image samples.
arXiv Detail & Related papers (2024-01-16T03:09:45Z) - Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z) - MENTOR: Human Perception-Guided Pretraining for Increased Generalization [4.737519767218666]
Leveraging human perception into training of convolutional neural networks (CNN) has boosted generalization capabilities of such models in open-set recognition tasks. We introduce MENTOR, which addresses this question through two unique rounds of training CNNs tasked with open-set anomaly detection. We show that MENTOR successfully raises the generalization performance across three different CNN backbones in a variety of anomaly detection tasks.
arXiv Detail & Related papers (2023-10-30T13:50:44Z)