DeeCLIP: A Robust and Generalizable Transformer-Based Framework for Detecting AI-Generated Images
- URL: http://arxiv.org/abs/2504.19876v1
- Date: Mon, 28 Apr 2025 15:06:28 GMT
- Title: DeeCLIP: A Robust and Generalizable Transformer-Based Framework for Detecting AI-Generated Images
- Authors: Mamadou Keita, Wassim Hamidouche, Hessen Bougueffa Eutamene, Abdelmalik Taleb-Ahmed, Abdenour Hadid
- Abstract summary: DeeCLIP is a novel framework for detecting AI-generated images. It incorporates DeeFuser, a fusion module that combines high-level and low-level features. Trained exclusively on 4-class ProGAN data, DeeCLIP achieves an average accuracy of 89.00%.
- Score: 14.448350657613368
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces DeeCLIP, a novel framework for detecting AI-generated images using CLIP-ViT and fusion learning. Despite significant advancements in generative models capable of creating highly photorealistic images, existing detection methods often struggle to generalize across different models and are highly sensitive to minor perturbations. To address these challenges, DeeCLIP incorporates DeeFuser, a fusion module that combines high-level and low-level features, improving robustness against degradations such as compression and blurring. Additionally, we apply triplet loss to refine the embedding space, enhancing the model's ability to distinguish between real and synthetic content. To further enable lightweight adaptation while preserving pre-trained knowledge, we adopt parameter-efficient fine-tuning using low-rank adaptation (LoRA) within the CLIP-ViT backbone. This approach supports effective zero-shot learning without sacrificing generalization. Trained exclusively on 4-class ProGAN data, DeeCLIP achieves an average accuracy of 89.00% on 19 test subsets composed of generative adversarial network (GAN) and diffusion models. Despite having fewer trainable parameters, DeeCLIP outperforms existing methods, demonstrating superior robustness against various generative models and real-world distortions. The code is publicly available at https://github.com/Mamadou-Keita/DeeCLIP for research purposes.
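The abstract names two trainable ingredients: LoRA adapters inside the frozen CLIP-ViT backbone and a triplet loss that shapes the embedding space. A minimal PyTorch sketch of both is given below; the rank, scaling, margin, and choice of wrapped layers are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch of the two trainable ingredients named in the abstract:
# LoRA adapters on a frozen backbone and a triplet loss over embeddings.
# Rank, scaling, margin, and the wrapped layers are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (B A) x."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base.requires_grad_(False)  # pre-trained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        update = F.linear(F.linear(x, self.lora_A), self.lora_B)
        return self.base(x) + self.scale * update

# Triplet loss pulls same-class embeddings together and real/fake apart.
triplet = nn.TripletMarginLoss(margin=0.5)  # margin is an assumed value

def training_step(encoder, anchor, positive, negative):
    # anchor/positive share a label (e.g. both real); negative is the other class
    a = F.normalize(encoder(anchor), dim=-1)
    p = F.normalize(encoder(positive), dim=-1)
    n = F.normalize(encoder(negative), dim=-1)
    return triplet(a, p, n)
```

In a setup like this, only the small A/B matrices (and any task head) receive gradients, which is what keeps the trainable-parameter count low while the pre-trained CLIP knowledge stays intact.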
Related papers
- Federated Learning of Low-Rank One-Shot Image Detection Models in Edge Devices with Scalable Accuracy and Compute Complexity [5.820612543019548] (arXiv: 2025-04-23T08:40:44Z)
LoRa-FL is designed for training low-rank one-shot image detection models deployed on edge devices.
By incorporating low-rank adaptation techniques into one-shot detection architectures, our method significantly reduces both computational and communication overhead.
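The communication saving claimed above follows from exchanging only the adapter weights rather than the whole model. A hedged sketch of that arithmetic and of a FedAvg-style aggregation over adapters (the protocol details are assumptions, not LoRa-FL's design):

```python
# Hedged sketch of the communication saving: clients exchange only the small
# low-rank matrices, and the server averages them. The aggregation scheme and
# sizes are assumptions, not LoRa-FL's protocol.
import torch

def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    return rank * (in_features + out_features)  # A: r x in, B: out x r

full = 768 * 768                                # one dense ViT projection
lora = lora_param_count(768, 768, rank=4)
print(f"full: {full:,} vs LoRA: {lora:,} ({full // lora}x fewer values per round)")

def fedavg_adapters(clients: list[dict[str, torch.Tensor]]) -> dict[str, torch.Tensor]:
    """Average only the adapter tensors; the frozen backbone never moves."""
    return {k: torch.stack([c[k] for c in clients]).mean(0) for k in clients[0]}
```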
- CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI [58.35348718345307] (arXiv: 2025-03-24T01:59:29Z)
Current efforts to distinguish between real and AI-generated images may lack generalization. We propose a novel framework, Co-Spy, that first enhances existing semantic features. We also create Co-Spy-Bench, a comprehensive dataset comprising 5 real image datasets and 22 state-of-the-art generative models.
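As a rough illustration of combining semantic and pixel-level features, a generic two-branch detector might look as follows; both branches are stand-ins rather than Co-Spy's actual components.

```python
# Hedged illustration of fusing a semantic embedding with low-level pixel
# cues for synthetic-image detection. Both branches are generic stand-ins,
# not Co-Spy's actual components.
import torch
import torch.nn as nn
import torch.nn.functional as F

HIGH_PASS = torch.tensor([[[[-1., -1., -1.],
                            [-1.,  8., -1.],
                            [-1., -1., -1.]]]]) / 8.0   # crude residual filter

class TwoBranchDetector(nn.Module):
    def __init__(self, semantic_encoder: nn.Module, sem_dim: int):
        super().__init__()
        self.semantic = semantic_encoder                 # e.g. a frozen CLIP image tower
        self.pixel_head = nn.Sequential(nn.AdaptiveAvgPool2d(8), nn.Flatten(),
                                        nn.Linear(3 * 8 * 8, 128), nn.ReLU())
        self.classifier = nn.Linear(sem_dim + 128, 1)    # real-vs-generated logit

    def forward(self, x):                                # x: (B, 3, H, W)
        sem = self.semantic(x)                           # high-level semantics
        res = F.conv2d(x, HIGH_PASS.repeat(3, 1, 1, 1), groups=3)
        pix = self.pixel_head(res)                       # low-level artifacts
        return self.classifier(torch.cat([sem, pix], dim=-1))
```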
- LayerMix: Enhanced Data Augmentation through Fractal Integration for Robust Deep Learning [1.786053901581251] (arXiv: 2025-01-08T22:22:44Z)
Deep learning models often struggle to maintain consistent performance when confronted with Out-of-Distribution (OOD) samples. We introduce LayerMix, an innovative data augmentation approach that systematically enhances model robustness. Our method generates semantically consistent synthetic samples that significantly improve neural network generalization capabilities.
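A toy version of fractal-based mixing is sketched below; the blend rule and the synthetic "fractal" source are placeholders, as LayerMix's actual pipeline is more structured.

```python
# Hedged toy of mixing structured "fractal" texture into training images to
# harden a model against OOD shifts. The blend rule and texture source are
# placeholders; LayerMix's actual pipeline is more structured.
import numpy as np

def toy_fractal(h: int, w: int, seed: int = 0) -> np.ndarray:
    """Cheap multi-scale texture as a stand-in for a real fractal dataset.
    Assumes h and w are divisible by 32."""
    rng = np.random.default_rng(seed)
    img = np.zeros((h, w))
    for scale in (4, 8, 16, 32):                 # sum of coarse-to-fine noise
        noise = rng.random((scale, scale))
        img += np.kron(noise, np.ones((h // scale, w // scale)))
    return (img - img.min()) / (img.max() - img.min() + 1e-8)

def layer_mix(x: np.ndarray, lam: float = 0.7) -> np.ndarray:
    """Convex blend of an (H, W, C) image in [0, 1] with structured texture."""
    frac = toy_fractal(*x.shape[:2])[..., None]  # broadcast over channels
    return lam * x + (1.0 - lam) * frac
```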
- Simplifying CLIP: Unleashing the Power of Large-Scale Models on Consumer-level Computers [3.2492319522383717] (arXiv: 2024-11-22T08:17:46Z)
Contrastive Language-Image Pre-training (CLIP) has attracted a surge of attention for its superior zero-shot performance and excellent transferability to downstream tasks. However, training such large-scale models usually requires substantial computation and storage, which poses barriers for general users with consumer-level computers.
- Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [56.00251589760559] (arXiv: 2024-09-26T13:38:33Z)
Large language models (LLMs) can act as gradient priors in a zero-shot setting. We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding. Experiments indicate that LM-GC surpasses existing state-of-the-art lossless compression methods.
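The principle behind LM-GC can be shown with the standard information-theoretic bound: a statistical model plus an ideal entropy coder compresses a byte stream to roughly its negative log-likelihood in bits. A toy byte-frequency model stands in for the LLM below; LM-GC's serialization and arithmetic coder are more sophisticated.

```python
# Hedged toy of the LM-GC idea: any statistical model plus an ideal entropy
# coder compresses data to about its negative log-likelihood in bits. A
# byte-frequency model stands in for the LLM here.
import numpy as np

def nll_bits(data: bytes, probs: np.ndarray) -> float:
    """Ideal code length of `data` under a per-byte model (an arithmetic
    coder gets within a couple of bits of this bound for the whole stream)."""
    counts = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
    return float(-(counts * np.log2(probs + 1e-12)).sum())

grads = np.random.default_rng(0).normal(0, 1e-3, 10_000).astype(np.float32)
raw = grads.tobytes()

uniform = np.full(256, 1 / 256)                 # no model: 8 bits per byte
counts = np.bincount(np.frombuffer(raw, dtype=np.uint8), minlength=256)
empirical = counts / counts.sum()               # stand-in "learned" model

print(f"raw:      {len(raw) * 8:,} bits")
print(f"uniform:  {nll_bits(raw, uniform):,.0f} bits")
print(f"modelled: {nll_bits(raw, empirical):,.0f} bits  # better model -> fewer bits")
```

A better model of the gradient bytes yields a lower negative log-likelihood and hence a smaller compressed size, which is exactly where an LLM prior is meant to help.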
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549] (arXiv: 2024-07-26T10:49:14Z)
Adversarial robustness has conventionally been believed to be a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426] (arXiv: 2024-07-16T06:38:49Z)
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GANs' inherent flaws and biased optimization within the latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
- Mixture of Low-rank Experts for Transferable AI-Generated Image Detection [18.631006488565664] (arXiv: 2024-04-07T09:01:50Z)
Generative models have enabled a giant leap in synthesizing photo-realistic images with minimal expertise, sparking concerns about the authenticity of online information.
This study aims to develop a universal AI-generated image detector capable of identifying images from diverse sources.
Inspired by the zero-shot transferability of pre-trained vision-language models, we seek to harness the non-trivial visual-world knowledge and descriptive proficiency of CLIP-ViT to generalize over unknown domains.
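The title suggests several low-rank adapters selected by a gate on top of a frozen CLIP-ViT. A hedged sketch of one such layer follows; the expert count, rank, and softmax gating rule are assumptions, not the paper's design.

```python
# Hedged sketch of a mixture of low-rank experts over one frozen linear
# layer: a gate mixes several LoRA updates per token. Expert count, rank,
# and the softmax gating rule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAMoE(nn.Module):
    def __init__(self, base: nn.Linear, n_experts: int = 4, rank: int = 4):
        super().__init__()
        self.base = base.requires_grad_(False)           # frozen CLIP-ViT weights
        self.A = nn.Parameter(torch.randn(n_experts, rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, base.out_features, rank))
        self.gate = nn.Linear(base.in_features, n_experts)

    def forward(self, x):                                # x: (..., in_features)
        w = F.softmax(self.gate(x), dim=-1)              # per-token expert weights
        low = torch.einsum('...i,eri->...er', x, self.A) # project into each rank space
        upd = torch.einsum('...er,eor->...eo', low, self.B)
        return self.base(x) + torch.einsum('...e,...eo->...o', w, upd)
```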
- Toward Fast, Flexible, and Robust Low-Light Image Enhancement [87.27326390675155] (arXiv: 2022-04-21T14:40:32Z)
We develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening of images in real-world low-light scenarios.
Considering the computational burden of the cascaded pattern, we construct a self-calibrated module that realizes convergence between the results of each stage.
We make comprehensive explorations of SCI's inherent properties, including operation-insensitive adaptability and model-irrelevant generality.
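The cascaded pattern referred to above can be caricatured as weight-shared illumination refinement followed by a Retinex-style division; the blocks below are toy stand-ins for SCI's learned modules.

```python
# Hedged caricature of SCI's cascade: a weight-shared block refines an
# illumination map stage by stage, and the enhanced image is a Retinex-style
# division. The toy blocks stand in for SCI's learned modules.
import torch
import torch.nn as nn

class IlluminationStage(nn.Module):
    """One weight-shared refinement stage: estimate a residual illumination."""
    def __init__(self, ch: int = 3):
        super().__init__()
        self.f = nn.Sequential(nn.Conv2d(ch, 16, 3, padding=1), nn.ReLU(),
                               nn.Conv2d(16, ch, 3, padding=1), nn.Sigmoid())

    def forward(self, u):
        return torch.clamp(u + self.f(u), 1e-3, 1.0)     # keep illumination valid

def enhance(y: torch.Tensor, stage: IlluminationStage, n_stages: int = 3):
    u = y.clone()                                        # init illumination from input
    for _ in range(n_stages):                            # same weights every stage
        u = stage(u)
    return torch.clamp(y / u, 0.0, 1.0)                  # Retinex: reflectance = y / u
```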
- One-Shot Adaptation of GAN in Just One CLIP [51.188396199083336] (arXiv: 2022-03-17T13:03:06Z)
We present a novel single-shot GAN adaptation method through unified CLIP space manipulations.
Specifically, our model employs a two-step training strategy: reference image search in the source generator using CLIP-guided latent optimization, followed by generator fine-tuning that enforces CLIP-space consistency between the source and adapted generators.
We show that our model generates diverse outputs with the target texture and outperforms the baseline models both qualitatively and quantitatively.
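Step one, the reference image search, amounts to optimizing a latent code so that the generator's output matches the reference in CLIP embedding space. A hedged sketch with assumed, simplified generator and CLIP interfaces:

```python
# Hedged sketch of step one: optimize a GAN latent so the generated image
# matches a reference in CLIP embedding space. The generator and CLIP
# interfaces are assumed, simplified signatures.
import torch
import torch.nn.functional as F

def clip_guided_latent_search(G, clip_image_encoder, ref_image,
                              steps: int = 200, lr: float = 0.05):
    """Find z such that clip(G(z)) is close to clip(ref_image)."""
    with torch.no_grad():
        target = F.normalize(clip_image_encoder(ref_image), dim=-1)
    z = torch.randn(1, 512, requires_grad=True)          # 512: assumed latent size
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        img = G(z)                                       # differentiable generator
        emb = F.normalize(clip_image_encoder(img), dim=-1)
        loss = 1.0 - (emb * target).sum(dim=-1).mean()   # cosine distance in CLIP space
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()                                    # init for generator fine-tuning
```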