Related papers: Raising the Bar of AI-generated Image Detection with CLIP

URL: http://arxiv.org/abs/2312.00195v2
Date: Mon, 29 Apr 2024 14:25:42 GMT
Title: Raising the Bar of AI-generated Image Detection with CLIP
Authors: Davide Cozzolino, Giovanni Poggi, Riccardo Corvi, Matthias Nießner, Luisa Verdoliva
Abstract summary: The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images. We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
Score: 50.345365081177555
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images. We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios. We find that, contrary to previous beliefs, it is neither necessary nor convenient to use a large domain-specific dataset for training. On the contrary, by using only a handful of example images from a single generative model, a CLIP-based detector exhibits surprising generalization ability and high robustness across different architectures, including recent commercial tools such as Dalle-3, Midjourney v5, and Firefly. We match the state-of-the-art (SoTA) on in-distribution data and significantly improve upon it in terms of generalization to out-of-distribution data (+6% AUC) and robustness to impaired/laundered data (+13%). Our project is available at https://grip-unina.github.io/ClipBased-SyntheticImageDetection/
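
The abstract describes a lightweight detector trained on only a handful of example images using features from a frozen pre-trained model. A minimal sketch of that general idea follows, under loud assumptions: the feature vectors are synthetic Gaussian stand-ins rather than real CLIP embeddings, the helper names are hypothetical, and logistic regression is a swapped-in classifier chosen for brevity, not necessarily the one the authors use.

```python
import math
import random

def synthetic_embeddings(n, dim, mean, rng):
    """Hypothetical stand-in for frozen image embeddings (not real CLIP output)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]

def sigmoid(z):
    # Clamp the logit so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_linear_detector(real, fake, epochs=100, lr=0.05):
    """Fit logistic regression by SGD; label 0 = real, 1 = generated."""
    dim = len(real[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 0.0) for x in real] + [(x, 1.0) for x in fake]
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def predict(w, b, x):
    """Return 1 (generated) if the linear score is positive, else 0 (real)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

rng = random.Random(0)
dim = 16
# A handful of training examples per class, mirroring the few-shot setting.
train_real = synthetic_embeddings(10, dim, -1.0, rng)
train_fake = synthetic_embeddings(10, dim, +1.0, rng)
w, b = train_linear_detector(train_real, train_fake)

# Evaluate on held-out synthetic vectors drawn from the same two distributions.
test_real = synthetic_embeddings(200, dim, -1.0, rng)
test_fake = synthetic_embeddings(200, dim, +1.0, rng)
correct = sum(predict(w, b, x) == 0 for x in test_real)
correct += sum(predict(w, b, x) == 1 for x in test_fake)
accuracy = correct / (len(test_real) + len(test_fake))
print(f"held-out accuracy: {accuracy:.3f}")
```

In practice the synthetic vectors would be replaced by embeddings extracted once from a frozen image encoder, so only the small linear head is trained. The synthetic classes here are well separated by construction, so the accuracy printed by this sketch says nothing about real detection performance.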

Related papers

Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval [76.86914849263168]
Open-set 3D object retrieval is an emerging task aiming to retrieve 3D objects of unseen categories beyond the training set. Existing methods typically utilize all modalities (i.e., voxels, point clouds, multi-view images) and train specific backbones before fusion. We present a framework named Describe, Adapt and Combine (DAC) by taking only multi-view images for open-set 3DOR.
arXiv Detail & Related papers (2025-07-29T04:11:05Z)
Task-aligned prompting improves zero-shot detection of AI-generated images by Vision-Language Models [2.005104318774207]
In this work, we investigate the use of pre-trained Vision-Language Models for zero-shot detection of AI-generated images. We show that task-aligned prompting elicits more focused reasoning, enhances latent capabilities in VLMs, and significantly improves performance without fine-tuning.
arXiv Detail & Related papers (2025-05-20T22:44:04Z)
CLIP Embeddings for AI-Generated Image Detection: A Few-Shot Study with Lightweight Classifier [0.0]
This work investigates whether CLIP embeddings inherently contain information indicative of AI generation. Experiments on the public CIFAKE benchmark show that performance reaches 95% accuracy without language reasoning. Some specific image types, such as wide-angle photographs and oil paintings, pose significant challenges to classification.
arXiv Detail & Related papers (2025-05-15T19:14:39Z)
Few-Shot Learner Generalizes Across AI-Generated Image Detection [14.069833211684715]
Few-Shot Detector (FSD) is a novel AI-generated image detector which learns a specialized metric space to effectively distinguish unseen fake images. Experiments show that FSD improves on state-of-the-art performance by +7.4% average accuracy on the GenImage dataset.
arXiv Detail & Related papers (2025-01-15T12:33:11Z)
Zero-Shot Detection of AI-Generated Images [54.01282123570917]
We propose a zero-shot entropy-based detector (ZED) to detect AI-generated images. Inspired by recent works on machine-generated text detection, our idea is to measure how surprising the image under analysis is compared to a model of real images. ZED achieves an average improvement of more than 3% over the SoTA in terms of accuracy.
arXiv Detail & Related papers (2024-09-24T08:46:13Z)
A Sanity Check for AI-generated Image Detection [49.08585395873425]
We present a sanity check on whether the task of AI-generated image detection has been solved. To quantify the generalization of existing methods, we evaluate 9 off-the-shelf AI-generated image detectors on the Chameleon dataset. We propose AIDE (AI-generated Image DEtector with Hybrid Features), which leverages multiple experts to simultaneously extract visual artifacts and noise patterns.
arXiv Detail & Related papers (2024-06-27T17:59:49Z)
Improving Interpretability and Robustness for the Detection of AI-Generated Images [6.116075037154215]
We analyze existing state-of-the-art AIGI detection methods based on frozen CLIP embeddings. We show how to interpret them, shedding light on how images produced by various AI generators differ from real ones.
arXiv Detail & Related papers (2024-06-21T10:33:09Z)
Mixture of Low-rank Experts for Transferable AI-Generated Image Detection [18.631006488565664]
Generative models have shown a giant leap in photo-realistic images with minimal expertise, sparking concerns about the authenticity of online information. This study aims to develop a universal AI-generated image detector capable of identifying images from diverse sources. Inspired by the zero-shot transferability of pre-trained vision-language models, we seek to harness the non-trivial visual-world knowledge and descriptive proficiency of CLIP-ViT to generalize over unknown domains.
arXiv Detail & Related papers (2024-04-07T09:01:50Z)
Learning Customized Visual Models with Retrieval-Augmented Knowledge [104.05456849611895]
We propose REACT, a framework to acquire the relevant web knowledge to build customized visual models for target domains. We retrieve the most relevant image-text pairs from the web-scale database as external knowledge, and propose to customize the model by only training new modularized blocks while freezing all the original weights. The effectiveness of REACT is demonstrated via extensive experiments on classification, retrieval, detection and segmentation tasks, including zero, few, and full-shot settings.
arXiv Detail & Related papers (2023-01-17T18:59:06Z)
Improving Zero-shot Generalization and Robustness of Multi-modal Models [70.14692320804178]
Multi-modal image-text models such as CLIP and LiT have demonstrated impressive performance on image classification benchmarks. We investigate the reasons for this performance gap and find that many of the failure cases are caused by ambiguity in the text prompts. We propose a simple and efficient way to improve accuracy on such uncertain images by making use of the WordNet hierarchy.
arXiv Detail & Related papers (2022-12-04T07:26:24Z)
Dynamic Relevance Learning for Few-Shot Object Detection [6.550840743803705]
We propose a dynamic relevance learning model, which utilizes the relationship between all support images and Region of Interest (RoI) on the query images to construct a dynamic graph convolutional network (GCN). The proposed model achieves the best overall performance, which shows its effectiveness in learning more generalized features.
arXiv Detail & Related papers (2021-08-04T18:29:42Z)
CutPaste: Self-Supervised Learning for Anomaly Detection and Localization [59.719925639875036]
We propose a framework for building anomaly detectors using normal training data only. We first learn self-supervised deep representations and then build a generative one-class classifier on learned representations. Our empirical study on the MVTec anomaly detection dataset demonstrates that the proposed algorithm is general enough to detect various types of real-world defects.
arXiv Detail & Related papers (2021-04-08T19:04:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.