Related papers: Exploring the Adversarial Robustness of CLIP for AI-generated Image Detection

Exploring the Adversarial Robustness of CLIP for AI-generated Image Detection

URL: http://arxiv.org/abs/2407.19553v2
Date: Wed, 23 Oct 2024 16:06:32 GMT
Title: Exploring the Adversarial Robustness of CLIP for AI-generated Image Detection
Authors: Vincenzo De Rosa, Fabrizio Guillaro, Giovanni Poggi, Davide Cozzolino, Luisa Verdoliva,
Abstract summary: We study the adversarial robustness of AI-generated image detectors, focusing on Contrastive Language-Image Pretraining (CLIP)-based methods. CLIP-based detectors are found to be vulnerable to white-box attacks just like CNN-based detectors. This analysis provides new insights into the properties of forensic detectors that can help to develop more effective strategies.
Score: 9.516391314161154
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, many forensic detectors have been proposed to detect AI-generated images and prevent their use for malicious purposes. Convolutional neural networks (CNNs) have long been the dominant architecture in this field and have been the subject of intense study. However, recently proposed Transformer-based detectors have been shown to match or even outperform CNN-based detectors, especially in terms of generalization. In this paper, we study the adversarial robustness of AI-generated image detectors, focusing on Contrastive Language-Image Pretraining (CLIP)-based methods that rely on Visual Transformer (ViT) backbones and comparing their performance with CNN-based methods. We study the robustness to different adversarial attacks under a variety of conditions and analyze both numerical results and frequency-domain patterns. CLIP-based detectors are found to be vulnerable to white-box attacks just like CNN-based detectors. However, attacks do not easily transfer between CNN-based and CLIP-based methods. This is also confirmed by the different distribution of the adversarial noise patterns in the frequency domain. Overall, this analysis provides new insights into the properties of forensic detectors that can help to develop more effective strategies.

Related papers

RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors [57.81012948133832]
We present RAID (Robust evaluation of AI-generated image Detectors), a dataset of 72k diverse and highly transferable adversarial examples.<n>Our methodology generates adversarial images that transfer with a high success rate to unseen detectors.<n>Our findings indicate that current state-of-the-art AI-generated image detectors can be easily deceived by adversarial examples.
arXiv Detail & Related papers (2025-06-04T14:16:00Z)
AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries. We show that ACPT can detect 7 state-of-the-art query-based attacks with $>99%$ detection rate within 5 shots. We also show that ACPT is robust to 3 types of adaptive attacks.
arXiv Detail & Related papers (2024-08-04T09:53:50Z)
Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks [17.87119255294563]
We investigate the vulnerability of state-of-the-art AIGI detectors against adversarial attack under white-box and black-box settings. We propose a new attack containing two main parts. First, inspired by the obvious difference between real images and fake images in the frequency domain, we add perturbations under the frequency domain to push the image away from its original frequency distribution. We show that adversarial attack is truly a real threat to AIGI detectors, because FPBA can deliver successful black-box attacks across models, generators, defense methods, and even evade cross-generator detection.
arXiv Detail & Related papers (2024-07-30T14:07:17Z)
Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting [133.55037976429088]
We investigate the adversarial robustness of vision transformers equipped with BERT pretraining (e.g., BEiT, MAE) A surprising observation is that MAE has significantly worse adversarial robustness than other BERT pretraining methods. We propose a simple yet effective way to boost the adversarial robustness of MAE.
arXiv Detail & Related papers (2023-08-20T16:27:17Z)
New Adversarial Image Detection Based on Sentiment Analysis [37.139957973240264]
adversarial attack models, e.g., DeepFool, are on the rise and outrunning adversarial example detection techniques. This paper presents a new adversarial example detector that outperforms state-of-the-art detectors in identifying the latest adversarial attacks on image datasets.
arXiv Detail & Related papers (2023-05-03T14:32:21Z)
Detection of Adversarial Physical Attacks in Time-Series Image Data [12.923271427789267]
We propose VisionGuard* (VG), which couples VG with majority-vote methods, to detect adversarial physical attacks in time-series image data. This is motivated by autonomous systems applications where images are collected over time using onboard sensors for decision-making purposes. We have evaluated VG* on videos of both clean and physically attacked traffic signs generated by a state-of-the-art robust physical attack.
arXiv Detail & Related papers (2023-04-27T02:08:13Z)
Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications. The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use. We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
Unfolding Local Growth Rate Estimates for (Almost) Perfect Adversarial Detection [22.99930028876662]
Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks. Current CNN approaches largely remain vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system. We propose a simple and light-weight detector, which leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks.
arXiv Detail & Related papers (2022-12-13T17:51:32Z)
Adversarially Robust One-class Novelty Detection [83.1570537254877]
We show that existing novelty detectors are susceptible to adversarial examples. We propose a defense strategy that manipulates the latent space of novelty detectors to improve the robustness against adversarial examples.
arXiv Detail & Related papers (2021-08-25T10:41:29Z)
BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study robustness of CNNs against white-box and black-box adversarial attacks. Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
SpectralDefense: Detecting Adversarial Attacks on CNNs in the Fourier Domain [10.418647759223964]
We show how analysis in the Fourier domain of input images and feature maps can be used to distinguish benign test samples from adversarial images. We propose two novel detection methods.
arXiv Detail & Related papers (2021-03-04T12:48:28Z)
Increasing the Confidence of Deep Neural Networks by Coverage Analysis [71.57324258813674]
This paper presents a lightweight monitoring architecture based on coverage paradigms to enhance the model against different unsafe inputs. Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs.
arXiv Detail & Related papers (2021-01-28T16:38:26Z)
Detecting Adversarial Examples by Input Transformations, Defense Perturbations, and Voting [71.57324258813674]
convolutional neural networks (CNNs) have proved to reach super-human performance in visual recognition tasks. CNNs can easily be fooled by adversarial examples, i.e., maliciously-crafted images that force the networks to predict an incorrect output. This paper extensively explores the detection of adversarial examples via image transformations and proposes a novel methodology.
arXiv Detail & Related papers (2021-01-27T14:50:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.