Related papers: Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks

Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks

URL: http://arxiv.org/abs/2407.20836v1
Date: Tue, 30 Jul 2024 14:07:17 GMT
Title: Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks
Authors: Yunfeng Diao, Naixin Zhai, Changtao Miao, Xun Yang, Meng Wang,
Abstract summary: We investigate the vulnerability of state-of-the-art AIGI detectors against adversarial attack under white-box and black-box settings. We propose a new attack containing two main parts. First, inspired by the obvious difference between real images and fake images in the frequency domain, we add perturbations under the frequency domain to push the image away from its original frequency distribution. We show that adversarial attack is truly a real threat to AIGI detectors, because FPBA can deliver successful black-box attacks across models, generators, defense methods, and even evade cross-generator detection.
Score: 17.87119255294563
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advancements in image synthesis, particularly with the advent of GAN and Diffusion models, have amplified public concerns regarding the dissemination of disinformation. To address such concerns, numerous AI-generated Image (AIGI) Detectors have been proposed and achieved promising performance in identifying fake images. However, there still lacks a systematic understanding of the adversarial robustness of these AIGI detectors. In this paper, we examine the vulnerability of state-of-the-art AIGI detectors against adversarial attack under white-box and black-box settings, which has been rarely investigated so far. For the task of AIGI detection, we propose a new attack containing two main parts. First, inspired by the obvious difference between real images and fake images in the frequency domain, we add perturbations under the frequency domain to push the image away from its original frequency distribution. Second, we explore the full posterior distribution of the surrogate model to further narrow this gap between heterogeneous models, e.g. transferring adversarial examples across CNNs and ViTs. This is achieved by introducing a novel post-train Bayesian strategy that turns a single surrogate into a Bayesian one, capable of simulating diverse victim models using one pre-trained surrogate, without the need for re-training. We name our method as frequency-based post-train Bayesian attack, or FPBA. Through FPBA, we show that adversarial attack is truly a real threat to AIGI detectors, because FPBA can deliver successful black-box attacks across models, generators, defense methods, and even evade cross-generator detection, which is a crucial real-world detection scenario.

Related papers

RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors [57.81012948133832]
We present RAID (Robust evaluation of AI-generated image Detectors), a dataset of 72k diverse and highly transferable adversarial examples.<n>Our methodology generates adversarial images that transfer with a high success rate to unseen detectors.<n>Our findings indicate that current state-of-the-art AI-generated image detectors can be easily deceived by adversarial examples.
arXiv Detail & Related papers (2025-06-04T14:16:00Z)
Adversarially Robust AI-Generated Image Detection for Free: An Information Theoretic Perspective [22.514709685678813]
We show that adversarial training (AT) suffers from performance collapse in AIGI detection.<n>Motivated by this difference, we propose Training-free Robust Detection via Information-theoretic Measures (TRIM)<n>TRIM builds on standard detectors and quantifies feature shifts using prediction entropy and KL divergence.
arXiv Detail & Related papers (2025-05-28T17:20:49Z)
Time Step Generating: A Universal Synthesized Deepfake Image Detector [0.4488895231267077]
We propose a universal synthetic image detector Time Step Generating (TSG) TSG does not rely on pre-trained models' reconstructing ability, specific datasets, or sampling algorithms. We test the proposed TSG on the large-scale GenImage benchmark and it achieves significant improvements in both accuracy and generalizability.
arXiv Detail & Related papers (2024-11-17T09:39:50Z)
Fake It Until You Break It: On the Adversarial Robustness of AI-generated Image Detectors [14.284639462471274]
We evaluate state-of-the-art AI-generated image (AIGI) detectors under different attack scenarios. Attacks can significantly reduce detection accuracy to the extent that the risks of relying on detectors outweigh their benefits. We propose a simple defense mechanism to make CLIP-based detectors, which are currently the best-performing detectors, robust against these attacks.
arXiv Detail & Related papers (2024-10-02T14:11:29Z)
StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial examples. It is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries.
arXiv Detail & Related papers (2024-08-11T01:22:29Z)
Exploring the Adversarial Robustness of CLIP for AI-generated Image Detection [9.516391314161154]
We study the adversarial robustness of AI-generated image detectors, focusing on Contrastive Language-Image Pretraining (CLIP)-based methods. CLIP-based detectors are found to be vulnerable to white-box attacks just like CNN-based detectors. This analysis provides new insights into the properties of forensic detectors that can help to develop more effective strategies.
arXiv Detail & Related papers (2024-07-28T18:20:08Z)
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models. Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs. Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection [60.960988614701414]
RIGID is a training-free and model-agnostic method for robust AI-generated image detection. RIGID significantly outperforms existing trainingbased and training-free detectors.
arXiv Detail & Related papers (2024-05-30T14:49:54Z)
Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting [133.55037976429088]
We investigate the adversarial robustness of vision transformers equipped with BERT pretraining (e.g., BEiT, MAE) A surprising observation is that MAE has significantly worse adversarial robustness than other BERT pretraining methods. We propose a simple yet effective way to boost the adversarial robustness of MAE.
arXiv Detail & Related papers (2023-08-20T16:27:17Z)
Black-Box Attack against GAN-Generated Image Detector with Contrastive Perturbation [0.4297070083645049]
We propose a new black-box attack method against GAN-generated image detectors. A novel contrastive learning strategy is adopted to train the encoder-decoder network based anti-forensic model. The proposed attack effectively reduces the accuracy of three state-of-the-art detectors on six popular GANs.
arXiv Detail & Related papers (2022-11-07T12:56:14Z)
Exploring Frequency Adversarial Attacks for Face Forgery Detection [59.10415109589605]
We propose a frequency adversarial attack method against face forgery detectors. Inspired by the idea of meta-learning, we also propose a hybrid adversarial attack that performs attacks in both the spatial and frequency domains.
arXiv Detail & Related papers (2022-03-29T15:34:13Z)
Towards Adversarial Patch Analysis and Certified Defense against Crowd Counting [61.99564267735242]
Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems. Recent studies have demonstrated that deep neural network (DNN) methods are vulnerable to adversarial attacks. We propose a robust attack strategy called Adversarial Patch Attack with Momentum to evaluate the robustness of crowd counting models.
arXiv Detail & Related papers (2021-04-22T05:10:55Z)
Online Alternate Generator against Adversarial Attacks [144.45529828523408]
Deep learning models are notoriously sensitive to adversarial examples which are synthesized by adding quasi-perceptible noises on real images. We propose a portable defense method, online alternate generator, which does not need to access or modify the parameters of the target networks. The proposed method works by online synthesizing another image from scratch for an input image, instead of removing or destroying adversarial noises.
arXiv Detail & Related papers (2020-09-17T07:11:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.