Adversarially Robust AI-Generated Image Detection for Free: An Information Theoretic Perspective
- URL: http://arxiv.org/abs/2505.22604v2
- Date: Fri, 30 May 2025 10:57:06 GMT
- Title: Adversarially Robust AI-Generated Image Detection for Free: An Information Theoretic Perspective
- Authors: Ruixuan Zhang, He Wang, Zhengyu Zhao, Zhiqing Guo, Xun Yang, Yunfeng Diao, Meng Wang
- Abstract summary: We show that adversarial training (AT) suffers from performance collapse in AIGI detection, whereas standard detectors retain clear feature separation. Motivated by this difference, we propose Training-free Robust Detection via Information-theoretic Measures (TRIM). TRIM builds on standard detectors and quantifies feature shifts using prediction entropy and KL divergence.
- Score: 22.514709685678813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rapid advances in Artificial Intelligence Generated Images (AIGI) have facilitated malicious use, such as forgery and misinformation. Therefore, numerous methods have been proposed to detect fake images. Although such detectors have been proven to be universally vulnerable to adversarial attacks, defenses in this field are scarce. In this paper, we first identify that adversarial training (AT), widely regarded as the most effective defense, suffers from performance collapse in AIGI detection. Through an information-theoretic lens, we further attribute the cause of collapse to feature entanglement, which disrupts the preservation of feature-label mutual information. In contrast, standard detectors show clear feature separation. Motivated by this difference, we propose Training-free Robust Detection via Information-theoretic Measures (TRIM), the first training-free adversarial defense for AIGI detection. TRIM builds on standard detectors and quantifies feature shifts using prediction entropy and KL divergence. Extensive experiments across multiple datasets and attacks validate the superiority of our TRIM, e.g., outperforming the state-of-the-art defense by 33.88% (28.91%) on ProGAN (GenImage), while maintaining the original accuracy well.
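The abstract names TRIM's two measures but gives no implementation details, so the sketch below is only a guess at the mechanics: it assumes a standard detector that outputs logits, and it assumes the KL term compares the input's prediction against a reference prediction for the same image (e.g., a mildly transformed view); the actual reference distribution used by TRIM is not stated here.

```python
import torch
import torch.nn.functional as F

def trim_style_scores(detector, x, x_ref):
    """Score inputs by (a) prediction entropy and (b) KL divergence between
    the prediction on the input and the prediction on a reference view.
    `detector` is any standard (non-adversarially-trained) classifier
    returning logits; `x_ref` is a hypothetical reference view of `x`."""
    with torch.no_grad():
        p = F.softmax(detector(x), dim=-1)
        q = F.softmax(detector(x_ref), dim=-1)
    log_p = p.clamp_min(1e-12).log()
    log_q = q.clamp_min(1e-12).log()
    entropy = -(p * log_p).sum(dim=-1)      # H(p): uncertainty of the prediction
    kl = (p * (log_p - log_q)).sum(dim=-1)  # KL(p || q): instability under the shift
    return entropy, kl

# Usage: thresholds would be calibrated on clean validation data, e.g. at a
# fixed false-positive rate; inputs exceeding either threshold are flagged.
def flag_adversarial(entropy, kl, tau_h, tau_kl):
    return (entropy > tau_h) | (kl > tau_kl)
```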
Related papers
- RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors [57.81012948133832]
We present RAID (Robust evaluation of AI-generated image Detectors), a dataset of 72k diverse and highly transferable adversarial examples.
Our methodology generates adversarial images that transfer with a high success rate to unseen detectors.
Our findings indicate that current state-of-the-art AI-generated image detectors can be easily deceived by adversarial examples.
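The summary does not describe RAID's attack pipeline, so the following is only a generic sketch of how transferable adversarial examples against detectors are commonly produced: PGD with the loss averaged over an ensemble of surrogate detectors, which tends to improve transfer to unseen models. All names and hyperparameters here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ensemble_pgd(surrogates, x, y, eps=8/255, alpha=2/255, steps=10):
    """Generic ensemble-PGD sketch (not RAID's actual method): maximize the
    averaged detector loss so the perturbed image fools all surrogates,
    hoping the perturbation transfers to unseen detectors.
    `x` in [0,1]; `y` = true labels (e.g., 1 = fake)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = sum(F.cross_entropy(m(x_adv), y) for m in surrogates) / len(surrogates)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project to the eps-ball
            x_adv = x_adv.clamp(0, 1)                  # stay a valid image
    return x_adv.detach()
```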
arXiv Detail & Related papers (2025-06-04T14:16:00Z)
- Take Fake as Real: Realistic-like Robust Black-box Adversarial Attack to Evade AIGC Detection [4.269334070603315]
We propose a realistic-like Robust Black-box Adversarial attack (R$^2$BA) with post-processing fusion optimization.
We show that R$^2$BA exhibits impressive anti-detection performance, excellent invisibility, and strong robustness in GAN-based and diffusion-based cases.
arXiv Detail & Related papers (2024-12-09T18:16:50Z)
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing, powered by modern generative models, pose serious risks.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
- Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image Detectors [62.63467652611788]
We introduce SEMI-TRUTHS, featuring 27,600 real images, 223,400 masks, and 1,472,700 AI-augmented images.
Each augmented image is accompanied by metadata for standardized and targeted evaluation of detector robustness.
Our findings suggest that state-of-the-art detectors exhibit varying sensitivities to the types and degrees of perturbations, data distributions, and augmentation methods used.
arXiv Detail & Related papers (2024-11-12T01:17:27Z)
- Adversarial Robustness of AI-Generated Image Detectors in the Real World [13.52355280061187]
We show that current state-of-the-art classifiers are vulnerable to adversarial examples under real-world conditions.
Most attacks remain effective even when images are degraded during upload to, e.g., social media platforms.
In a case study, we demonstrate that these robustness challenges also affect commercial tools by conducting black-box attacks on HIVE.
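The degradation scenario above is straightforward to reproduce; the sketch below simulates the JPEG re-encoding an image typically undergoes on upload and checks whether an attack still fools a detector afterwards. The quality setting and helper names are assumptions, not taken from the paper.

```python
import io
import torch
from PIL import Image
from torchvision.transforms.functional import to_pil_image, to_tensor

def jpeg_roundtrip(x, quality=75):
    """Re-encode a batch of images in [0,1] as JPEG, as an upload pipeline
    might (quality=75 is an illustrative choice; platforms vary)."""
    out = []
    for img in x:
        buf = io.BytesIO()
        to_pil_image(img.cpu().clamp(0, 1)).save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        out.append(to_tensor(Image.open(buf).convert("RGB")))
    return torch.stack(out).to(x.device)

def attack_survives(detector, x_adv, y_true):
    # The attack "survives" degradation if the detector is still wrong
    # after the adversarial image has been re-compressed.
    with torch.no_grad():
        pred = detector(jpeg_roundtrip(x_adv)).argmax(dim=-1)
    return pred != y_true
```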
arXiv Detail & Related papers (2024-10-02T14:11:29Z)
- StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial forgeries.
It is effective in both white-box and black-box settings.
arXiv Detail & Related papers (2024-08-11T01:22:29Z)
- Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks [39.524974831780874]
We propose Frequency-based Post-train Bayesian Attack (FPBA).
We show that adversarial attacks are a real threat to AIGI detectors, because FPBA can deliver successful black-box attacks.
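FPBA's defining components (the post-train Bayesian sampling, the exact frequency parameterization) are not described in this summary, so the sketch below shows only the generic idea of attacking in the frequency domain: ascend the detector loss with respect to the image's FFT coefficients rather than its pixels. Treat every detail as an assumption.

```python
import torch
import torch.nn.functional as F

def frequency_domain_step(detector, x, y, alpha=0.01):
    """One illustrative frequency-domain attack step (not FPBA itself):
    differentiate the detector loss w.r.t. the FFT of the image and take a
    signed ascent step on the real and imaginary parts of the spectrum."""
    spec = torch.fft.fft2(x).detach().requires_grad_(True)
    x_rec = torch.fft.ifft2(spec).real.clamp(0, 1)   # back to pixel space
    loss = F.cross_entropy(detector(x_rec), y)
    grad = torch.autograd.grad(loss, spec)[0]
    step = torch.complex(grad.real.sign(), grad.imag.sign())
    spec_adv = spec.detach() + alpha * step
    return torch.fft.ifft2(spec_adv).real.clamp(0, 1)
```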
arXiv Detail & Related papers (2024-07-30T14:07:17Z)
- Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to imperceptible adversarial perturbations in high-level image classification and attack suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z)
- Adversarial Medical Image with Hierarchical Feature Hiding [38.551147309335185]
Adversarial examples (AEs) pose a serious security threat to deep learning based methods for medical images.
It has been discovered that conventional adversarial attacks like PGD are easy to distinguish in the feature space, enabling accurate reactive defenses.
We propose a simple-yet-effective hierarchical feature constraint (HFC), a novel add-on to conventional white-box attacks, which helps hide the adversarial feature in the target feature distribution.
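The summary explains the goal of the add-on (keep adversarial features inside the target class's feature distribution) but not its hierarchical model, so the following is a deliberately simplified sketch: the usual attack loss plus a penalty pulling penultimate-layer features toward a precomputed target-class feature centre. The single-centre simplification and all names are assumptions.

```python
import torch
import torch.nn.functional as F

def feature_hiding_loss(detector, feat_extractor, x_adv, y_target, mu_target, lam=1.0):
    """Attack objective with a feature-hiding term (HFC idea, simplified):
    `feat_extractor` returns penultimate features; `mu_target` is the mean of
    clean target-class features, computed offline. Minimizing this both fools
    the classifier and keeps the AE's features near the target cluster."""
    attack_term = F.cross_entropy(detector(x_adv), y_target)  # be classified as target
    feat = feat_extractor(x_adv)
    hide_term = ((feat - mu_target) ** 2).sum(dim=-1).mean()  # stay inside the cluster
    return attack_term + lam * hide_term
```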
arXiv Detail & Related papers (2023-12-04T07:04:20Z)
- ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches [4.4100683691177816]
Adversarial attacks present a significant challenge to the dependable deployment of machine learning models.
We propose Outlier Detection and Dimension Reduction (ODDR), a comprehensive defense strategy to counteract patch-based adversarial attacks.
Our approach is based on the observation that input features corresponding to adversarial patches can be identified as outliers.
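The outlier observation in the last sentence can be made concrete in a few lines; the sketch below flags image regions whose feature norms are statistical outliers under a median-absolute-deviation test. ODDR's actual detector and its dimension-reduction stage are not reproduced here, and the MAD test is an illustrative stand-in.

```python
import numpy as np

def patch_outlier_mask(region_features, k=3.0):
    """Flag regions whose feature norm deviates from the per-image median by
    more than k median-absolute-deviations. `region_features` has shape
    (num_regions, dim); True entries mark likely patch regions."""
    norms = np.linalg.norm(region_features, axis=1)
    med = np.median(norms)
    mad = np.median(np.abs(norms - med)) + 1e-12  # avoid division by zero
    return np.abs(norms - med) / mad > k
```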
arXiv Detail & Related papers (2023-11-20T11:08:06Z)
- Spatial-Frequency Discriminability for Revealing Adversarial Perturbations [53.279716307171604]
The vulnerability of deep neural networks to adversarial perturbations has been widely recognized in the computer vision community.
Current algorithms typically detect adversarial patterns through discriminative decomposition for natural and adversarial data.
We propose a discriminative detector relying on a spatial-frequency Krawtchouk decomposition.
arXiv Detail & Related papers (2023-05-18T10:18:59Z)
- Adversarially-Aware Robust Object Detector [85.10894272034135]
We propose a Robust Detector (RobustDet) based on adversarially-aware convolution to disentangle gradients for model learning on clean and adversarial images.
Our model effectively disentangles gradients and significantly enhances detection robustness while maintaining the detection ability on clean images.
arXiv Detail & Related papers (2022-07-13T13:59:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.