Robustness in AI-Generated Detection: Enhancing Resistance to Adversarial Attacks
- URL: http://arxiv.org/abs/2505.03435v1
- Date: Tue, 06 May 2025 11:19:01 GMT
- Title: Robustness in AI-Generated Detection: Enhancing Resistance to Adversarial Attacks
- Authors: Sun Haoxuan, Hong Yan, Zhan Jiahui, Chen Haoxing, Lan Jun, Zhu Huijia, Wang Weiqiang, Zhang Liqing, Zhang Jianfu
- Abstract summary: This paper investigates the vulnerabilities of current AI-generated face detection systems. We propose an approach that integrates adversarial training to mitigate the impact of adversarial examples. We also provide an in-depth analysis of adversarial and benign examples, offering insights into the intrinsic characteristics of AI-generated content.
- Score: 4.179092469766417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of generative image technology has introduced significant security concerns, particularly in the domain of face generation detection. This paper investigates the vulnerabilities of current AI-generated face detection systems. Our study reveals that while existing detection methods often achieve high accuracy under standard conditions, they exhibit limited robustness against adversarial attacks. To address these challenges, we propose an approach that integrates adversarial training to mitigate the impact of adversarial examples. Furthermore, we utilize diffusion inversion and reconstruction to further enhance detection robustness. Experimental results demonstrate that minor adversarial perturbations can easily bypass existing detection systems, but our method significantly improves the robustness of these systems. Additionally, we provide an in-depth analysis of adversarial and benign examples, offering insights into the intrinsic characteristics of AI-generated content. All associated code will be made publicly available in a dedicated repository to facilitate further research and verification.
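The paper's code is not yet released, so the following is only a minimal sketch of the adversarial-training component described in the abstract, not the authors' implementation; the classifier, data loader, and PGD hyperparameters are placeholders, and the diffusion inversion/reconstruction step is omitted.

```python
# Hypothetical sketch of PGD-based adversarial training for a binary
# real-vs-AI-generated face classifier (not the paper's released code).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft an L-inf bounded adversarial example by iterative gradient ascent."""
    x_adv = (x.clone().detach() + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # project back onto the eps-ball around the clean image
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def train_epoch(model, loader, optimizer, device="cuda"):
    model.train()
    for x, y in loader:                   # y: 0 = real face, 1 = AI-generated
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)   # craft adversarial views on the fly
        optimizer.zero_grad()
        # train on a mix of clean and adversarial examples
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```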
Related papers
- RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors [57.81012948133832]
We present RAID (Robust evaluation of AI-generated image Detectors), a dataset of 72k diverse and highly transferable adversarial examples. Our methodology generates adversarial images that transfer with a high success rate to unseen detectors. Our findings indicate that current state-of-the-art AI-generated image detectors can be easily deceived by adversarial examples.
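As a rough, hypothetical illustration of the transfer evaluation summarized above (not the RAID pipeline itself), one can craft adversarial examples against a surrogate detector and measure how often they also fool an unseen target detector; `surrogate_detector` and `target_detector` below are placeholder names.

```python
# Hypothetical sketch: craft adversarial examples on a surrogate detector and
# measure their transfer rate against an unseen target detector.
import torch
import torch.nn.functional as F

def fgsm(surrogate, x, y, eps=4/255):
    """One-step gradient-sign attack against the surrogate detector."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(surrogate(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

@torch.no_grad()
def transfer_rate(target, x_adv, y):
    """Fraction of adversarial images the unseen detector misclassifies."""
    preds = target(x_adv).argmax(dim=1)
    return (preds != y).float().mean().item()

# surrogate_detector, target_detector, images, labels are placeholders:
# x_adv = fgsm(surrogate_detector, images, labels)
# print("transfer success rate:", transfer_rate(target_detector, x_adv, labels))
```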
arXiv Detail & Related papers (2025-06-04T14:16:00Z)
- AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection [16.00110817349377]
AvatarShield is the first interpretable MLLM-based framework for detecting human-centric fake videos. It effectively avoids the use of high-cost text annotation data, enabling precise temporal modeling and forgery detection. We also design a dual-encoder architecture, combining high-level semantic reasoning and low-level artifact amplification to guide MLLMs in effective forgery detection.
arXiv Detail & Related papers (2025-05-21T06:43:34Z)
- Unleashing the Power of Pre-trained Encoders for Universal Adversarial Attack Detection [21.03032944637112]
Adversarial attacks pose a critical security threat to real-world AI systems. This paper proposes a lightweight adversarial detection framework based on the large-scale pre-trained vision-language model CLIP.
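A minimal sketch of the general idea, frozen CLIP image features feeding a lightweight probe, assuming the HuggingFace `transformers` CLIP checkpoint; this is an illustration rather than the paper's framework, and the probe would still need to be trained on labelled benign/adversarial features.

```python
# Hypothetical sketch: frozen CLIP image features + a small linear probe that
# scores inputs as benign vs adversarial.
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
probe = nn.Linear(clip.config.projection_dim, 2)  # benign / adversarial

@torch.no_grad()
def clip_features(pil_images):
    inputs = processor(images=pil_images, return_tensors="pt")
    return clip.get_image_features(**inputs)

def detect(pil_images):
    logits = probe(clip_features(pil_images))
    return logits.softmax(dim=-1)[:, 1]  # probability the input is adversarial

# The probe is trained on CLIP features of known benign and adversarially
# perturbed images; CLIP itself stays frozen throughout.
```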
arXiv Detail & Related papers (2025-04-01T05:21:45Z)
- Adversarial Robustness of AI-Generated Image Detectors in the Real World [13.52355280061187]
We show that current state-of-the-art classifiers are vulnerable to adversarial examples under real-world conditions. Most attacks remain effective even when images are degraded during the upload to, e.g., social media platforms. In a case study, we demonstrate that these robustness challenges are also found in commercial tools by conducting black-box attacks on HIVE.
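A small sketch of the kind of degradation check described above: re-encode an adversarial image as JPEG, as a social-media upload pipeline might, and test whether the detector's verdict changes. The detector and inputs are placeholders.

```python
# Hypothetical sketch: does an adversarial image still fool a detector after
# the lossy JPEG re-encoding a platform upload typically applies?
import io
import torch
from PIL import Image
from torchvision.transforms.functional import to_pil_image, to_tensor

def jpeg_roundtrip(x, quality=75):
    """Simulate upload degradation on a single [0,1] CHW image tensor."""
    buf = io.BytesIO()
    to_pil_image(x.clamp(0, 1)).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return to_tensor(Image.open(buf).convert("RGB"))

@torch.no_grad()
def fooled_after_upload(detector, x_adv, true_label, quality=75):
    """Fraction of adversarial images still misclassified after compression."""
    degraded = torch.stack([jpeg_roundtrip(img, quality) for img in x_adv])
    preds = detector(degraded).argmax(dim=1)
    return (preds != true_label).float().mean().item()
```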
arXiv Detail & Related papers (2024-10-02T14:11:29Z)
- StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial examples.
It is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries.
arXiv Detail & Related papers (2024-08-11T01:22:29Z)
- A Survey and Evaluation of Adversarial Attacks for Object Detection [11.48212060875543]
Deep learning models are vulnerable to adversarial examples that can deceive them into making confident but incorrect predictions. This vulnerability poses significant risks in high-stakes applications such as autonomous vehicles, security surveillance, and safety-critical inspection systems. This paper presents a novel taxonomic framework for categorizing adversarial attacks specific to object detection architectures.
arXiv Detail & Related papers (2024-08-04T05:22:08Z)
- Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack [24.954755569786396]
We propose a framework for a broader class of adversarial attacks, designed to perform minor perturbations in machine-generated content to evade detection.
We consider two attack settings: white-box and black-box, and employ adversarial learning in dynamic scenarios to assess the potential enhancement of the current detection model's robustness.
The empirical results reveal that the current detection models can be compromised in as little as 10 seconds, leading to the misclassification of machine-generated text as human-written content.
arXiv Detail & Related papers (2024-04-02T12:49:22Z)
- FaultGuard: A Generative Approach to Resilient Fault Prediction in Smart Electrical Grids [53.2306792009435]
FaultGuard is the first framework for fault type and zone classification resilient to adversarial attacks.
We propose a low-complexity fault prediction model and an online adversarial training technique to enhance robustness.
Our model outclasses the state-of-the-art for resilient fault prediction benchmarking, with an accuracy of up to 0.958.
arXiv Detail & Related papers (2024-03-26T08:51:23Z)
- Spatial-Frequency Discriminability for Revealing Adversarial Perturbations [53.279716307171604]
Vulnerability of deep neural networks to adversarial perturbations has been widely perceived in the computer vision community.
Current algorithms typically detect adversarial patterns through discriminative decomposition for natural and adversarial data.
We propose a discriminative detector relying on a spatial-frequency Krawtchouk decomposition.
arXiv Detail & Related papers (2023-05-18T10:18:59Z)
- Adversarially-Aware Robust Object Detector [85.10894272034135]
We propose a Robust Detector (RobustDet) based on adversarially-aware convolution to disentangle gradients for model learning on clean and adversarial images.
Our model effectively disentangles gradients and significantly enhances detection robustness while maintaining detection ability on clean images.
arXiv Detail & Related papers (2022-07-13T13:59:59Z)
- Increasing the Confidence of Deep Neural Networks by Coverage Analysis [71.57324258813674]
This paper presents a lightweight monitoring architecture based on coverage paradigms to harden the model against different unsafe inputs.
Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs.
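As a rough sketch of one coverage-style monitor (not the paper's architecture): record per-neuron activation ranges on trusted data, then flag inputs whose activations fall outside those ranges. The layer and data loader are placeholders.

```python
# Hypothetical sketch of range-coverage monitoring: learn per-neuron activation
# bounds on trusted data, then flag inputs that leave them.
import torch

class RangeMonitor:
    def __init__(self, layer):
        self.lo, self.hi, self.last = None, None, None
        layer.register_forward_hook(self._hook)        # observe one chosen layer

    def _hook(self, module, inputs, output):
        self.last = output.detach().flatten(1)          # (batch, features)

    @torch.no_grad()
    def calibrate(self, model, loader):
        for x, _ in loader:                              # trusted, in-distribution data
            model(x)
            a = self.last
            self.lo = a.min(0).values if self.lo is None else torch.minimum(self.lo, a.min(0).values)
            self.hi = a.max(0).values if self.hi is None else torch.maximum(self.hi, a.max(0).values)

    @torch.no_grad()
    def out_of_coverage(self, model, x, tol=0.0):
        model(x)
        outside = (self.last < self.lo - tol) | (self.last > self.hi + tol)
        return outside.float().mean(dim=1)               # out-of-range fraction per input
```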
arXiv Detail & Related papers (2021-01-28T16:38:26Z)
- Adversarial vs behavioural-based defensive AI with joint, continual and active learning: automated evaluation of robustness to deception, poisoning and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to User and Entity Behaviour Analytics (UEBA) for cyber-security.
In this paper, we present a solution to effectively mitigate such attacks by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.