Detecting Adversarial Data using Perturbation Forgery
- URL: http://arxiv.org/abs/2405.16226v4
- Date: Wed, 05 Mar 2025 02:30:54 GMT
- Title: Detecting Adversarial Data using Perturbation Forgery
- Authors: Qian Wang, Chen Li, Yuchen Luo, Hefei Ling, Shijuan Huang, Ruoxi Jia, Ning Yu
- Abstract summary: Adversarial detection aims to identify and filter out adversarial data from the data flow based on discrepancies in distribution and noise patterns between natural and adversarial data. New attacks based on generative models with imbalanced and anisotropic noise patterns evade detection. We propose Perturbation Forgery, which includes noise distribution perturbation, sparse mask generation, and pseudo-adversarial data production, to train an adversarial detector capable of detecting any unseen gradient-based, generative-based, and physical adversarial attacks.
- Score: 28.237738842260615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a defense strategy against adversarial attacks, adversarial detection aims to identify and filter out adversarial data from the data flow based on discrepancies in distribution and noise patterns between natural and adversarial data. Although previous detection methods achieve high performance in detecting gradient-based adversarial attacks, new attacks based on generative models with imbalanced and anisotropic noise patterns evade detection. Even worse, the significant inference time overhead and limited performance against unseen attacks make existing techniques impractical for real-world use. In this paper, we explore the proximity relationship among adversarial noise distributions and demonstrate the existence of an open covering for these distributions. By training on the open covering of adversarial noise distributions, a detector with strong generalization performance against various types of unseen attacks can be developed. Based on this insight, we heuristically propose Perturbation Forgery, which includes noise distribution perturbation, sparse mask generation, and pseudo-adversarial data production, to train an adversarial detector capable of detecting any unseen gradient-based, generative-based, and physical adversarial attacks. Comprehensive experiments conducted on multiple general and facial datasets, with a wide spectrum of attacks, validate the strong generalization of our method.
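The abstract names a three-stage pipeline: noise distribution perturbation, sparse mask generation, and pseudo-adversarial data production. The Python sketch below is purely illustrative of that idea under stated assumptions: it models attack noise with per-pixel Gaussian statistics, and every function name and hyper-parameter is hypothetical; it is not the authors' implementation.

```python
import numpy as np

def forge_pseudo_adversarial(x_natural, noise_mean, noise_std,
                             perturb_scale=0.1, sparsity=0.3, eps=8 / 255):
    """Illustrative sketch of a Perturbation-Forgery-style pipeline.

    x_natural : (N, H, W, C) batch of natural images in [0, 1].
    noise_mean, noise_std : per-pixel statistics estimated from the noise of
        known (e.g. gradient-based) attacks; treated as Gaussian here.
    All names, hyper-parameters, and the Gaussian model are assumptions.
    """
    n, h, w, c = x_natural.shape

    # 1) Noise distribution perturbation: jitter the estimated statistics so the
    #    sampled noise covers a neighbourhood of the known attack distributions
    #    (loosely mimicking the "open covering" idea) rather than a single point.
    mean_p = noise_mean + perturb_scale * np.random.randn(*noise_mean.shape) * noise_std
    std_p = noise_std * np.exp(perturb_scale * np.random.randn(*noise_std.shape))

    # 2) Sparse mask generation: restrict the forged noise to a random subset of
    #    pixels, mimicking imbalanced and anisotropic noise patterns.
    mask = (np.random.rand(n, h, w, 1) < sparsity).astype(np.float32)

    # 3) Pseudo-adversarial data production: sample forged noise, mask it, clip it
    #    to an L-infinity budget, and add it to the natural images.
    noise = np.random.randn(n, h, w, c) * std_p + mean_p
    noise = np.clip(noise * mask, -eps, eps)
    return np.clip(x_natural + noise, 0.0, 1.0)


if __name__ == "__main__":
    x = np.random.rand(4, 32, 32, 3).astype(np.float32)
    mu = np.zeros((32, 32, 3), dtype=np.float32)
    sigma = np.full((32, 32, 3), 2 / 255, dtype=np.float32)
    pseudo_adv = forge_pseudo_adversarial(x, mu, sigma)
    print(pseudo_adv.shape)
```

A binary detector trained to separate the natural batch from the forged pseudo-adversarial batch would then stand in for the detector described in the abstract.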
Related papers
- A Knowledge-guided Adversarial Defense for Resisting Malicious Visual Manipulation [93.28532038721816]
Malicious applications of visual manipulation have raised serious threats to the security and reputation of users in many fields.
We propose a knowledge-guided adversarial defense (KGAD) to actively force malicious manipulation models to output semantically confusing samples.
arXiv Detail & Related papers (2025-04-11T10:18:13Z) - AdvSwap: Covert Adversarial Perturbation with High Frequency Info-swapping for Autonomous Driving Perception [14.326474757036925]
This paper introduces a novel adversarial attack method, AdvSwap, which creatively utilizes wavelet-based high-frequency information swapping.
The scheme effectively removes the original label data and incorporates the guidance image data, producing concealed and robust adversarial samples.
The generated adversarial samples are also difficult for humans and algorithms to perceive.
arXiv Detail & Related papers (2025-02-12T13:05:35Z) - BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning [26.714674251814586]
Federated learning is susceptible to poisoning attacks due to its decentralized nature.
We propose a novel distribution-aware anomaly detection mechanism, BoBa, to address this problem.
arXiv Detail & Related papers (2024-07-12T19:38:42Z) - Adversarial Purification for Data-Driven Power System Event Classifiers with Diffusion Models [0.8848340429852071]
Global deployment of phasor measurement units (PMUs) enables real-time monitoring of the power system.
Recent studies reveal that machine learning-based methods are vulnerable to adversarial attacks.
This paper proposes an effective adversarial purification method based on the diffusion model to counter adversarial attacks.
arXiv Detail & Related papers (2023-11-13T06:52:56Z) - How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple generic and generalisable framework for which key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z) - On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses to UAPs between normal and adversarial samples.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z) - Detecting Adversarial Faces Using Only Real Face Self-Perturbations [36.26178169550577]
Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples.
Existing defense techniques achieve high accuracy in detecting some specific adversarial faces (adv-faces).
New attack methods, especially GAN-based attacks with completely different noise patterns, circumvent them and reach a higher attack success rate.
arXiv Detail & Related papers (2023-04-22T09:55:48Z) - Identifying Adversarially Attackable and Robust Samples [1.4213973379473654]
Adversarial attacks insert small, imperceptible perturbations to input samples that cause large, undesired changes to the output of deep learning models.
This work introduces the notion of sample attackability, where we aim to identify samples that are most susceptible to adversarial attacks.
We propose a deep-learning-based detector to identify the adversarially attackable and robust samples in an unseen dataset for an unseen target model.
arXiv Detail & Related papers (2023-01-30T13:58:14Z) - Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity [22.28011382580367]
Adversarial attack research reveals the vulnerability of learning-based classifiers against carefully crafted perturbations.
We propose a novel algorithm that attacks semantic similarity on feature representations.
For imperceptibility, we introduce the low-frequency constraint to limit perturbations within high-frequency components.
arXiv Detail & Related papers (2022-03-10T04:46:51Z) - Adversarial Robustness of Deep Reinforcement Learning based Dynamic Recommender Systems [50.758281304737444]
We propose to explore adversarial examples and attack detection on reinforcement learning-based interactive recommendation systems.
We first craft different types of adversarial examples by adding perturbations to the input and intervening on the causal factors.
Then, we augment recommendation systems by detecting potential attacks with a deep learning-based classifier based on the crafted data.
arXiv Detail & Related papers (2021-12-02T04:12:24Z) - Modelling Adversarial Noise for Adversarial Defense [96.56200586800219]
Adversarial defenses typically focus on exploiting adversarial examples to remove adversarial noise or to train an adversarially robust target model.
Motivated by the observation that the relationship between adversarial data and natural data can help infer clean data from adversarial data and thus obtain the correct final prediction,
we study modelling adversarial noise to learn the transition relationship in the label space, using adversarial labels to improve adversarial accuracy.
arXiv Detail & Related papers (2021-09-21T01:13:26Z) - TREATED:Towards Universal Defense against Textual Adversarial Attacks [28.454310179377302]
We propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions.
Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines.
arXiv Detail & Related papers (2021-09-13T03:31:20Z) - Towards Defending against Adversarial Examples via Attack-Invariant Features [147.85346057241605]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
Adversarial robustness can be improved by exploiting adversarial examples.
Models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples.
arXiv Detail & Related papers (2021-06-09T12:49:54Z) - Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by a recently introduced non-robust feature.
In this paper, we consider non-robust features as a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster and to leverage that distribution for a likelihood-based adversarial detector (a minimal sketch of this likelihood test appears after this list).
arXiv Detail & Related papers (2020-12-07T07:21:18Z) - Detection Defense Against Adversarial Attacks with Saliency Map [7.736844355705379]
It is well established that neural networks are vulnerable to adversarial examples, which are almost imperceptible on human vision.
Existing defenses tend to harden the robustness of models against adversarial attacks.
We propose a novel method combined with additional noises and utilize the inconsistency strategy to detect adversarial examples.
arXiv Detail & Related papers (2020-09-06T13:57:17Z) - Temporal Sparse Adversarial Attack on Sequence-based Gait Recognition [56.844587127848854]
We demonstrate that the state-of-the-art gait recognition model is vulnerable to such attacks.
We employ a generative adversarial network based architecture to semantically generate adversarial high-quality gait silhouettes or video frames.
The experimental results show that if only one-fortieth of the frames are attacked, the accuracy of the target model drops dramatically.
arXiv Detail & Related papers (2020-02-22T10:08:42Z)
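As a companion to the "Learning to Separate Clusters of Adversarial Representations" entry above, the following minimal sketch illustrates a likelihood-based detector under the simplifying assumption that each group of feature representations is modelled as a single Gaussian; the class name, decision rule, and toy data are hypothetical and are not taken from that paper.

```python
import numpy as np


class GaussianLikelihoodDetector:
    """Toy likelihood-based detector: fit one Gaussian per cluster of
    representations (natural vs. adversarial) and flag inputs whose
    adversarial-cluster log-likelihood exceeds the natural-cluster one.
    The single-Gaussian assumption and all names are illustrative."""

    def fit(self, feats_natural, feats_adv):
        self.mu_nat, self.cov_nat = self._fit_gaussian(feats_natural)
        self.mu_adv, self.cov_adv = self._fit_gaussian(feats_adv)
        return self

    @staticmethod
    def _fit_gaussian(feats):
        # Mean and regularized covariance of the representation cluster.
        mu = feats.mean(axis=0)
        cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
        return mu, cov

    @staticmethod
    def _log_likelihood(feats, mu, cov):
        # Multivariate Gaussian log-density, one value per row of feats.
        d = feats - mu
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        quad = np.einsum("ni,ij,nj->n", d, inv, d)
        return -0.5 * (quad + logdet + mu.size * np.log(2 * np.pi))

    def is_adversarial(self, feats):
        ll_adv = self._log_likelihood(feats, self.mu_adv, self.cov_adv)
        ll_nat = self._log_likelihood(feats, self.mu_nat, self.cov_nat)
        return ll_adv > ll_nat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nat = rng.normal(0.0, 1.0, size=(500, 16))
    adv = rng.normal(1.5, 1.0, size=(500, 16))  # shifted "adversarial" cluster
    det = GaussianLikelihoodDetector().fit(nat, adv)
    print(det.is_adversarial(adv[:5]), det.is_adversarial(nat[:5]))
```

In practice the representations would come from an intermediate layer of the target model rather than the synthetic Gaussian samples used here.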