Meta Adversarial Perturbations
- URL: http://arxiv.org/abs/2111.10291v1
- Date: Fri, 19 Nov 2021 16:01:45 GMT
- Title: Meta Adversarial Perturbations
- Authors: Chia-Hung Yuan, Pin-Yu Chen, Chia-Mu Yu
- Abstract summary: We show the existence of a meta adversarial perturbation (MAP),
an initialization from which a single one-step gradient ascent update causes natural images to be misclassified with high probability.
We show that these perturbations are not only image-agnostic, but also model-agnostic, as a single perturbation generalizes well across unseen data points and different neural network architectures.
- Score: 66.43754467275967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A plethora of attack methods have been proposed to generate adversarial
examples, among which iterative methods have proven able to find strong
attacks. However, computing an adversarial perturbation for a new data point
requires solving a time-consuming optimization problem from scratch, and
generating a stronger attack normally requires updating the data point over
more iterations. In this paper, we show the
existence of a meta adversarial perturbation (MAP), a better initialization
that causes natural images to be misclassified with high probability after
being updated through only a one-step gradient ascent update, and propose an
algorithm for computing such perturbations. We conduct extensive experiments,
and the empirical results demonstrate that state-of-the-art deep neural
networks are vulnerable to meta perturbations. We further show that these
perturbations are not only image-agnostic, but also model-agnostic, as a single
perturbation generalizes well across unseen data points and different neural
network architectures.
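To make the procedure concrete, below is a minimal PyTorch-style sketch, not the authors' reference implementation: the inner step is the one-step gradient ascent update described in the abstract, and the outer step moves the shared perturbation toward the adapted one, a Reptile-style approximation of the meta update rather than an exact meta-gradient. The function names, the 32x32 input shape, and the step sizes `alpha`, `beta`, and budget `eps` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def train_map(model, loader, eps=8 / 255, alpha=2 / 255, beta=0.01,
              epochs=5, device="cuda"):
    """Sketch of learning a meta adversarial perturbation (MAP).

    `delta` is a single image-agnostic perturbation shared across all inputs.
    For every batch we take the one-step gradient-ascent update from `delta`
    (inner step) and then move `delta` toward that adapted perturbation
    (a Reptile-style stand-in for the meta step), projecting back onto the
    l_inf ball of radius `eps`.
    """
    model.eval()
    delta = torch.zeros(1, 3, 32, 32, device=device)  # broadcast over the batch

    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)

            # Inner step: one-step gradient ascent starting from the meta init.
            d = delta.clone().requires_grad_(True)
            loss = F.cross_entropy(model(torch.clamp(x + d, 0, 1)), y)
            grad = torch.autograd.grad(loss, d)[0]
            adapted = torch.clamp(delta + alpha * grad.sign(), -eps, eps)

            # Meta step: pull the shared initialization toward the adapted
            # perturbation and project onto the budget.
            delta = torch.clamp(delta + beta * (adapted - delta), -eps, eps)
    return delta


def one_step_attack(model, x, y, delta, eps=8 / 255, alpha=2 / 255):
    """One-step gradient-ascent attack initialized at the learned MAP."""
    d = delta.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(torch.clamp(x + d, 0, 1)), y)
    grad = torch.autograd.grad(loss, d)[0]
    d_new = torch.clamp(delta + alpha * grad.sign(), -eps, eps)
    return torch.clamp(x + d_new, 0, 1)
```

At attack time, `one_step_attack` corresponds to the setting the abstract describes: a single gradient ascent step launched from the learned meta initialization rather than from a zero or random perturbation.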
Related papers
- Wasserstein distributional robustness of neural networks [9.79503506460041]
Deep neural networks are known to be vulnerable to adversarial attacks (AA).
For an image recognition task, this means that a small perturbation of the original image can cause it to be misclassified.
We re-cast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions.
arXiv Detail & Related papers (2023-06-16T13:41:24Z) - Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z) - Efficient and Robust Classification for Sparse Attacks [34.48667992227529]
We consider perturbations bounded by the $\ell_0$-norm, which have been shown to be effective attacks in the domains of image recognition, natural language processing, and malware detection.
We propose a novel defense method that consists of "truncation" and "adversarial training".
Motivated by the insights we obtain, we extend these components to neural network classifiers.
arXiv Detail & Related papers (2022-01-23T21:18:17Z) - Learning to Learn Transferable Attack [77.67399621530052]
Transfer adversarial attack is a non-trivial black-box adversarial attack that aims to craft adversarial perturbations on the surrogate model and then apply such perturbations to the victim model.
We propose a Learning to Learn Transferable Attack (LLTA) method, which makes the adversarial perturbations more generalized via learning from both data and model augmentation.
Empirical results on a widely used dataset demonstrate the effectiveness of our attack method, with a 12.85% higher transfer attack success rate than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-10T07:24:21Z) - Sparse and Imperceptible Adversarial Attack via a Homotopy Algorithm [93.80082636284922]
Sparse adversarial attacks can fool deep neural networks (DNNs) by perturbing only a few pixels.
Recent efforts combine this sparsity constraint with an additional $\ell_\infty$ bound on the perturbation magnitudes.
We propose a homotopy algorithm that jointly tackles the sparsity constraint and the perturbation bound in one unified framework.
arXiv Detail & Related papers (2021-06-10T20:11:36Z) - AdvHaze: Adversarial Haze Attack [19.744435173861785]
We introduce a novel adversarial attack method based on haze, which is a common phenomenon in real-world scenery.
Our method can synthesize potentially adversarial haze into an image, based on the atmospheric scattering model, with high realism.
We demonstrate that the proposed method achieves a high success rate, and holds better transferability across different classification models than the baselines.
arXiv Detail & Related papers (2021-04-28T09:52:25Z) - Targeted Attack against Deep Neural Networks via Flipping Limited Weight
Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
We formulate this attack as a binary integer programming (BIP) problem and, by utilizing the latest techniques in integer programming, equivalently reformulate it as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z) - Attribute-Guided Adversarial Training for Robustness to Natural
Perturbations [64.35805267250682]
We propose an adversarial training approach that learns to generate new samples so as to maximize the classifier's exposure to the attribute space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z) - Universal Adversarial Perturbations: A Survey [0.0]
Deep neural networks are susceptible to adversarial perturbations.
These perturbations can cause the network's prediction to change without making perceptible changes to the input image.
We provide a detailed discussion of the various data-driven and data-independent methods for generating universal perturbations; a minimal data-driven sketch appears after this list.
arXiv Detail & Related papers (2020-05-16T20:18:26Z) - Architectural Resilience to Foreground-and-Background Adversarial Noise [0.0]
Adversarial attacks in the form of imperceptible perturbations of normal images have been extensively studied.
We propose distinct model-agnostic benchmark perturbations of images to investigate the resilience and robustness of different network architectures.
arXiv Detail & Related papers (2020-03-23T01:38:20Z)
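For comparison with the universal-perturbation line of work surveyed above, the following is a minimal data-driven sketch in the spirit of the classic UAP construction; a simple signed-gradient step stands in for the DeepFool-based inner step of the original algorithm, and the function name, input shape, and hyperparameters are assumptions rather than anything prescribed by the papers listed here.

```python
import torch
import torch.nn.functional as F


def universal_perturbation(model, loader, eps=10 / 255, alpha=1 / 255,
                           passes=3, device="cuda"):
    """Sketch of a data-driven universal adversarial perturbation (UAP).

    Sweep the dataset several times; whenever images are still classified
    correctly under the current perturbation `v`, nudge `v` with a signed
    gradient step on those images and project back onto the l_inf ball.
    """
    model.eval()
    v = torch.zeros(1, 3, 32, 32, device=device)

    for _ in range(passes):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                still_correct = model(torch.clamp(x + v, 0, 1)).argmax(1) == y
            if not still_correct.any():
                continue  # the whole batch is already fooled
            d = v.clone().requires_grad_(True)
            loss = F.cross_entropy(
                model(torch.clamp(x[still_correct] + d, 0, 1)), y[still_correct]
            )
            grad = torch.autograd.grad(loss, d)[0]
            v = torch.clamp(v + alpha * grad.sign(), -eps, eps)  # projection
    return v
```

Unlike the meta perturbation sketched earlier, this greedy construction only accumulates updates on images the current perturbation fails to fool; it does not learn an initialization meant for further per-image refinement.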
This list is automatically generated from the titles and abstracts of the papers on this site.