Adversarial Feature Desensitization
- URL: http://arxiv.org/abs/2006.04621v3
- Date: Tue, 4 Jan 2022 21:32:37 GMT
- Title: Adversarial Feature Desensitization
- Authors: Pouya Bashivan, Reza Bayat, Adam Ibrahim, Kartik Ahuja, Mojtaba
Faramarzi, Touraj Laleh, Blake Aaron Richards, Irina Rish
- Abstract summary: We propose a novel approach to adversarial robustness, which builds upon the insights from the domain adaptation field.
Our method, called Adversarial Feature Desensitization (AFD), aims at learning features that are invariant towards adversarial perturbations of the inputs.
- Score: 12.401175943131268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks are known to be vulnerable to adversarial attacks -- slight
but carefully constructed perturbations of the inputs which can drastically
impair the network's performance. Many defense methods have been proposed for
improving robustness of deep networks by training them on adversarially
perturbed inputs. However, these models often remain vulnerable to new types of
attacks not seen during training, and even to slightly stronger versions of
previously seen attacks. In this work, we propose a novel approach to
adversarial robustness, which builds upon the insights from the domain
adaptation field. Our method, called Adversarial Feature Desensitization (AFD),
aims at learning features that are invariant towards adversarial perturbations
of the inputs. This is achieved through a game where we learn features that are
both predictive and robust (insensitive to adversarial attacks), i.e. cannot be
used to discriminate between natural and adversarial data. Empirical results on
several benchmarks demonstrate the effectiveness of the proposed approach
against a wide range of attack types and attack strengths. Our code is
available at https://github.com/BashivanLab/afd.
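The game described above is close in spirit to domain-adversarial training: a feature extractor and task classifier are trained to stay predictive on natural and adversarial inputs, while a discriminator tries to tell natural from adversarial features and the extractor is trained to fool it. Below is a minimal PyTorch-style sketch of one such training step; the module names, the `pgd_attack` helper, and the equal loss weighting are illustrative assumptions rather than the implementation in the linked repository.

```python
# Minimal sketch of one AFD-style training step (assumed structure, not the
# official BashivanLab/afd code).
import torch
import torch.nn.functional as F

def afd_step(feat_net, task_head, disc_head, opt_task, opt_disc, x, y, pgd_attack):
    """One round of the game: the discriminator learns to tell natural from
    adversarial features, while the feature extractor stays predictive and
    makes adversarial features indistinguishable from natural ones."""
    model = lambda inp: task_head(feat_net(inp))
    x_adv = pgd_attack(model, x, y)                       # assumed attack helper

    # 1) Discriminator update: natural features -> label 0, adversarial -> 1.
    opt_disc.zero_grad()
    z_nat = feat_net(x).detach()
    z_adv = feat_net(x_adv).detach()
    d_logits = torch.cat([disc_head(z_nat), disc_head(z_adv)])
    d_labels = torch.cat([torch.zeros(len(z_nat)), torch.ones(len(z_adv))]).long().to(x.device)
    F.cross_entropy(d_logits, d_labels).backward()
    opt_disc.step()

    # 2) Feature/task update: classify both natural and adversarial inputs
    #    correctly, while fooling the discriminator into labeling adversarial
    #    features as natural (the "desensitization" term).
    opt_task.zero_grad()
    z_adv = feat_net(x_adv)
    task_loss = F.cross_entropy(task_head(feat_net(x)), y) \
              + F.cross_entropy(task_head(z_adv), y)
    fool_labels = torch.zeros(len(z_adv), dtype=torch.long, device=x.device)
    fool_loss = F.cross_entropy(disc_head(z_adv), fool_labels)
    (task_loss + fool_loss).backward()
    opt_task.step()
```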
Related papers
- Improved Adversarial Training Through Adaptive Instance-wise Loss
Smoothing [5.1024659285813785]
Adversarial training has been the most successful defense against adversarial attacks.
We propose a new adversarial training method: Instance-adaptive Smoothness Enhanced Adversarial Training.
Our method achieves state-of-the-art robustness against $\ell_\infty$-norm constrained attacks.
arXiv Detail & Related papers (2023-03-24T15:41:40Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks
with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness defends against samples crafted by minimally perturbing a clean input.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves defense against both invariance and sensitivity attacks.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks [76.35478518372692]
We introduce epsilon-illusory, a novel form of adversarial attack on sequential decision-makers.
Compared to existing attacks, we empirically find epsilon-illusory to be significantly harder to detect with automated methods.
Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses.
arXiv Detail & Related papers (2022-07-20T19:49:09Z)
- Towards Defending against Adversarial Examples via Attack-Invariant
Features [147.85346057241605]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
Adversarial robustness can be improved by exploiting adversarial examples.
Models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples.
arXiv Detail & Related papers (2021-06-09T12:49:54Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features across arbitrary attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Demotivate adversarial defense in remote sensing [0.0]
We study adversarial retraining and adversarial regularization as adversarial defenses for this purpose.
We show through several experiments on public remote sensing datasets that adversarial robustness seems uncorrelated with geographic and over-fitting robustness.
arXiv Detail & Related papers (2021-05-28T15:04:37Z)
- Combating Adversaries with Anti-Adversaries [118.70141983415445]
In particular, our layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness while coming at no cost on clean accuracy.
arXiv Detail & Related papers (2021-03-26T09:36:59Z)
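The anti-adversary mechanism summarized above is concrete enough to sketch: the input is nudged in the direction that increases the model's confidence in its own prediction, i.e. opposite to a typical adversarial perturbation. This is a minimal sketch assuming a PyTorch classifier; the step count, step size, and the use of the model's predicted label as a pseudo-target are illustrative choices, not the paper's exact configuration.

```python
# Minimal sketch of an "anti-adversary" style preprocessing step (assumptions noted above).
import torch
import torch.nn.functional as F

def anti_adversary(model, x, steps=2, step_size=0.05):
    """Perturb x in the direction that *decreases* the loss for the model's
    predicted class, i.e. opposite to a typical adversarial perturbation."""
    with torch.no_grad():
        y_pred = model(x).argmax(dim=1)          # pseudo-label from the model itself
    x_anti = x.clone()
    for _ in range(steps):
        x_anti = x_anti.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_anti), y_pred)
        grad, = torch.autograd.grad(loss, x_anti)
        # Gradient *descent* on the loss: move away from the decision boundary.
        x_anti = (x_anti - step_size * grad.sign()).clamp(0, 1)  # assumes inputs in [0, 1]
    return model(x_anti)                          # classify the "anti-adversarial" input
```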
- A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
- Class-Aware Domain Adaptation for Improving Adversarial Robustness [27.24720754239852]
Adversarial training has been proposed to train networks by injecting adversarial examples into the training data.
We propose a novel Class-Aware Domain Adaptation (CADA) method for adversarial defense without directly applying adversarial training.
arXiv Detail & Related papers (2020-05-10T03:45:19Z)
- RAID: Randomized Adversarial-Input Detection for Neural Networks [7.37305608518763]
We propose a novel technique for adversarial-image detection, RAID, that trains a secondary classifier to identify differences in neuron activation values between benign and adversarial inputs.
RAID is more reliable and more effective than the state of the art when evaluated against six popular attacks.
arXiv Detail & Related papers (2020-02-07T13:27:29Z)
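The RAID entry above describes training a secondary classifier on neuron activation values of benign versus adversarial inputs. Below is a minimal sketch of that kind of detector, assuming PyTorch for the model and scikit-learn for the secondary classifier; the chosen layer, the logistic-regression detector, and the helper names are illustrative assumptions rather than the paper's exact setup.

```python
# Minimal sketch of a RAID-style secondary detector (assumptions noted above).
import torch
from sklearn.linear_model import LogisticRegression

def collect_activations(model, layer, inputs):
    """Record the activations of `layer` for a batch of inputs."""
    acts = []
    hook = layer.register_forward_hook(lambda m, i, o: acts.append(o.flatten(1).detach()))
    with torch.no_grad():
        model(inputs)
    hook.remove()
    return torch.cat(acts)

def train_detector(model, layer, x_benign, x_adversarial):
    """Fit a binary classifier separating benign from adversarial activations."""
    a_ben = collect_activations(model, layer, x_benign)
    a_adv = collect_activations(model, layer, x_adversarial)
    features = torch.cat([a_ben, a_adv]).cpu().numpy()
    labels = [0] * len(a_ben) + [1] * len(a_adv)     # 0 = benign, 1 = adversarial
    return LogisticRegression(max_iter=1000).fit(features, labels)
```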