Do Perceptually Aligned Gradients Imply Adversarial Robustness?
- URL: http://arxiv.org/abs/2207.11378v3
- Date: Wed, 9 Aug 2023 17:06:52 GMT
- Title: Do Perceptually Aligned Gradients Imply Adversarial Robustness?
- Authors: Roy Ganz, Bahjat Kawar and Michael Elad
- Abstract summary: Adversarially robust classifiers possess a trait that non-robust models do not -- Perceptually Aligned Gradients (PAG).
Several works have identified PAG as a byproduct of robust training, but none have considered it as a standalone phenomenon nor studied its own implications.
We show that better gradient alignment leads to increased robustness and harness this observation to boost the robustness of existing adversarial training techniques.
- Score: 17.929524924008962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarially robust classifiers possess a trait that non-robust models do
not -- Perceptually Aligned Gradients (PAG). Their gradients with respect to
the input align well with human perception. Several works have identified PAG
as a byproduct of robust training, but none have considered it as a standalone
phenomenon nor studied its own implications. In this work, we focus on this
trait and test whether Perceptually Aligned Gradients imply Robustness.
To this end, we develop a novel objective to directly promote PAG in training
classifiers and examine whether models with such gradients are more robust to
adversarial attacks. Extensive experiments on multiple datasets and
architectures validate that models with aligned gradients exhibit significant
robustness, exposing the surprising bidirectional connection between PAG and
robustness. Lastly, we show that better gradient alignment leads to increased
robustness and harness this observation to boost the robustness of existing
adversarial training techniques.
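The following is a minimal sketch of the kind of objective described above: a standard cross-entropy loss plus a cosine-similarity penalty that pushes the classifier's input gradient toward a reference direction standing in for a perceptually aligned gradient. The `target_grad` tensor, the weight `lam`, and the exact formulation are illustrative assumptions, not the paper's actual objective.

```python
import torch
import torch.nn.functional as F

def pag_regularized_loss(model, x, y, target_grad, lam=1.0):
    """Cross-entropy plus a penalty that encourages the input gradient of the
    loss to align (in cosine similarity) with a reference direction.

    `target_grad` is a hypothetical stand-in for a perceptually aligned
    reference gradient (e.g., from a robust teacher); illustrative only.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # Input gradient of the classification loss, kept in the graph so the
    # alignment penalty can itself be backpropagated (double backprop).
    grad = torch.autograd.grad(ce, x, create_graph=True)[0]

    cos = F.cosine_similarity(grad.flatten(1), target_grad.flatten(1), dim=1)
    align_penalty = (1.0 - cos).mean()  # zero when perfectly aligned

    return ce + lam * align_penalty
```

Because the penalty depends on an input gradient, optimizing it requires second-order backpropagation, hence `create_graph=True` in the sketch.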
Related papers
- A Robust Adversarial Ensemble with Causal (Feature Interaction) Interpretations for Image Classification [9.945272787814941]
We present a deep ensemble model that combines discriminative features with generative models to achieve both high accuracy and adversarial robustness.
Our approach integrates a bottom-level pre-trained discriminative network for feature extraction with a top-level generative classification network that models adversarial input distributions.
arXiv Detail & Related papers (2024-12-28T05:06:20Z)
- Ensemble Adversarial Defense via Integration of Multiple Dispersed Low Curvature Models [7.8245455684263545]
In this work, we aim to enhance ensemble diversity by reducing attack transferability.
We identify second-order gradients, which depict the loss curvature, as a key factor in adversarial robustness.
We introduce a novel regularizer to train multiple, more diverse low-curvature network models (a rough sketch of such a curvature penalty follows this entry).
arXiv Detail & Related papers (2024-03-25T03:44:36Z)
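A rough sketch of one way to penalize loss curvature with respect to the input, using a finite-difference proxy (the change of the input gradient along a random direction). The function name and the step size `h` are assumptions, and the single-model formulation omits the ensemble-diversity aspect of the cited work.

```python
import torch
import torch.nn.functional as F

def curvature_penalty(model, x, y, h=1e-2):
    """Finite-difference proxy for input-space loss curvature:
    || grad L(x + h*d) - grad L(x) || along a random unit direction d.
    A sketch of a low-curvature regularizer; not the cited paper's exact form.
    """
    d = torch.randn_like(x)
    d = d / d.flatten(1).norm(dim=1).view(-1, *([1] * (x.dim() - 1)))

    def input_grad(inp):
        inp = inp.clone().requires_grad_(True)
        loss = F.cross_entropy(model(inp), y)
        return torch.autograd.grad(loss, inp, create_graph=True)[0]

    g0 = input_grad(x)
    g1 = input_grad(x + h * d)
    return (g1 - g0).flatten(1).norm(dim=1).mean()
```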
- Extreme Miscalibration and the Illusion of Adversarial Robustness [66.29268991629085]
Adversarial Training is often used to increase model robustness.
We show that this observed gain in robustness is an illusion of robustness (IOR).
We urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations (a minimal example follows this entry).
arXiv Detail & Related papers (2024-02-27T13:49:12Z)
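A minimal sketch of test-time temperature scaling, assuming logits from an already-trained classifier: divide the logits by a temperature before the softmax and evaluate attacks at several temperatures rather than only the raw logits, so extreme miscalibration cannot masquerade as robustness. The function name and the temperature sweep are illustrative choices.

```python
import torch
import torch.nn.functional as F

def calibrated_probs(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    # Temperature scaling: divide logits by T before normalizing.
    # T > 1 softens overconfident predictions; T is usually fitted on held-out data.
    return F.softmax(logits / temperature, dim=-1)

# During robustness evaluation, sweep a few temperatures instead of trusting T = 1 only.
temperatures = (0.5, 1.0, 2.0, 4.0)
```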
- Enhancing Robust Representation in Adversarial Training: Alignment and Exclusion Criteria [61.048842737581865]
We show that Adversarial Training (AT) fails to learn robust features, resulting in poor adversarial robustness.
We propose a generic AT framework that obtains robust representations through asymmetric negative contrast and reverse attention.
Empirical evaluations on three benchmark datasets show our methods greatly advance the robustness of AT and achieve state-of-the-art performance.
arXiv Detail & Related papers (2023-10-05T07:29:29Z)
- Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness [9.867914513513453]
Perceptually-aligned gradients (PAGs) cause robust computer vision models to have rudimentary generative capabilities.
We provide a first explanation of PAGs via off-manifold robustness, which states that models must be more robust off the data manifold than they are on it.
We identify three different regimes of robustness that affect both perceptual alignment and model accuracy: weak robustness, Bayes-aligned robustness, and excessive robustness.
arXiv Detail & Related papers (2023-05-30T15:06:02Z)
- Semantic Image Attack for Visual Model Diagnosis [80.36063332820568]
In practice, metric analysis on a specific train and test dataset does not guarantee reliable or fair ML models.
This paper proposes Semantic Image Attack (SIA), a method based on adversarial attacks that provides semantic adversarial images.
arXiv Detail & Related papers (2023-03-23T03:13:04Z)
- Explicit Tradeoffs between Adversarial and Natural Distributional Robustness [48.44639585732391]
In practice, models need to enjoy both types of robustness to ensure reliability.
In this work, we show that in fact, explicit tradeoffs exist between adversarial and natural distributional robustness.
arXiv Detail & Related papers (2022-09-15T19:58:01Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features for arbitrary attacking strengths.
Our method is trained to automatically align features across arbitrary attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
- Quantifying the Preferential Direction of the Model Gradient in Adversarial Training With Projected Gradient Descent [4.8035104863603575]
After adversarial training, gradients of models with respect to their inputs have a preferential direction.
We propose a novel definition of this direction: the direction toward the closest point of the support of the nearest incorrect class in decision space.
We show that our metric attains higher alignment values than a competing metric formulation, and that enforcing this alignment increases the robustness of models (a simplified version of such an alignment score is sketched after this entry).
arXiv Detail & Related papers (2020-09-10T07:48:42Z)
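A simplified proxy, not the paper's exact metric: measure the cosine similarity between the gradient of the runner-up class logit with respect to the input and the direction from the input toward a nearby sample of that incorrect class. The helper name and the `x_nearest_other` argument are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def gradient_alignment(model, x, x_nearest_other):
    """Cosine similarity between the input gradient of the runner-up class
    logit and the direction from x toward `x_nearest_other`, a hypothetical
    nearby sample of that incorrect class. A rough stand-in for the
    preferential-direction metric described above."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    top2 = logits.topk(2, dim=1).indices              # predicted class and runner-up
    runner_up = logits.gather(1, top2[:, 1:2]).sum()  # sum over batch for a scalar
    grad = torch.autograd.grad(runner_up, x)[0]
    direction = (x_nearest_other - x).detach()
    return F.cosine_similarity(grad.flatten(1), direction.flatten(1), dim=1)
```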
- On the Benefits of Models with Perceptually-Aligned Gradients [8.427953227125148]
We show that interpretable and perceptually aligned gradients are present even in models that do not show high robustness to adversarial attacks.
We leverage models with interpretable, perceptually-aligned features and show that adversarial training with a low max-perturbation bound can improve performance on zero-shot and weakly supervised localization tasks (a generic small-budget adversarial training step is sketched after this entry).
arXiv Detail & Related papers (2020-05-04T14:05:38Z)
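A generic sketch of adversarial training with a deliberately small L-infinity budget, in the spirit of the low max-perturbation bound mentioned above. The step sizes, iteration count, and omitted pixel-range clamping are simplifying assumptions, not the cited paper's recipe.

```python
import torch
import torch.nn.functional as F

def pgd_adv_train_step(model, optimizer, x, y, eps=2/255, alpha=1/255, steps=3):
    """One adversarial-training step with a small L-inf budget (PGD inner loop,
    then a standard update on the adversarial example). Pixel-range clamping
    of x + delta is omitted for brevity."""
    # Inner maximization: build the adversarial perturbation.
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)

    # Outer minimization: update the model on the adversarial example.
    optimizer.zero_grad()
    F.cross_entropy(model((x + delta).detach()), y).backward()
    optimizer.step()
```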