Which Models have Perceptually-Aligned Gradients? An Explanation via
Off-Manifold Robustness
- URL: http://arxiv.org/abs/2305.19101v2
- Date: Mon, 11 Mar 2024 12:48:37 GMT
- Title: Which Models have Perceptually-Aligned Gradients? An Explanation via
Off-Manifold Robustness
- Authors: Suraj Srinivas, Sebastian Bordt, Hima Lakkaraju
- Abstract summary: Perceptually-aligned gradients (PAGs) cause robust computer vision models to have rudimentary generative capabilities.
We provide a first explanation of PAGs via off-manifold robustness, which states that models must be more robust off the data manifold than on it.
We identify three regimes of robustness that affect both perceptual alignment and model accuracy: weak robustness, Bayes-aligned robustness, and excessive robustness.
- Score: 9.867914513513453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the remarkable properties of robust computer vision models is that
their input-gradients are often aligned with human perception, referred to in
the literature as perceptually-aligned gradients (PAGs). Despite only being
trained for classification, PAGs cause robust models to have rudimentary
generative capabilities, including image generation, denoising, and
in-painting. However, the underlying mechanisms behind these phenomena remain
unknown. In this work, we provide a first explanation of PAGs via
\emph{off-manifold robustness}, which states that models must be more robust
off the data manifold than they are on-manifold. We first demonstrate
theoretically that off-manifold robustness leads input gradients to lie
approximately on the data manifold, explaining their perceptual alignment. We
then show that Bayes optimal models satisfy off-manifold robustness, and
confirm the same empirically for robust models trained via gradient norm
regularization, randomized smoothing, and adversarial training with projected
gradient descent. Quantifying the perceptual alignment of model gradients via
their similarity with the gradients of generative models, we show that
off-manifold robustness correlates well with perceptual alignment. Finally,
based on the levels of on- and off-manifold robustness, we identify three
different regimes of robustness that affect both perceptual alignment and model
accuracy: weak robustness, Bayes-aligned robustness, and excessive robustness.
Code is available at \url{https://github.com/tml-tuebingen/pags}.
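Below is a minimal sketch of the two quantities the abstract describes: perceptual alignment, measured as the similarity between a classifier's input gradient and the score of a generative model, and the on-/off-manifold decomposition of that gradient. This is not the authors' implementation (see the linked repository); the names `classifier`, `score_model`, and `tangent_basis` are illustrative assumptions.
```python
# Hedged sketch, not the code from https://github.com/tml-tuebingen/pags.
# Assumes classifier(x) returns logits, score_model(x) approximates the data
# score grad_x log p(x), and tangent_basis is an orthonormal (d, k) basis of
# the data manifold's tangent space at x.
import torch
import torch.nn.functional as F

def input_gradient(classifier, x, y):
    """Gradient of the true-class logit with respect to the input."""
    x = x.clone().requires_grad_(True)
    logits = classifier(x)
    true_logit = logits.gather(1, y.unsqueeze(1)).sum()
    (grad,) = torch.autograd.grad(true_logit, x)
    return grad

def perceptual_alignment(classifier, score_model, x, y):
    """Cosine similarity between input gradients and generative-model scores."""
    g = input_gradient(classifier, x, y).flatten(1)
    s = score_model(x).flatten(1)
    return F.cosine_similarity(g, s, dim=1)

def on_off_manifold_norms(grad, tangent_basis):
    """Split a single input gradient into on-manifold (tangent) and
    off-manifold (normal) components."""
    g = grad.flatten()
    on = tangent_basis @ (tangent_basis.T @ g)  # projection onto the tangent space
    off = g - on
    return on.norm(), off.norm()
```
Under the paper's framing, a high cosine similarity and a small off-manifold gradient component would both indicate perceptually aligned gradients.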
Related papers
- Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI [59.96044730204345]
We introduce Derivative-Free Diffusion Manifold-Constrained Gradients (FreeMCG).
FreeMCG serves as an improved basis for explainability of a given neural network.
We show that our method yields state-of-the-art results while preserving the essential properties expected of XAI tools.
arXiv Detail & Related papers (2024-11-22T11:15:14Z)
- Characterizing Model Robustness via Natural Input Gradients [37.97521090347974]
We show the surprising effectiveness of regularizing the gradient with respect to model inputs on natural examples only, rather than relying on adversarial training (a minimal sketch of such a penalty appears after this list).
On ImageNet-1k, Gradient Norm training achieves more than 90% of the performance of state-of-the-art PGD-3 adversarial training (52% vs. 56%) at only 60% of its cost, without complex adversarial optimization.
arXiv Detail & Related papers (2024-09-30T09:41:34Z)
- Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures [93.17009514112702]
Pruning, setting a significant subset of the parameters of a neural network to zero, is one of the most popular methods of model compression.
Despite existing evidence that pruning can induce bias, the relationship between neural network pruning and induced bias is not well understood.
arXiv Detail & Related papers (2023-04-25T07:42:06Z)
- ChiroDiff: Modelling chirographic data with Diffusion Models [132.5223191478268]
We introduce a powerful model class, namely Denoising Diffusion Probabilistic Models (DDPMs), for chirographic data.
Our model, named "ChiroDiff", is non-autoregressive, learns to capture holistic concepts, and therefore remains resilient to higher temporal sampling rates.
arXiv Detail & Related papers (2023-04-07T15:17:48Z)
- Do Perceptually Aligned Gradients Imply Adversarial Robustness? [17.929524924008962]
Adversarially robust classifiers possess a trait that non-robust models do not: Perceptually Aligned Gradients (PAG).
Several works have identified PAG as a byproduct of robust training, but none have considered it as a standalone phenomenon nor studied its own implications.
We show that better gradient alignment leads to increased robustness and harness this observation to boost the robustness of existing adversarial training techniques.
arXiv Detail & Related papers (2022-07-22T23:48:26Z)
- Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
By definition, the proposed SCORE objective facilitates the reconciliation between robustness and accuracy while still handling worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z)
- Certifying Model Accuracy under Distribution Shifts [151.67113334248464]
We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution.
We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
arXiv Detail & Related papers (2022-01-28T22:03:50Z)
- Clustering Effect of (Linearized) Adversarial Robust Models [60.25668525218051]
We propose a novel understanding of adversarial robustness and apply it to more tasks, including domain adaptation and robustness boosting.
Experimental evaluations demonstrate the rationality and superiority of our proposed clustering strategy.
arXiv Detail & Related papers (2021-11-25T05:51:03Z)
- Adversarial robustness for latent models: Revisiting the robust-standard accuracies tradeoff [12.386462516398472]
Adversarial training is often observed to reduce the standard test accuracy.
In this paper, we argue that this tradeoff is mitigated when the data enjoys a low-dimensional structure.
We show that as the ratio of the manifold dimension to the ambient dimension decreases, one can obtain models that are nearly optimal with respect to both the standard and the robust accuracy measures.
arXiv Detail & Related papers (2021-10-22T17:58:27Z)
- On the Benefits of Models with Perceptually-Aligned Gradients [8.427953227125148]
We show that interpretable and perceptually aligned gradients are present even in models that do not show high robustness to adversarial attacks.
We leverage models with interpretable, perceptually-aligned features and show that adversarial training with a low max-perturbation bound can improve model performance on zero-shot and weakly supervised localization tasks.
arXiv Detail & Related papers (2020-05-04T14:05:38Z)
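As referenced in the entry for "Characterizing Model Robustness via Natural Input Gradients" above, here is a minimal sketch of a gradient-norm penalty computed on natural examples only; the names `model` and `lam` are illustrative assumptions, and this is not that paper's implementation.
```python
# Hedged sketch of gradient-norm regularization on natural (unperturbed) inputs,
# in contrast to the inner attack loop used by PGD adversarial training.
import torch
import torch.nn.functional as F

def gradient_norm_loss(model, x, y, lam=0.1):
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    # create_graph=True keeps the graph so the penalty itself can be backpropagated.
    (grad,) = torch.autograd.grad(ce, x, create_graph=True)
    penalty = grad.flatten(1).norm(dim=1).pow(2).mean()
    return ce + lam * penalty
```
The returned loss can be minimized with any standard optimizer; only a second backward pass through the input gradient is added per step, which is the source of the cost advantage over multi-step adversarial training.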