Which Models have Perceptually-Aligned Gradients? An Explanation via
Off-Manifold Robustness
- URL: http://arxiv.org/abs/2305.19101v2
- Date: Mon, 11 Mar 2024 12:48:37 GMT
- Title: Which Models have Perceptually-Aligned Gradients? An Explanation via
Off-Manifold Robustness
- Authors: Suraj Srinivas, Sebastian Bordt, Hima Lakkaraju
- Abstract summary: Perceptually-aligned gradients (PAGs) cause robust computer vision models to have rudimentary generative capabilities.
We provide a first explanation of PAGs via off-manifold robustness, which states that models must be more robust off the data manifold than on it.
We identify three regimes of robustness that affect both perceptual alignment and model accuracy: weak robustness, Bayes-aligned robustness, and excessive robustness.
- Score: 9.867914513513453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the remarkable properties of robust computer vision models is that
their input-gradients are often aligned with human perception, referred to in
the literature as perceptually-aligned gradients (PAGs). Despite only being
trained for classification, PAGs cause robust models to have rudimentary
generative capabilities, including image generation, denoising, and
in-painting. However, the underlying mechanisms behind these phenomena remain
unknown. In this work, we provide a first explanation of PAGs via
\emph{off-manifold robustness}, which states that models must be more robust
off the data manifold than they are on-manifold. We first demonstrate
theoretically that off-manifold robustness leads input gradients to lie
approximately on the data manifold, explaining their perceptual alignment. We
then show that Bayes optimal models satisfy off-manifold robustness, and
confirm the same empirically for robust models trained via gradient norm
regularization, randomized smoothing, and adversarial training with projected
gradient descent. Quantifying the perceptual alignment of model gradients via
their similarity with the gradients of generative models, we show that
off-manifold robustness correlates well with perceptual alignment. Finally,
based on the levels of on- and off-manifold robustness, we identify three
different regimes of robustness that affect both perceptual alignment and model
accuracy: weak robustness, Bayes-aligned robustness, and excessive robustness.
Code is available at \url{https://github.com/tml-tuebingen/pags}.
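Below is a minimal sketch of the two quantities the abstract describes: perceptual alignment, measured as the similarity between a classifier's input gradient and the score of a generative model, and the on-/off-manifold decomposition of that gradient. This is not the authors' implementation (see the linked repository); the names `classifier`, `score_model`, and `tangent_basis` are illustrative assumptions.
```python
# Hedged sketch, not the code from https://github.com/tml-tuebingen/pags.
# Assumes classifier(x) returns logits, score_model(x) approximates the data
# score grad_x log p(x), and tangent_basis is an orthonormal (d, k) basis of
# the data manifold's tangent space at x.
import torch
import torch.nn.functional as F

def input_gradient(classifier, x, y):
    """Gradient of the true-class logit with respect to the input."""
    x = x.clone().requires_grad_(True)
    logits = classifier(x)
    true_logit = logits.gather(1, y.unsqueeze(1)).sum()
    (grad,) = torch.autograd.grad(true_logit, x)
    return grad

def perceptual_alignment(classifier, score_model, x, y):
    """Cosine similarity between input gradients and generative-model scores."""
    g = input_gradient(classifier, x, y).flatten(1)
    s = score_model(x).flatten(1)
    return F.cosine_similarity(g, s, dim=1)

def on_off_manifold_norms(grad, tangent_basis):
    """Split a single input gradient into on-manifold (tangent) and
    off-manifold (normal) components."""
    g = grad.flatten()
    on = tangent_basis @ (tangent_basis.T @ g)  # projection onto the tangent space
    off = g - on
    return on.norm(), off.norm()
```
Under the paper's framing, a high cosine similarity and a small off-manifold gradient component would both indicate perceptually aligned gradients.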
Related papers
- Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI [59.96044730204345]
We introduce Derivative-Free Diffusion Manifold-Constrained Gradients (FreeMCG).
FreeMCG serves as an improved basis for explainability of a given neural network.
We show that our method yields state-of-the-art results while preserving the essential properties expected of XAI tools.
arXiv Detail & Related papers (2024-11-22T11:15:14Z)
- Characterizing Model Robustness via Natural Input Gradients [37.97521090347974]
We show the surprising effectiveness of regularizing the gradient with respect to model inputs on natural examples only, rather than relying on adversarial training (a minimal sketch of such a penalty appears after this list).
On ImageNet-1k, Gradient Norm training achieves more than 90% of the performance of state-of-the-art PGD-3 adversarial training (52% vs. 56%) at only 60% of its cost, without complex adversarial optimization.
arXiv Detail & Related papers (2024-09-30T09:41:34Z)
- Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures [93.17009514112702]
Pruning, setting a significant subset of the parameters of a neural network to zero, is one of the most popular methods of model compression.
Despite existing evidence that pruning can induce bias, the relationship between neural network pruning and induced bias is not well understood.
arXiv Detail & Related papers (2023-04-25T07:42:06Z)
- ChiroDiff: Modelling chirographic data with Diffusion Models [132.5223191478268]
We introduce a powerful model class, namely Denoising Diffusion Probabilistic Models (DDPMs), for chirographic data.
Our model, named "ChiroDiff", is non-autoregressive, learns to capture holistic concepts, and therefore remains resilient to higher temporal sampling rates.
arXiv Detail & Related papers (2023-04-07T15:17:48Z)
- Do Perceptually Aligned Gradients Imply Adversarial Robustness? [17.929524924008962]
Adversarially robust classifiers possess a trait that non-robust models do not: Perceptually Aligned Gradients (PAG).
Several works have identified PAG as a byproduct of robust training, but none have considered it as a standalone phenomenon nor studied its own implications.
We show that better gradient alignment leads to increased robustness and harness this observation to boost the robustness of existing adversarial training techniques.
arXiv Detail & Related papers (2022-07-22T23:48:26Z)
- Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
By definition, the proposed SCORE objective facilitates the reconciliation between robustness and accuracy while still handling worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z)
- Certifying Model Accuracy under Distribution Shifts [151.67113334248464]
We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution.
We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
arXiv Detail & Related papers (2022-01-28T22:03:50Z)
- Clustering Effect of (Linearized) Adversarial Robust Models [60.25668525218051]
We propose a novel understanding of adversarial robustness and apply it to more tasks, including domain adaptation and robustness boosting.
Experimental evaluations demonstrate the rationality and superiority of our proposed clustering strategy.
arXiv Detail & Related papers (2021-11-25T05:51:03Z)
- Adversarial robustness for latent models: Revisiting the robust-standard accuracies tradeoff [12.386462516398472]
Adversarial training is often observed to reduce the standard test accuracy.
In this paper, we argue that this tradeoff is mitigated when the data enjoys a low-dimensional structure.
We show that as the ratio of the manifold dimension to the ambient dimension decreases, one can obtain models that are nearly optimal with respect to both the standard and the robust accuracy measures.
arXiv Detail & Related papers (2021-10-22T17:58:27Z)
- On the Benefits of Models with Perceptually-Aligned Gradients [8.427953227125148]
We show that interpretable and perceptually aligned gradients are present even in models that do not show high robustness to adversarial attacks.
We leverage models with interpretable, perceptually-aligned features and show that adversarial training with a low max-perturbation bound can improve model performance on zero-shot and weakly supervised localization tasks.
arXiv Detail & Related papers (2020-05-04T14:05:38Z)
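As referenced in the entry for "Characterizing Model Robustness via Natural Input Gradients" above, here is a minimal sketch of a gradient-norm penalty computed on natural examples only; the names `model` and `lam` are illustrative assumptions, and this is not that paper's implementation.
```python
# Hedged sketch of gradient-norm regularization on natural (unperturbed) inputs,
# in contrast to the inner attack loop used by PGD adversarial training.
import torch
import torch.nn.functional as F

def gradient_norm_loss(model, x, y, lam=0.1):
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    # create_graph=True keeps the graph so the penalty itself can be backpropagated.
    (grad,) = torch.autograd.grad(ce, x, create_graph=True)
    penalty = grad.flatten(1).norm(dim=1).pow(2).mean()
    return ce + lam * penalty
```
The returned loss can be minimized with any standard optimizer; only a second backward pass through the input gradient is added per step, which is the source of the cost advantage over multi-step adversarial training.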