Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability
- URL: http://arxiv.org/abs/2403.03967v2
- Date: Sat, 23 Mar 2024 11:22:00 GMT
- Title: Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability
- Authors: Rajdeep Haldar, Yue Xing, Qifan Song
- Abstract summary: We argue that the existence of off-manifold attacks is a natural consequence of the dimension gap between the intrinsic and ambient dimensions of the data.
For 2-layer ReLU networks, we prove that even though the dimension gap does not affect generalization performance on samples drawn from the observed data space, it makes the clean-trained model more vulnerable to perturbations in the off-manifold direction.
- Score: 13.196433643727792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The existence of adversarial attacks on machine learning models that are imperceptible to a human is still quite a mystery from a theoretical perspective. In this work, we introduce two notions of adversarial attacks: natural or on-manifold attacks, which are perceptible by a human/oracle, and unnatural or off-manifold attacks, which are not. We argue that the existence of off-manifold attacks is a natural consequence of the dimension gap between the intrinsic and ambient dimensions of the data. For 2-layer ReLU networks, we prove that even though the dimension gap does not affect generalization performance on samples drawn from the observed data space, it makes the clean-trained model more vulnerable to adversarial perturbations in the off-manifold direction of the data space. Our main results provide an explicit relationship between the $\ell_2$ and $\ell_{\infty}$ attack strengths of the on/off-manifold attacks and the dimension gap.
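To make the role of the dimension gap concrete, the following is a minimal numerical sketch, not the paper's construction: it assumes the data lie on a linear d-dimensional subspace of a D-dimensional ambient space and splits an arbitrary perturbation into its on-manifold and off-manifold components.

```python
import numpy as np

rng = np.random.default_rng(0)

D, d = 50, 5                                           # ambient and intrinsic dimensions (gap = D - d)
basis, _ = np.linalg.qr(rng.standard_normal((D, d)))   # orthonormal basis of the data subspace

# A sample living on the d-dimensional (linear) manifold embedded in R^D
x = basis @ rng.standard_normal(d)

# An arbitrary unit perturbation in ambient space (e.g., an adversarial direction)
delta = rng.standard_normal(D)
delta /= np.linalg.norm(delta)

# Decompose into on-manifold and off-manifold components
on_manifold = basis @ (basis.T @ delta)   # projection onto the data subspace
off_manifold = delta - on_manifold        # component in the D - d orthogonal directions

print(f"||on-manifold||  = {np.linalg.norm(on_manifold):.3f}")
print(f"||off-manifold|| = {np.linalg.norm(off_manifold):.3f}")
# With D >> d, a random unit perturbation places most of its norm off-manifold,
# i.e., in the directions whose vulnerability the paper analyzes.
```

This is an illustration under a linear-manifold assumption; the paper's analysis concerns 2-layer ReLU networks trained on such data, not this decomposition itself.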
Related papers
- The Uncanny Valley: Exploring Adversarial Robustness from a Flatness Perspective [34.55229189445268]
Flatness of the loss surface not only correlates positively with generalization but is also related to adversarial robustness.
In this paper, we empirically analyze the relation between adversarial examples and relative flatness with respect to the parameters of one layer.
arXiv Detail & Related papers (2024-05-27T08:10:46Z)
- Investigating Human-Identifiable Features Hidden in Adversarial Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness defends against adversarial samples crafted by minimally perturbing a clean sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariance-based perturbations in our framework improves defense against both invariance and sensitivity attacks.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- Attack to Fool and Explain Deep Networks [59.97135687719244]
We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations.
Our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models.
arXiv Detail & Related papers (2021-06-20T03:07:36Z) - Adversarial Robustness through the Lens of Causality [105.51753064807014]
The adversarial vulnerability of deep neural networks has attracted significant attention in machine learning.
We propose to incorporate causality into mitigating adversarial vulnerability.
Our method can be seen as the first attempt to leverage causality for mitigating adversarial vulnerability.
arXiv Detail & Related papers (2021-06-11T06:55:02Z) - Hard-label Manifolds: Unexpected Advantages of Query Efficiency for
Finding On-manifold Adversarial Examples [67.23103682776049]
Recent zeroth-order hard-label attacks on image classification models have shown comparable performance to their first-order, gradient-level alternatives.
It was recently shown in the gradient-level setting that regular adversarial examples leave the data manifold, while their on-manifold counterparts are in fact generalization errors.
We propose an information-theoretic argument based on a noisy manifold distance oracle, which leaks manifold information through the adversary's gradient estimate (a schematic example of such label-only gradient estimation is sketched after this list).
arXiv Detail & Related papers (2021-03-04T20:53:06Z)
- Spatiotemporal Attacks for Embodied Agents [119.43832001301041]
We take the first step to study adversarial attacks for embodied agents.
In particular, we generate adversarial examples, which exploit the interaction history in both the temporal and spatial dimensions.
Our perturbations achieve strong attack performance and generalize well.
arXiv Detail & Related papers (2020-05-19T01:38:47Z)
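The Hard-label Manifolds entry above refers to attacks that must estimate gradient information from label-only queries. Below is a minimal, self-contained sketch of that general setting, in the spirit of OPT-style hard-label attacks rather than that paper's actual procedure; the toy linear classifier and the names `hard_label`, `boundary_distance`, and `estimate_gradient` are illustrative assumptions, not an existing API.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])   # toy linear decision boundary (illustrative only)

def hard_label(x):
    """Stand-in for a black-box classifier that returns only the predicted class."""
    return int(x @ w > 0)

def boundary_distance(x, direction, hi=10.0, tol=1e-4):
    """Distance along `direction` at which the hard label first flips (binary search)."""
    y0, lo = hard_label(x), 0.0
    if hard_label(x + hi * direction) == y0:
        return np.inf                      # no label flip within the search radius
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if hard_label(x + mid * direction) == y0:
            lo = mid
        else:
            hi = mid
    return hi

def estimate_gradient(x, direction, n_samples=50, mu=1e-2):
    """Zeroth-order (random finite-difference) estimate of the gradient of the
    boundary distance with respect to the search direction."""
    g0 = boundary_distance(x, direction)
    grad = np.zeros_like(direction)
    for _ in range(n_samples):
        u = rng.standard_normal(direction.shape)
        u /= np.linalg.norm(u)
        d_pert = direction + mu * u
        d_pert /= np.linalg.norm(d_pert)
        grad += (boundary_distance(x, d_pert) - g0) / mu * u
    return grad / n_samples

x = np.array([0.2, 0.1, -0.3])      # a point the toy classifier labels 0
theta = w / np.linalg.norm(w)       # search direction pointing toward the boundary
print(estimate_gradient(x, theta))  # label-only queries still yield a usable gradient signal
```

The point of the sketch is only that an adversary with hard-label access can recover directional information about the decision boundary through repeated queries, which is the kind of leaked geometric information the entry above reasons about.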
This list is automatically generated from the titles and abstracts of the papers on this site.