How Worst-Case Are Adversarial Attacks? Linking Adversarial and Perturbation Robustness
- URL: http://arxiv.org/abs/2601.14519v2
- Date: Sun, 25 Jan 2026 11:11:40 GMT
- Title: How Worst-Case Are Adversarial Attacks? Linking Adversarial and Perturbation Robustness
- Authors: Giulio Rossolini,
- Abstract summary: Adversarial attacks are widely used to identify model vulnerabilities, but their validity as proxies for robustness to random perturbations remains debated. We ask whether an adversarial example provides a representative estimate of misprediction risk under perturbations of the same magnitude. We study the limits of this connection by proposing an attack strategy designed to probe vulnerabilities in regimes that are statistically closer to uniform noise.
- Score: 4.60092781176058
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Adversarial attacks are widely used to identify model vulnerabilities; however, their validity as proxies for robustness to random perturbations remains debated. We ask whether an adversarial example provides a representative estimate of misprediction risk under stochastic perturbations of the same magnitude, or instead reflects an atypical worst-case event. To address this question, we introduce a probabilistic analysis that quantifies this risk with respect to directionally biased perturbation distributions, parameterized by a concentration factor $κ$ that interpolates between isotropic noise and adversarial directions. Building on this, we study the limits of this connection by proposing an attack strategy designed to probe vulnerabilities in regimes that are statistically closer to uniform noise. Experiments on ImageNet and CIFAR-10 systematically benchmark multiple attacks, revealing when adversarial success meaningfully reflects robustness to perturbations and when it does not, thereby informing their use in safety-oriented robustness evaluation.
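The abstract describes a perturbation distribution whose directional bias is controlled by a concentration factor $κ$ interpolating between isotropic noise and the adversarial direction. As a rough illustration of how such a risk estimate could be computed, the sketch below blends a unit adversarial direction with isotropic Gaussian directions, rescales to a fixed magnitude, and Monte Carlo-estimates the misprediction probability; the exact parameterization and the helper names (`sample_biased_perturbations`, `misprediction_risk`) are assumptions for illustration, not the paper's construction.

```python
import torch

def sample_biased_perturbations(adv_dir, eps, kappa, n_samples):
    """Draw perturbations of fixed magnitude eps whose direction interpolates
    between isotropic noise (kappa = 0) and the adversarial direction (kappa -> 1).
    Illustrative parameterization only, not necessarily the paper's family."""
    u = adv_dir.flatten()
    u = u / u.norm()                                      # unit adversarial direction
    noise = torch.randn(n_samples, u.numel())
    noise = noise / noise.norm(dim=1, keepdim=True)       # isotropic unit directions
    mix = kappa * u.unsqueeze(0) + (1.0 - kappa) * noise  # concentration-weighted blend
    mix = mix / mix.norm(dim=1, keepdim=True)             # renormalize the direction
    return eps * mix.view(n_samples, *adv_dir.shape)

@torch.no_grad()
def misprediction_risk(model, x, y, adv_dir, eps, kappa, n_samples=1000):
    """Monte Carlo estimate of P[model(x + delta) != y] under the biased distribution."""
    deltas = sample_biased_perturbations(adv_dir, eps, kappa, n_samples)
    preds = model((x.unsqueeze(0) + deltas).clamp(0.0, 1.0)).argmax(dim=1)
    return (preds != y).float().mean().item()
```

Sweeping kappa from 0 toward 1 then traces how the estimated risk grows as perturbations become concentrated around the attack direction, which is the comparison the abstract is after.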
Related papers
- Adversarial Samples Are Not Created Equal [42.879013923494455]
We propose an ensemble-based metric to measure the manipulation of non-robust features by adversarial perturbations. This new perspective also allows us to re-examine multiple phenomena, including the impact of sharpness-aware minimization on adversarial robustness.
arXiv Detail & Related papers (2026-01-02T05:30:42Z) - Quantifying the Risk of Transferred Black Box Attacks [0.0]
Neural networks have become pervasive across various applications, including security-related products. This paper investigates the complexities involved in resilience testing against transferred adversarial attacks. We propose a targeted resilience testing framework that employs surrogate models strategically selected based on Centered Kernel Alignment (CKA) similarity.
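The surrogate-selection step above relies on Centered Kernel Alignment, a standard similarity measure between representation matrices. A minimal linear-CKA sketch follows; the heuristic of picking the surrogate with the highest CKA to the target, and the function names, are illustrative assumptions consistent with the summary rather than the paper's exact procedure.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation matrices of
    shape (n_examples, n_features); higher means more similar representations."""
    X = X - X.mean(axis=0, keepdims=True)   # center features
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2
    return hsic / (np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro'))

def select_surrogate(target_acts, surrogate_acts):
    """Hypothetical selection step: return the surrogate whose activations on a
    shared probe set align best (highest CKA) with the target model's."""
    return max(surrogate_acts, key=lambda name: linear_cka(target_acts, surrogate_acts[name]))
```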
arXiv Detail & Related papers (2025-11-07T09:34:43Z) - Addressing Key Challenges of Adversarial Attacks and Defenses in the Tabular Domain: A Methodological Framework for Coherence and Consistency [25.830427564563422]
Class-Specific Anomaly Detection (CSAD) is a novel and effective anomaly detection approach. CSAD evaluates adversarial samples relative to their predicted class distribution, rather than a broad benign distribution. Our evaluation combines anomaly detection rates with SHAP-based assessments to provide a more comprehensive measure of adversarial sample quality.
arXiv Detail & Related papers (2024-12-10T09:17:09Z) - Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians [60.22542847840578]
Despite advances in adversarial machine learning, inference for Gaussian models in the presence of an adversary is notably understudied.
We consider a self-interested attacker who wishes to disrupt a decision-maker's conditional inference and subsequent actions by corrupting a set of evidentiary variables.
To avoid detection, the attacker also desires the attack to appear plausible, where plausibility is determined by the density of the corrupted evidence.
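For context, conditioning a multivariate Gaussian on evidence and scoring that evidence's marginal density are standard computations; the sketch below shows both, with the attacker's plausibility constraint expressed as a log-density that must stay high. Function names are hypothetical and the attacker's actual optimization over the corrupted evidence is not shown.

```python
import numpy as np
from scipy.stats import multivariate_normal

def condition_gaussian(mu, Sigma, idx_e, x_e):
    """Decision-maker's inference: distribution of the remaining variables given
    evidence x_e at indices idx_e, via
    mu_r + S_re S_ee^{-1} (x_e - mu_e)  and  S_rr - S_re S_ee^{-1} S_er."""
    idx_r = np.setdiff1d(np.arange(len(mu)), idx_e)
    S_ee = Sigma[np.ix_(idx_e, idx_e)]
    S_re = Sigma[np.ix_(idx_r, idx_e)]
    gain = S_re @ np.linalg.inv(S_ee)
    mu_c = mu[idx_r] + gain @ (x_e - mu[idx_e])
    Sigma_c = Sigma[np.ix_(idx_r, idx_r)] - gain @ S_re.T
    return mu_c, Sigma_c

def evidence_log_plausibility(mu, Sigma, idx_e, x_e):
    """Marginal log-density of the (possibly corrupted) evidence; keeping this
    above a threshold is how the attacker keeps the corruption looking plausible."""
    return multivariate_normal(mu[idx_e], Sigma[np.ix_(idx_e, idx_e)]).logpdf(x_e)
```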
arXiv Detail & Related papers (2024-11-21T17:46:55Z) - Detecting Adversarial Data using Perturbation Forgery [28.237738842260615]
Adversarial detection aims to identify and filter out adversarial data from the data flow based on discrepancies in distribution and noise patterns between natural and adversarial data. New attacks based on generative models with imbalanced and anisotropic noise patterns evade detection. We propose Perturbation Forgery, which includes noise distribution, sparse mask generation, and pseudo-adversarial data production, to train an adversarial detector capable of detecting any unseen gradient-based, generative-based, and physical adversarial attacks.
arXiv Detail & Related papers (2024-05-25T13:34:16Z) - From Mean to Extreme: Formal Differential Privacy Bounds on the Success of Real-World Data Reconstruction Attacks [54.25638567385662]
Differential Privacy in machine learning is often interpreted as a guarantee against membership inference. Translating DP budgets into quantitative protection against the more damaging threat of data reconstruction remains a challenging open problem. This paper bridges this critical gap by deriving the first formal privacy bounds tailored to the mechanics of demonstrated "from-scratch" attacks.
arXiv Detail & Related papers (2024-02-20T09:52:30Z) - Heteroscedastic Uncertainty Estimation Framework for Unsupervised Registration [32.081258147692395]
We propose a framework for heteroscedastic image uncertainty estimation.
It can adaptively reduce the influence of regions with high uncertainty during unsupervised registration.
Our method consistently outperforms baselines and produces sensible uncertainty estimates.
arXiv Detail & Related papers (2023-12-01T01:03:06Z) - Adversarial Attacks Against Uncertainty Quantification [10.655660123083607]
This work focuses on a different adversarial scenario in which the attacker is still interested in manipulating the uncertainty estimate.
In particular, the goal is to undermine the use of machine-learning models when their outputs are consumed by a downstream module or by a human operator.
arXiv Detail & Related papers (2023-09-19T12:54:09Z) - Selecting Models based on the Risk of Damage Caused by Adversarial
Attacks [2.969705152497174]
Regulation, legal liabilities, and societal concerns challenge the adoption of AI in safety and security-critical applications.
One of the key concerns is that adversaries can cause harm by manipulating model predictions without being detected.
We propose a method to model and statistically estimate the probability of damage arising from adversarial attacks.
arXiv Detail & Related papers (2023-01-28T10:24:38Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks
with Deep Metric Learning [80.21709045433096]
A standard method in adversarial robustness assumes a framework to defend against adversarial samples crafted by minimally perturbing natural inputs.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z) - On Trace of PGD-Like Adversarial Attacks [77.75152218980605]
Adversarial attacks pose safety and security concerns for deep learning applications.
We construct Adversarial Response Characteristics (ARC) features to reflect the model's gradient consistency.
Our method is intuitive, lightweight, non-intrusive, and data-undemanding.
arXiv Detail & Related papers (2022-05-19T14:26:50Z) - Balancing detectability and performance of attacks on the control
channel of Markov Decision Processes [77.66954176188426]
We investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes (MDPs).
This research is motivated by the recent interest of the research community in adversarial and poisoning attacks applied to MDPs and reinforcement learning (RL) methods.
arXiv Detail & Related papers (2021-09-15T09:13:10Z) - Localized Uncertainty Attacks [9.36341602283533]
We present localized uncertainty attacks against deep learning models.
We create adversarial examples by perturbing only regions in the inputs where a classifier is uncertain.
Unlike $\ell_p$-ball or functional attacks, which perturb inputs indiscriminately, our targeted changes can be less perceptible.
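A rough sketch of the idea of perturbing only uncertain regions is given below. It assumes a per-pixel uncertainty-sensitivity map derived from the gradient of the predictive entropy and a single FGSM-style step restricted to the most sensitive pixels; the paper's actual uncertainty measure and optimization may differ.

```python
import torch
import torch.nn.functional as F

def localized_attack(model, x, y, eps=8 / 255, quantile=0.95):
    """Sketch of a localized attack: perturb only the input locations that most
    influence the model's predictive uncertainty. The entropy-gradient saliency
    used here is an illustrative choice, not necessarily the paper's measure."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
    ent_grad, = torch.autograd.grad(entropy, x, retain_graph=True)
    # keep only the top (1 - quantile) fraction of uncertainty-sensitive pixels
    saliency = ent_grad.abs().amax(dim=1, keepdim=True)               # (B, 1, H, W)
    thresh = torch.quantile(saliency.flatten(1), quantile, dim=1).view(-1, 1, 1, 1)
    mask = (saliency >= thresh).float()
    # standard FGSM step, restricted to the masked high-uncertainty region
    loss = F.cross_entropy(logits, y)
    loss_grad, = torch.autograd.grad(loss, x)
    return (x + eps * mask * loss_grad.sign()).clamp(0.0, 1.0).detach()
```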
arXiv Detail & Related papers (2021-06-17T03:07:22Z) - Towards Defending against Adversarial Examples via Attack-Invariant
Features [147.85346057241605]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
Adversarial robustness can be improved by exploiting adversarial examples.
Models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples.
arXiv Detail & Related papers (2021-06-09T12:49:54Z) - Towards Robust Speech-to-Text Adversarial Attack [78.5097679815944]
This paper introduces a novel adversarial algorithm for attacking the state-of-the-art speech-to-text systems, namely DeepSpeech, Kaldi, and Lingvo.
Our approach is based on developing an extension for the conventional distortion condition of the adversarial optimization formulation.
Minimizing over this metric, which measures the discrepancies between original and adversarial samples' distributions, contributes to crafting signals very close to the subspace of legitimate speech recordings.
arXiv Detail & Related papers (2021-03-15T01:51:41Z) - Learning to Separate Clusters of Adversarial Representations for Robust
Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by a recently introduced non-robust feature.
In this paper, we consider the non-robust features as a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations as a separate cluster, and to leverage this distribution for a likelihood-based adversarial detector.
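As a simplified illustration of a likelihood-based detector over representation clusters, the sketch below fits Gaussians to clean and adversarial representations and flags inputs whose representation is more likely under the adversarial cluster; this is an assumed stand-in, not the paper's detector.

```python
import numpy as np

class LikelihoodDetector:
    """Illustrative detector: fit Gaussians to clean and adversarial representation
    clusters and score new representations by comparing their log-likelihoods."""

    def fit(self, clean_reps, adv_reps):
        self.mu_c, self.cov_c = clean_reps.mean(0), np.cov(clean_reps, rowvar=False)
        self.mu_a, self.cov_a = adv_reps.mean(0), np.cov(adv_reps, rowvar=False)
        return self

    def _log_gauss(self, z, mu, cov):
        d = z - mu
        cov = cov + 1e-6 * np.eye(len(mu))        # regularize for numerical stability
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(mu) * np.log(2 * np.pi))

    def is_adversarial(self, z):
        return self._log_gauss(z, self.mu_a, self.cov_a) > self._log_gauss(z, self.mu_c, self.cov_c)
```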
arXiv Detail & Related papers (2020-12-07T07:21:18Z) - Temporal Sparse Adversarial Attack on Sequence-based Gait Recognition [56.844587127848854]
We demonstrate that the state-of-the-art gait recognition model is vulnerable to such attacks.
We employ a generative adversarial network based architecture to semantically generate adversarial high-quality gait silhouettes or video frames.
The experimental results show that if only one-fortieth of the frames are attacked, the accuracy of the target model drops dramatically.
arXiv Detail & Related papers (2020-02-22T10:08:42Z)