Adversarial Attacks Against Uncertainty Quantification
- URL: http://arxiv.org/abs/2309.10586v1
- Date: Tue, 19 Sep 2023 12:54:09 GMT
- Title: Adversarial Attacks Against Uncertainty Quantification
- Authors: Emanuele Ledda, Daniele Angioni, Giorgio Piras, Giorgio Fumera,
Battista Biggio and Fabio Roli
- Abstract summary: This work focuses on a different adversarial scenario in which the attacker is still interested in manipulating the uncertainty estimate.
In particular, the goal is to undermine the use of machine-learning models when their outputs are consumed by a downstream module or by a human operator.
- Score: 10.655660123083607
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine-learning models can be fooled by adversarial examples, i.e.,
carefully-crafted input perturbations that force models to output wrong
predictions. While uncertainty quantification has been recently proposed to
detect adversarial inputs, under the assumption that such attacks exhibit a
higher prediction uncertainty than pristine data, it has been shown that
adaptive attacks specifically aimed at also reducing the uncertainty estimate
can easily bypass this defense mechanism. In this work, we focus on a different
adversarial scenario in which the attacker is still interested in manipulating
the uncertainty estimate, but regardless of the correctness of the prediction;
in particular, the goal is to undermine the use of machine-learning models when
their outputs are consumed by a downstream module or by a human operator.
Following this direction, we: \textit{(i)} design a threat model for attacks
targeting uncertainty quantification; \textit{(ii)} devise different attack
strategies on conceptually different UQ techniques, spanning both
classification and semantic segmentation problems; \textit{(iii)} conduct a
first complete and extensive analysis comparing some of the most widely
employed UQ approaches under attack. Our extensive experimental analysis shows
that our attacks are more effective at manipulating uncertainty quantification
measures than attacks that also aim to induce misclassifications.
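To make the attack surface concrete, the following is a minimal, illustrative sketch of an uncertainty-targeting evasion: a PGD-style loop that perturbs the input to minimize the predictive entropy of an MC-dropout classifier, without constraining the predicted class. The PyTorch setup, entropy objective, and hyperparameters are assumptions chosen for exposition, not the attack formulation used in the paper.

```python
# Illustrative PGD-style sketch of an uncertainty-targeting attack (assumed setup,
# not the paper's implementation): perturb the input to *minimize* the predictive
# entropy of an MC-dropout classifier, regardless of which class is predicted.
import torch
import torch.nn.functional as F


def predictive_entropy(model, x, n_samples=8):
    """Entropy of the MC-dropout predictive distribution (dropout kept active)."""
    probs = torch.stack([F.softmax(model(x), dim=1) for _ in range(n_samples)]).mean(0)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)


def uq_attack(model, x, eps=8 / 255, alpha=2 / 255, steps=20):
    """L-infinity PGD that lowers the uncertainty estimate instead of flipping the label."""
    model.train()  # keep dropout stochastic for MC sampling (assumption: MC-dropout UQ)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = predictive_entropy(model, x_adv).sum()  # objective: low reported uncertainty
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()          # descend on the entropy
            x_adv = x + (x_adv - x).clamp(-eps, eps)     # project into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                # keep a valid image
    return x_adv.detach()
```

Flipping the sign of the gradient step turns this into the opposite attack, inflating the reported uncertainty so that a downstream module or human operator discards otherwise correct predictions.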
Related papers
- Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians [60.22542847840578]
Despite advances in adversarial machine learning, inference for Gaussian models in the presence of an adversary is notably understudied.
We consider a self-interested attacker who wishes to disrupt a decision-maker's conditional inference and subsequent actions by corrupting a set of evidentiary variables.
To avoid detection, the attacker also wants the attack to appear plausible, where plausibility is determined by the density of the corrupted evidence.
arXiv Detail & Related papers (2024-11-21T17:46:55Z)
- Machine Translation Models Stand Strong in the Face of Adversarial Attacks [2.6862667248315386]
Our research focuses on the impact of adversarial attacks on sequence-to-sequence (seq2seq) models, specifically machine translation models.
We introduce algorithms that incorporate basic text perturbations and more advanced strategies, such as gradient-based attacks.
arXiv Detail & Related papers (2023-09-10T11:22:59Z)
- The Adversarial Implications of Variable-Time Inference [47.44631666803983]
We present an approach that exploits a novel side channel in which the adversary simply measures the execution time of the algorithm used to post-process the predictions of the ML model under attack.
We investigate leakage from the non-maximum suppression (NMS) algorithm, which plays a crucial role in the operation of object detectors.
We demonstrate attacks against the YOLOv3 detector, leveraging the timing leakage to successfully evade object detection using adversarial examples, and perform dataset inference.
arXiv Detail & Related papers (2023-09-05T11:53:17Z)
- Consistent Valid Physically-Realizable Adversarial Attack against Crowd-flow Prediction Models [4.286570387250455]
Deep learning (DL) models can effectively learn city-wide crowd-flow patterns.
However, DL models are known to perform poorly under inconspicuous adversarial perturbations.
arXiv Detail & Related papers (2023-03-05T13:30:25Z)
- Selecting Models based on the Risk of Damage Caused by Adversarial Attacks [2.969705152497174]
Regulation, legal liabilities, and societal concerns challenge the adoption of AI in safety and security-critical applications.
One of the key concerns is that adversaries can cause harm by manipulating model predictions without being detected.
We propose a method to model and statistically estimate the probability of damage arising from adversarial attacks.
arXiv Detail & Related papers (2023-01-28T10:24:38Z)
- Balancing detectability and performance of attacks on the control channel of Markov Decision Processes [77.66954176188426]
We investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes (MDPs).
This research is motivated by the research community's recent interest in adversarial and poisoning attacks applied to MDPs and reinforcement learning (RL) methods.
arXiv Detail & Related papers (2021-09-15T09:13:10Z)
- Localized Uncertainty Attacks [9.36341602283533]
We present localized uncertainty attacks against deep learning models.
We create adversarial examples by perturbing only the regions of the input where a classifier is uncertain.
Unlike $\ell_p$-ball or functional attacks, which perturb inputs indiscriminately, our targeted changes can be less perceptible (a minimal masking sketch appears after this list).
arXiv Detail & Related papers (2021-06-17T03:07:22Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- Adversarial Attack and Defense of Structured Prediction Models [58.49290114755019]
In this paper, we investigate attacks and defenses for structured prediction tasks in NLP.
The structured output of structured prediction models is sensitive to small perturbations in the input.
We propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model.
arXiv Detail & Related papers (2020-10-04T15:54:03Z)
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
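For the "Localized Uncertainty Attacks" entry above, the localization step can be sketched as masking a single gradient step with a per-pixel uncertainty map. The sketch below assumes a dense (segmentation-style) PyTorch model so that MC-dropout entropy is available per pixel; the quantile threshold and masking rule are illustrative assumptions, not the referenced paper's method.

```python
# Illustrative sketch: restrict a one-step attack to the most uncertain image
# regions (assumed recipe, not the referenced paper's exact method).
import torch
import torch.nn.functional as F


def pixelwise_entropy(model, x, n_samples=8):
    """Per-pixel predictive entropy of a dense classifier with dropout kept active."""
    probs = torch.stack([F.softmax(model(x), dim=1) for _ in range(n_samples)]).mean(0)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)  # shape (B, H, W)


def localized_attack(model, x, y, eps=8 / 255, quantile=0.9):
    """FGSM-style step applied only where the uncertainty map is above a quantile."""
    model.train()  # keep dropout stochastic (assumption: MC-dropout UQ)
    with torch.no_grad():
        u = pixelwise_entropy(model, x)
        thresh = torch.quantile(u.flatten(1), quantile, dim=1)[:, None, None]
        mask = (u >= thresh).float().unsqueeze(1)  # (B, 1, H, W), 1 = uncertain region
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Perturb only the masked (uncertain) pixels; the rest of the image is untouched.
    return (x + eps * grad.sign() * mask).clamp(0.0, 1.0).detach()
```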