Algebraic Adversarial Attacks on Integrated Gradients
- URL: http://arxiv.org/abs/2407.16233v2
- Date: Thu, 27 Feb 2025 00:13:57 GMT
- Title: Algebraic Adversarial Attacks on Integrated Gradients
- Authors: Lachlan Simpson, Federico Costanza, Kyle Millar, Adriel Cheng, Cheng-Chew Lim, Hong Gunn Chew
- Abstract summary: Path methods are one such class of attribution methods susceptible to adversarial attacks. Algebraic adversarial examples provide a mathematically tractable approach to adversarial examples.
- Score: 5.286919475372417
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Adversarial attacks on explainability models have drastic consequences when explanations are used to understand the reasoning of neural networks in safety critical systems. Path methods are one such class of attribution methods susceptible to adversarial attacks. Adversarial learning is typically phrased as a constrained optimisation problem. In this work, we propose algebraic adversarial examples and study the conditions under which one can generate adversarial examples for integrated gradients. Algebraic adversarial examples provide a mathematically tractable approach to adversarial examples.
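The attack target above, integrated gradients, attributes a model's output to its input features by integrating gradients along the straight-line path from a baseline to the input. As a minimal sketch (not the paper's method), the path integral can be approximated with a Riemann sum; the function names and the toy model below are illustrative assumptions:

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=50):
    """Approximate IG along the straight line from baseline to x.

    IG_i(x) = (x_i - b_i) * integral_0^1 d f(b + a(x - b)) / dx_i da,
    approximated here with a midpoint Riemann sum over `steps` points.
    """
    alphas = (np.arange(steps) + 0.5) / steps          # midpoints in (0, 1)
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))  # gradient on the path
    return (x - baseline) * total / steps

# Toy model: f(x) = sum(x^2), so grad f(x) = 2x and IG_i = x_i^2 from a zero baseline.
f = lambda x: np.sum(x ** 2)
grad_f = lambda x: 2.0 * x
x = np.array([1.0, 2.0])
baseline = np.zeros(2)
attr = integrated_gradients(grad_f, x, baseline, steps=1000)

# Completeness axiom: attributions sum to f(x) - f(baseline).
print(attr, attr.sum(), f(x) - f(baseline))
```

The completeness check at the end is the property that makes path methods attractive for explanations, and it is exactly the structure an adversary must respect when perturbing inputs without changing attributions.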
Related papers
- Algebraic Adversarial Attacks on Explainability Models [5.286919475372417]
Algebraic adversarial examples provide a mathematically tractable approach to adversarial examples.
We validate our approach on two well-known and one real-world dataset.
arXiv Detail & Related papers (2025-03-16T22:55:02Z)
- Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians [60.22542847840578]
Despite advances in adversarial machine learning, inference for Gaussian models in the presence of an adversary is notably understudied.
We consider a self-interested attacker who wishes to disrupt a decisionmaker's conditional inference and subsequent actions by corrupting a set of evidentiary variables.
To avoid detection, the attacker also desires the attack to appear plausible wherein plausibility is determined by the density of the corrupted evidence.
arXiv Detail & Related papers (2024-11-21T17:46:55Z)
- Adversarial Attack Based on Prediction-Correction [8.467466998915018]
Deep neural networks (DNNs) are vulnerable to adversarial examples obtained by adding small perturbations to original examples.
In this paper, a new prediction-correction (PC) based adversarial attack is proposed.
In the proposed PC-based attack, an existing attack is first selected to produce a predicted example; the predicted example and the current example are then combined to determine the added perturbation.
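The predictor-corrector structure described above can be sketched in a few lines. This is a hypothetical illustration under assumed names, not the paper's implementation: the predictor here is a signed-gradient (FGSM-style) step, and the corrector mixes the gradients at the current and predicted points:

```python
import numpy as np

def pc_attack(loss_grad, x, eps=0.1, steps=10, beta=0.5):
    """Illustrative prediction-correction attack (assumed variant).

    Predictor: take a signed-gradient step from the current iterate.
    Corrector: blend the gradients at the current and predicted points
    to decide the perturbation actually applied.
    """
    x_adv = x.copy()
    step = eps / steps
    for _ in range(steps):
        g_cur = loss_grad(x_adv)
        x_pred = x_adv + step * np.sign(g_cur)        # predicted example
        g_pred = loss_grad(x_pred)
        g_mix = beta * g_cur + (1 - beta) * g_pred    # correction step
        x_adv = x_adv + step * np.sign(g_mix)
        x_adv = np.clip(x_adv, x - eps, x + eps)      # stay in the eps-ball
    return x_adv

# Toy loss: L(x) = sum(x), so the gradient is 1 everywhere and the
# attack simply pushes every coordinate up to the eps boundary.
loss_grad = lambda x: np.ones_like(x)
x = np.zeros(3)
x_adv = pc_attack(loss_grad, x, eps=0.1, steps=10)
```

With a constant gradient the predictor and corrector agree, so the iterate marches straight to the boundary of the eps-ball; the blending only matters when the loss surface curves between the current and predicted points.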
arXiv Detail & Related papers (2023-06-02T03:11:32Z)
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
Standard adversarial-robustness methods assume a framework that defends against samples crafted by minimally perturbing a clean sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training [72.39526433794707]
Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples.
We propose a novel adversarial training scheme that encourages the model to produce similar outputs for an adversarial example and its "inverse adversarial" counterpart.
Our training method achieves state-of-the-art robustness as well as natural accuracy.
arXiv Detail & Related papers (2022-11-01T15:24:26Z)
- Quantifying and Understanding Adversarial Examples in Discrete Input Spaces [70.18815080530801]
We formalize a notion of synonymous adversarial examples that applies in any discrete setting and describe a simple domain-agnostic algorithm to construct such examples.
Our work is a step towards a domain-agnostic treatment of discrete adversarial examples analogous to that of continuous inputs.
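A discrete, domain-agnostic attack of the kind described above can be sketched as greedy substitution over a synonym set. Everything here is an assumed illustration (the scorer, the synonym map, and the greedy strategy are not taken from the paper): swaps are restricted to listed synonyms so every rewrite stays "synonymous", and the attacker greedily lowers the model's score for the true class:

```python
def synonym_attack(score, tokens, synonyms, max_swaps=3):
    """Hypothetical greedy sketch of a synonym-substitution attack.

    `score` maps a token list to the model's confidence in the true
    class (lower is better for the attacker); `synonyms` maps a token
    to interchangeable candidates, constraining the search space.
    """
    tokens = list(tokens)
    for _ in range(max_swaps):
        best = (score(tokens), None, None)
        for i, tok in enumerate(tokens):
            for cand in synonyms.get(tok, []):
                trial = tokens[:i] + [cand] + tokens[i + 1:]
                s = score(trial)
                if s < best[0]:
                    best = (s, i, cand)
        if best[1] is None:          # no single swap lowers the score
            break
        tokens[best[1]] = best[2]
    return tokens

# Toy scorer that penalises the word "good"; the synonym set keeps the
# sentence's meaning fixed while the attack drives the score to zero.
score = lambda toks: float(toks.count("good"))
syn = {"good": ["fine", "nice"]}
out = synonym_attack(score, ["a", "good", "good", "movie"], syn)
```

The greedy loop is the simplest instantiation; the constraint that candidates come only from the synonym map is what makes the construction domain-agnostic, since any discrete space with an equivalence relation admits the same search.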
arXiv Detail & Related papers (2021-12-12T16:44:09Z)
- TREATED: Towards Universal Defense against Textual Adversarial Attacks [28.454310179377302]
We propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions.
Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines.
arXiv Detail & Related papers (2021-09-13T03:31:20Z)
- Towards Defending against Adversarial Examples via Attack-Invariant Features [147.85346057241605]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
Adversarial robustness can be improved by exploiting adversarial examples during training.
Models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples.
arXiv Detail & Related papers (2021-06-09T12:49:54Z)
- Direction-Aggregated Attack for Transferable Adversarial Examples [10.208465711975242]
A deep neural network is vulnerable to adversarial examples crafted by imposing imperceptible changes to the inputs.
Adversarial examples are most successful in white-box settings, where the model and its parameters are available.
We propose the Direction-Aggregated adversarial attacks that deliver transferable adversarial examples.
arXiv Detail & Related papers (2021-04-19T09:54:56Z)
- Learning Defense Transformers for Counterattacking Adversarial Examples [43.59730044883175]
Deep neural networks (DNNs) are vulnerable to adversarial examples with small perturbations.
Existing defense methods focus on some specific types of adversarial examples and may fail to defend well in real-world applications.
We study adversarial examples from a new perspective: whether we can defend against them by pulling them back to the original clean distribution.
arXiv Detail & Related papers (2021-03-13T02:03:53Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.