Towards Interpretable Adversarial Examples via Sparse Adversarial Attack
- URL: http://arxiv.org/abs/2506.17250v1
- Date: Sun, 08 Jun 2025 09:13:30 GMT
- Title: Towards Interpretable Adversarial Examples via Sparse Adversarial Attack
- Authors: Fudong Lin, Jiadong Lou, Hao Wang, Brian Jalaian, Xu Yuan,
- Abstract summary: Sparse attacks optimize the magnitude of adversarial perturbations that fool deep neural networks (DNNs) while perturbing only a few pixels. Existing solutions fail to yield interpretable adversarial examples due to their poor sparsity. In this paper, we aim to develop a sparse attack for understanding the vulnerability of CNNs by minimizing the magnitude of initial perturbations.
- Score: 22.588476144401977
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Sparse attacks optimize the magnitude of adversarial perturbations that fool deep neural networks (DNNs) while perturbing only a few pixels (i.e., under the l0 constraint), making them suitable for interpreting the vulnerability of DNNs. However, existing solutions fail to yield interpretable adversarial examples due to their poor sparsity. Worse still, they often suffer from heavy computational overhead, poor transferability, and weak attack strength. In this paper, we aim to develop a sparse attack for understanding the vulnerability of CNNs by minimizing the magnitude of initial perturbations under the l0 constraint, overcoming these drawbacks while achieving a fast, transferable, and strong attack on DNNs. In particular, a novel and theoretically sound parameterization technique is introduced to approximate the NP-hard l0 optimization problem, making direct optimization of sparse perturbations computationally feasible. In addition, a novel loss function is designed to augment initial perturbations by simultaneously maximizing their adversarial effect and minimizing the number of perturbed pixels. Extensive experiments demonstrate that our approach, with theoretical performance guarantees, outperforms state-of-the-art sparse attacks in terms of computational overhead, transferability, and attack strength, and is expected to serve as a benchmark for evaluating the robustness of DNNs. Furthermore, theoretical and empirical results validate that our approach yields sparser adversarial examples, enabling us to identify two categories of noise, i.e., "obscuring noise" and "leading noise", which help interpret how adversarial perturbations mislead classifiers into incorrect predictions. Our code is available at https://github.com/fudong03/SparseAttack.
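To make the recipe above concrete, the following is a minimal PyTorch sketch of a mask-based sparse attack: a relaxed 0/1 mask selects which pixels are perturbed, and a joint loss combines an adversarial term with a sparsity penalty standing in for the l0 norm. This is an illustrative sketch only; the sigmoid-mask relaxation, the function name sparse_attack, and the hyperparameters lam and eps are assumptions of this example rather than the paper's parameterization technique (the authors' actual implementation is at the GitHub link above).

```python
# Illustrative mask-based sparse (l0-style) attack. This is a generic
# sigmoid-mask relaxation with a joint adversarial + sparsity loss,
# written for clarity; it is NOT the parameterization proposed in the
# paper (see https://github.com/fudong03/SparseAttack for that).
import torch
import torch.nn.functional as F


def sparse_attack(model, x, y, steps=200, lr=0.1, lam=1e-2, eps=16 / 255):
    """Craft a sparse perturbation for image batch x with true labels y.

    model : classifier returning logits
    lam   : weight of the sparsity penalty (assumed hyperparameter)
    eps   : per-pixel perturbation bound (assumed hyperparameter)
    """
    # Perturbation values and mask logits are optimized jointly.
    delta = torch.zeros_like(x, requires_grad=True)
    mask_logits = torch.full_like(x[:, :1], -3.0, requires_grad=True)

    opt = torch.optim.Adam([delta, mask_logits], lr=lr)
    for _ in range(steps):
        mask = torch.sigmoid(mask_logits)          # soft surrogate for a 0/1 pixel mask
        x_adv = (x + mask * eps * torch.tanh(delta)).clamp(0, 1)
        logits = model(x_adv)
        adv_loss = -F.cross_entropy(logits, y)     # push the model away from the true labels
        sparsity = mask.mean()                     # relaxed stand-in for the l0 norm
        loss = adv_loss + lam * sparsity
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Binarize the mask so that only the selected pixels carry any perturbation.
    hard_mask = (torch.sigmoid(mask_logits) > 0.5).float()
    return (x + hard_mask * eps * torch.tanh(delta)).clamp(0, 1).detach()
```

Binarizing the soft mask at the end is what makes the returned example genuinely sparse, and inspecting which pixels survive the threshold is one simple way to probe interpretability.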
Related papers
- Evaluating Model Robustness Using Adaptive Sparse L0 Regularization [5.772716337390152]
Adversarial examples challenge existing defenses by altering a minimal subset of features.
Current L0 norm attack methodologies face a trade-off between accuracy and efficiency.
This paper proposes a novel, scalable, and effective approach to generate adversarial examples based on the L0 norm.
arXiv Detail & Related papers (2024-08-28T11:02:23Z) - STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario [50.37501379058119]
We propose the Spatial Transform Black-box Attack (STBA) to craft formidable adversarial examples in the query-limited scenario.
We show that STBA could effectively improve the imperceptibility of the adversarial examples and remarkably boost the attack success rate under query-limited settings.
arXiv Detail & Related papers (2024-03-30T13:28:53Z) - Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization [31.516568778193157]
Adversarial training (AT) is often adopted to improve the robustness of deep neural networks (DNNs).
In this work, we propose an approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR).
Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions from the neural network are easy to interpret.
arXiv Detail & Related papers (2022-07-09T01:06:41Z) - Block-Sparse Adversarial Attack to Fool Transformer-Based Text Classifiers [49.50163349643615]
In this paper, we propose a gradient-based adversarial attack against transformer-based text classifiers.
Experimental results demonstrate that, while our adversarial attack maintains the semantics of the sentence, it can reduce the accuracy of GPT-2 to less than 5%.
arXiv Detail & Related papers (2022-03-11T14:37:41Z) - Meta Adversarial Perturbations [66.43754467275967]
We show the existence of a meta adversarial perturbation (MAP).
A MAP causes natural images to be misclassified with high probability after the perturbation is updated through only a single gradient-ascent step.
We show that these perturbations are not only image-agnostic, but also model-agnostic, as a single perturbation generalizes well across unseen data points and different neural network architectures.
arXiv Detail & Related papers (2021-11-19T16:01:45Z) - Sparse and Imperceptible Adversarial Attack via a Homotopy Algorithm [93.80082636284922]
Sparse adversarial attacks can fool deep neural networks (DNNs) by perturbing only a few pixels.
Recent efforts combine this l_0 constraint with an additional l_infty bound on the perturbation magnitudes.
We propose a homotopy algorithm to jointly tackle the sparsity constraint and the perturbation bound within a unified framework.
arXiv Detail & Related papers (2021-06-10T20:11:36Z) - Generating Adversarial Examples with Graph Neural Networks [26.74003742013481]
We propose a novel attack based on a graph neural network (GNN) that takes advantage of the strengths of both approaches.
We show that our method beats state-of-the-art adversarial attacks, including the PGD attack, MI-FGSM, and the Carlini and Wagner attack.
We provide a new challenging dataset specifically designed to allow for a more illustrative comparison of adversarial attacks.
arXiv Detail & Related papers (2021-05-30T22:46:41Z) - Attribute-Guided Adversarial Training for Robustness to Natural Perturbations [64.35805267250682]
We propose an adversarial training approach that learns to generate new samples so as to maximize the classifier's exposure to the attribute space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z) - Network Moments: Extensions and Sparse-Smooth Attacks [59.24080620535988]
We derive exact analytic expressions for the first and second moments of a small piecewise linear (PL) network (Affine, ReLU, Affine) subject to Gaussian input.
We show that the new variance expression can be efficiently approximated leading to much tighter variance estimates.
arXiv Detail & Related papers (2020-06-21T11:36:41Z) - On the Matrix-Free Generation of Adversarial Perturbations for Black-Box Attacks [1.199955563466263]
In this paper, we propose a practical method for generating such adversarial perturbations for black-box attacks.
The attacker generates the perturbation without invoking the inner functions or accessing the inner states of a deep neural network (a generic sketch of this black-box setting follows after this entry).
arXiv Detail & Related papers (2020-02-18T00:50:26Z)
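For context, here is a minimal score-based sketch of the black-box setting the last entry refers to: the attacker only queries the model's outputs and never touches gradients or internal activations. The naive random-search strategy, the query_fn interface, and the eps budget and query limit are assumptions made purely for illustration; this is not the matrix-free generation method proposed in that paper.

```python
# Generic score-based black-box attack sketch: the model is used only
# through query_fn, which returns output logits for a batch of inputs.
# Random search within an l_infty ball, for illustration only.
import torch


def random_search_attack(query_fn, x, y, eps=8 / 255, queries=1000):
    """x: one image of shape (1, C, H, W); y: integer index of the true class."""
    best_adv = x.clone()
    best_score = query_fn(x)[0, y].item()        # model's confidence in the true class
    for _ in range(queries):
        # Sample a candidate on the boundary of the l_infty ball around x.
        candidate = (x + eps * torch.sign(torch.randn_like(x))).clamp(0, 1)
        out = query_fn(candidate)                # the only access to the model: its outputs
        if out.argmax(dim=1).item() != y:        # prediction flipped: attack succeeded
            return candidate
        score = out[0, y].item()
        if score < best_score:                   # otherwise keep the most promising candidate
            best_adv, best_score = candidate, score
    return best_adv
```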