DANAA: Towards transferable attacks with double adversarial neuron attribution
- URL: http://arxiv.org/abs/2310.10427v2
- Date: Sun, 22 Oct 2023 16:06:00 GMT
- Title: DANAA: Towards transferable attacks with double adversarial neuron attribution
- Authors: Zhibo Jin, Zhiyu Zhu, Xinyi Wang, Jiayu Zhang, Jun Shen, Huaming Chen
- Abstract summary: We propose a double adversarial neuron attribution attack method, termed 'DANAA', to obtain more accurate feature importance estimation.
The goal is to measure the weight of individual neurons and retain the features that are more important for transferability.
- Score: 37.33924432015966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep neural networks achieve excellent results in many fields, they
remain susceptible to adversarial samples that can lead to erroneous judgments.
Feature-level attacks are one effective attack type; they target the learnt features
in the hidden layers to improve transferability across different models. Yet the
transferability is observed to depend heavily on the accuracy of the neuron
importance estimation. In this paper, a double adversarial neuron attribution attack
method, termed 'DANAA', is proposed to obtain more accurate feature importance
estimation. In our method, the model outputs are attributed to the middle layer based
on an adversarial non-linear path. The goal is to measure the weight of individual
neurons and retain the features that are more important for transferability. We have
conducted extensive experiments on benchmark datasets to demonstrate the
state-of-the-art performance of our method. Our code is available at:
https://github.com/Davidjinzb/DANAA
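As a rough illustration of the attribution step described in the abstract, the sketch below accumulates middle-layer gradients along a noisy path from a baseline to the input and uses the result to weight individual neurons. It is not the authors' implementation (see the repository above); the surrogate model, the chosen layer, the zero baseline, and the random noise standing in for the adversarial path are all assumptions.

```python
import torch
import torchvision.models as models

# Placeholder surrogate model and "middle layer"; both are assumptions for this sketch.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
features = {}
model.layer3.register_forward_hook(lambda m, i, o: features.update(mid=o))

def neuron_attribution(x, label, steps=30, noise_scale=0.25):
    """Accumulate middle-layer gradients along a noisy path from a zero baseline
    to x, then weight the clean middle-layer features by that accumulation.
    The random noise here merely stands in for the adversarial updates that
    bend the path in the paper."""
    baseline = torch.zeros_like(x)
    accumulated = 0.0
    for t in torch.linspace(0.0, 1.0, steps):
        point = baseline + t * (x - baseline) + noise_scale * torch.randn_like(x)
        logits = model(point)
        grad = torch.autograd.grad(logits[0, label], features["mid"])[0]
        accumulated = accumulated + grad.detach() / steps
    with torch.no_grad():
        model(x)                                  # refresh clean middle-layer features
    return accumulated * features["mid"]          # per-neuron importance map

# Example: importance map for a random 224x224 image, assuming class index 0.
x = torch.rand(1, 3, 224, 224)
importance = neuron_attribution(x, label=0)
# A feature-level attack would then perturb x to suppress the positively
# weighted neurons, e.g. by minimizing the importance-weighted feature sum.
```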
Related papers
- Boosting Adversarial Transferability via Fusing Logits of Top-1 Decomposed Feature [36.78292952798531]
We propose a Singular Value Decomposition (SVD)-based feature-level attack method.
Our approach is inspired by the discovery that the singular vectors associated with the larger singular values of the middle-layer features exhibit superior generalization and attention properties.
arXiv Detail & Related papers (2023-05-02T12:27:44Z)
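A minimal sketch of the SVD idea summarized in the entry above, assuming PyTorch feature maps; the rank-1 target and the cosine-based loss are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def top1_component(feat: torch.Tensor) -> torch.Tensor:
    """Rank-1 reconstruction of a middle-layer feature map of shape (C, H, W),
    keeping only the component tied to the largest singular value."""
    mat = feat.flatten(1)                                 # (C, H*W)
    u, s, vh = torch.linalg.svd(mat, full_matrices=False)
    return (s[0] * torch.outer(u[:, 0], vh[0])).reshape_as(feat)

def svd_feature_loss(adv_feat: torch.Tensor, clean_feat: torch.Tensor) -> torch.Tensor:
    # Push the adversarial features away from the clean image's dominant component.
    target = top1_component(clean_feat).detach()
    return -F.cosine_similarity(adv_feat.flatten(), target.flatten(), dim=0)
```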
- Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness assumes a framework that defends against samples crafted by minimally perturbing a clean sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves defense against both invariance and sensitivity attacks.
arXiv Detail & Related papers (2022-11-04T13:54:02Z)
- On Trace of PGD-Like Adversarial Attacks [77.75152218980605]
Adversarial attacks pose safety and security concerns for deep learning applications.
We construct Adversarial Response Characteristics (ARC) features to reflect the model's gradient consistency.
Our method is intuitive, lightweight, non-intrusive, and data-undemanding.
arXiv Detail & Related papers (2022-05-19T14:26:50Z)
- Improving Adversarial Transferability via Neuron Attribution-Based Attacks [35.02147088207232]
We propose the Neuron Attribution-based Attack (NAA), which conducts feature-level attacks with more accurate neuron importance estimations.
We derive an approximation scheme of neuron attribution to greatly reduce the computation overhead.
Experiments confirm the superiority of our approach over the state-of-the-art benchmarks.
arXiv Detail & Related papers (2022-03-31T13:47:30Z)
- Unreasonable Effectiveness of Last Hidden Layer Activations [0.5156484100374058]
We show that using certain widely known activation functions in the model's output layer with high temperature values has the effect of zeroing out the gradients for both targeted and untargeted attacks.
We experimentally verified the efficacy of our approach on the MNIST (Digit) and CIFAR10 datasets.
arXiv Detail & Related papers (2022-02-15T12:02:59Z)
- Few-shot Backdoor Defense Using Shapley Estimation [123.56934991060788]
We develop a new approach called Shapley Pruning (ShapPruning) to mitigate backdoor attacks on deep neural networks.
ShapPruning identifies the few infected neurons (under 1% of all neurons) while preserving the model's structure and accuracy.
Experiments demonstrate the effectiveness and robustness of our method against various attacks and tasks.
arXiv Detail & Related papers (2021-12-30T02:27:03Z)
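As a loose, hypothetical illustration of Shapley-style neuron scoring related to the entry above, one could use generic Monte-Carlo permutation sampling over channels; this is not the paper's ShapPruning algorithm, and the loss-based value function and 4D feature shape are assumptions.

```python
import torch
import torch.nn.functional as F

def shapley_channel_scores(model, layer, x, y, n_perms=20):
    """Monte-Carlo estimate of each channel's Shapley value in `layer`,
    measured as its average marginal reduction of the classification loss."""
    probe = {}
    h = layer.register_forward_hook(lambda m, i, o: probe.update(out=o))
    with torch.no_grad():
        model(x)
    n_ch = probe["out"].shape[1]
    h.remove()

    def loss_with_mask(mask):
        # A forward hook that returns a value replaces the layer's output
        # (assumes a 4D feature map of shape (N, C, H, W)).
        hook = layer.register_forward_hook(lambda m, i, o: o * mask.view(1, -1, 1, 1))
        with torch.no_grad():
            loss = F.cross_entropy(model(x), y).item()
        hook.remove()
        return loss

    scores = torch.zeros(n_ch)
    for _ in range(n_perms):
        mask = torch.zeros(n_ch)
        prev = loss_with_mask(mask)           # all channels disabled
        for ch in torch.randperm(n_ch):
            mask[ch] = 1.0
            cur = loss_with_mask(mask)
            scores[ch] += prev - cur          # marginal loss reduction of channel ch
            prev = cur
    return scores / n_perms
```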
- Meta Adversarial Perturbations [66.43754467275967]
We show the existence of a meta adversarial perturbation (MAP).
A MAP causes natural images to be misclassified with high probability after it is updated through only a single gradient-ascent step.
We show that these perturbations are not only image-agnostic, but also model-agnostic, as a single perturbation generalizes well across unseen data points and different neural network architectures.
arXiv Detail & Related papers (2021-11-19T16:01:45Z)
- And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z)
- One Neuron to Fool Them All [12.107259467873094]
We evaluate the sensitivity of individual neurons in terms of how robust the model's output is to direct perturbations of that neuron's output.
Attacks using a loss function that targets just a single sensitive neuron find adversarial examples nearly as effectively as ones that target the full model.
arXiv Detail & Related papers (2020-03-20T16:49:38Z)
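To make the single-neuron idea in the entry above concrete, here is a minimal, assumed sketch in which the attack loss is the activation of one chosen channel and the input is perturbed with sign-gradient steps; the surrogate model, layer, channel index, and step sizes are placeholders, not the paper's setup.

```python
import torch
import torchvision.models as models

# Assumed surrogate model and layer; the channel index below is an arbitrary placeholder.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
acts = {}
model.layer4.register_forward_hook(lambda m, i, o: acts.update(out=o))

def single_neuron_attack(x, channel=17, eps=8 / 255, steps=10):
    """Perturb x with sign-gradient steps so that the loss depends only on the
    activation of one chosen channel, staying inside an L-infinity eps-ball."""
    adv = x.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        model(adv)
        loss = acts["out"][:, channel].mean()   # single "sensitive" neuron
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv + (eps / steps) * grad.sign()
            adv = x + (adv - x).clamp(-eps, eps)
    return adv.detach()

# Example: attack a random input (for a real evaluation, use dataset images).
adv_example = single_neuron_attack(torch.rand(1, 3, 224, 224))
```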