Improving Adversarial Transferability via Neuron Attribution-Based
Attacks
- URL: http://arxiv.org/abs/2204.00008v1
- Date: Thu, 31 Mar 2022 13:47:30 GMT
- Title: Improving Adversarial Transferability via Neuron Attribution-Based
Attacks
- Authors: Jianping Zhang, Weibin Wu, Jen-tse Huang, Yizhan Huang, Wenxuan Wang,
Yuxin Su, Michael R. Lyu
- Abstract summary: We propose the Neuron Attribution-based Attack (NAA), which conducts feature-level attacks with more accurate neuron importance estimations.
We derive an approximation scheme of neuron attribution to tremendously reduce the overhead.
Experiments confirm the superiority of our approach to the state-of-the-art benchmarks.
- Score: 35.02147088207232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) are known to be vulnerable to adversarial
examples. It is thus imperative to devise effective attack algorithms to
identify the deficiencies of DNNs beforehand in security-sensitive
applications. To efficiently tackle the black-box setting where the target
model's particulars are unknown, feature-level transfer-based attacks propose
to contaminate the intermediate feature outputs of local models, and then
directly employ the crafted adversarial samples to attack the target model. Due
to the transferability of features, feature-level attacks have shown promise in
synthesizing more transferable adversarial samples. However, existing
feature-level attacks generally employ inaccurate neuron importance
estimations, which deteriorates their transferability. To overcome such
pitfalls, in this paper, we propose the Neuron Attribution-based Attack (NAA),
which conducts feature-level attacks with more accurate neuron importance
estimations. Specifically, we first completely attribute a model's output to
each neuron in a middle layer. We then derive an approximation scheme of neuron
attribution to tremendously reduce the computation overhead. Finally, we weight
neurons based on their attribution results and launch feature-level attacks.
Extensive experiments confirm the superiority of our approach to the
state-of-the-art benchmarks.
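As a rough illustration of the pipeline described in the abstract, the PyTorch-style sketch below estimates per-neuron attributions for a chosen middle layer with a simple integrated-gradients-style approximation and then perturbs the input to suppress positively attributed activations. The attribution scheme, loss form, layer choice, and step sizes are illustrative assumptions, not the paper's exact derivation.
```python
# Hypothetical sketch of a neuron-attribution-weighted feature-level attack.
# The attribution approximation and the loss below are simplified assumptions,
# not the exact scheme derived in the paper.
import torch

def estimate_attributions(model, layer, x, baseline, steps=30):
    """Approximate per-neuron attributions for `layer` by integrating gradients
    of the summed logits along the straight path from `baseline` to `x`."""
    feats = {}
    handle = layer.register_forward_hook(lambda m, i, o: feats.update(out=o))
    grad_sum = 0.0
    for alpha in torch.linspace(1.0 / steps, 1.0, steps):
        x_step = (baseline + alpha * (x - baseline)).requires_grad_(True)
        logits = model(x_step)
        grad_sum = grad_sum + torch.autograd.grad(logits.sum(), feats["out"])[0]
    with torch.no_grad():
        model(x)
        clean_feat = feats["out"]
        model(baseline)
        base_feat = feats["out"]
    handle.remove()
    return (clean_feat - base_feat) * grad_sum / steps  # attribution per neuron

def attribution_weighted_attack(model, layer, x, baseline, eps=16 / 255, iters=10):
    weight = estimate_attributions(model, layer, x, baseline).detach()
    feats = {}
    handle = layer.register_forward_hook(lambda m, i, o: feats.update(out=o))
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        model(torch.clamp(x + delta, 0.0, 1.0))
        # Minimize the attribution-weighted activations: positively attributed
        # neurons are suppressed, negatively attributed ones are amplified.
        loss = (weight * feats["out"]).sum()
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta - (eps / iters) * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    handle.remove()
    return torch.clamp(x + delta, 0.0, 1.0)
```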
Related papers
- Rethinking Targeted Adversarial Attacks For Neural Machine Translation [56.10484905098989]
This paper presents a new setting for NMT targeted adversarial attacks that could lead to reliable attacking results.
Under the new setting, it then proposes a Targeted Word Gradient adversarial Attack (TWGA) method to craft adversarial examples.
Experimental results demonstrate that our proposed setting could provide faithful attacking results for targeted adversarial attacks on NMT systems.
arXiv Detail & Related papers (2024-07-07T10:16:06Z)
- DANAA: Towards transferable attacks with double adversarial neuron attribution [37.33924432015966]
We propose a double adversarial neuron attribution attack method, termed DANAA, to obtain more accurate feature importance estimation.
The goal is to measure the weight of individual neurons and retain the features that are more important towards transferability.
arXiv Detail & Related papers (2023-10-16T14:11:32Z)
- Boosting Adversarial Transferability via Fusing Logits of Top-1 Decomposed Feature [36.78292952798531]
We propose a Singular Value Decomposition (SVD)-based feature-level attack method.
Our approach is inspired by the discovery that eigenvectors associated with the larger singular values from the middle layer features exhibit superior generalization and attention properties.
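A minimal sketch of the decomposition step this summary refers to, assuming the middle-layer feature map is flattened to a channels-by-spatial matrix and only the component of the largest singular value is kept; how that component is fused with the logits in the actual attack is not reproduced here.
```python
# Illustrative sketch only: flatten a mid-layer feature map to a
# (channels x spatial) matrix and keep the rank-1 component associated
# with the largest singular value.
import torch

def top1_decomposed_feature(feature_map: torch.Tensor) -> torch.Tensor:
    """feature_map: (C, H, W) activations from a middle layer."""
    c, h, w = feature_map.shape
    mat = feature_map.reshape(c, h * w)
    u, s, vh = torch.linalg.svd(mat, full_matrices=False)
    rank1 = s[0] * torch.outer(u[:, 0], vh[0])  # top-1 singular component
    return rank1.reshape(c, h, w)
```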
arXiv Detail & Related papers (2023-05-02T12:27:44Z)
- Adversarial Robustness Assessment of NeuroEvolution Approaches [1.237556184089774]
We evaluate the robustness of models found by two NeuroEvolution approaches on the CIFAR-10 image classification task.
Our results show that when the evolved models are attacked with iterative methods, their accuracy usually drops to, or close to, zero.
Some of these techniques can exacerbate the perturbations added to the original inputs, potentially harming robustness.
arXiv Detail & Related papers (2022-07-12T10:40:19Z)
- Adversarial Robustness in Deep Learning: Attacks on Fragile Neurons [0.6899744489931016]
We identify fragile and robust neurons of deep learning architectures using nodal dropouts of the first convolutional layer.
We correlate these neurons with the distribution of adversarial attacks on the network.
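A hedged sketch of one way such nodal dropouts could be scored: zero out each channel of the first convolutional layer in turn and record the accuracy drop, treating channels with the largest drops as the most fragile. The scoring loop is an assumption for illustration; the paper's correlation analysis with adversarial perturbations is omitted.
```python
# Hypothetical sketch: score each first-conv channel by the accuracy drop
# observed when that channel's output is zeroed ("nodal dropout").
import torch

@torch.no_grad()
def channel_dropout_scores(model, first_conv, loader, device="cpu"):
    def masked_hook(channel):
        def hook(module, inputs, output):
            output[:, channel] = 0.0  # drop one channel of the first conv layer
            return output
        return hook

    def accuracy():
        correct = total = 0
        for x, y in loader:
            pred = model(x.to(device)).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.numel()
        return correct / total

    baseline = accuracy()
    scores = []
    for ch in range(first_conv.out_channels):
        handle = first_conv.register_forward_hook(masked_hook(ch))
        scores.append(baseline - accuracy())  # larger drop = more fragile channel
        handle.remove()
    return scores
```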
arXiv Detail & Related papers (2022-01-31T14:34:07Z)
- Few-shot Backdoor Defense Using Shapley Estimation [123.56934991060788]
We develop a new approach, Shapley Pruning (ShapPruning), to mitigate backdoor attacks on deep neural networks.
ShapPruning identifies the few infected neurons (under 1% of all neurons) and manages to protect the model's structure and accuracy.
Experiments demonstrate the effectiveness and robustness of our method against various attacks and tasks.
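For intuition, the sketch below shows a generic Monte Carlo estimate of per-neuron Shapley values against an arbitrary utility function. The utility, masking mechanism, and sample count are placeholders; ShapPruning's own few-shot estimation tricks and pruning criterion are not reproduced here.
```python
# Rough Monte Carlo (permutation-based) Shapley estimate for neurons in one layer.
import random
import torch

def shapley_estimates(utility, n_neurons, samples=200):
    """utility(mask) -> float, where mask[i] = 1 keeps neuron i and 0 ablates it."""
    values = [0.0] * n_neurons
    for _ in range(samples):
        order = list(range(n_neurons))
        random.shuffle(order)
        mask = torch.zeros(n_neurons)
        prev = utility(mask)  # utility of the empty coalition
        for i in order:
            mask[i] = 1.0
            cur = utility(mask)
            values[i] += (cur - prev) / samples  # marginal contribution of neuron i
            prev = cur
    return values
```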
arXiv Detail & Related papers (2021-12-30T02:27:03Z)
- Meta Adversarial Perturbations [66.43754467275967]
We show the existence of a meta adversarial perturbation (MAP).
A MAP causes natural images to be misclassified with high probability after being updated through only a single gradient-ascent step.
We show that these perturbations are not only image-agnostic, but also model-agnostic, as a single perturbation generalizes well across unseen data points and different neural network architectures.
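A minimal sketch of the adaptation step described here, assuming a pretrained meta perturbation is refined for a new image with a single gradient-ascent step on the cross-entropy loss; the meta-training procedure that produces the initial perturbation is the paper's contribution and is not shown.
```python
# Minimal sketch of applying a meta adversarial perturbation (MAP): adapt a
# pretrained perturbation to a new image with one gradient-ascent step.
import torch
import torch.nn.functional as F

def one_step_adapt(model, x, y, meta_delta, alpha=2 / 255, eps=8 / 255):
    delta = meta_delta.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    grad, = torch.autograd.grad(loss, delta)
    with torch.no_grad():
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)  # single ascent step
    return (x + delta).clamp(0.0, 1.0)
```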
arXiv Detail & Related papers (2021-11-19T16:01:45Z)
- Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative-based adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make the widely-used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z)
- Targeted Attack against Deep Neural Networks via Flipping Limited Weight Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
We formulate this attack as a binary integer programming (BIP) problem and, by utilizing the latest technique in integer programming, equivalently reformulate it as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z)
- And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.