NIP: Neuron-level Inverse Perturbation Against Adversarial Attacks
- URL: http://arxiv.org/abs/2112.13060v1
- Date: Fri, 24 Dec 2021 13:37:42 GMT
- Title: NIP: Neuron-level Inverse Perturbation Against Adversarial Attacks
- Authors: Ruoxi Chen, Haibo Jin, Jinyin Chen, Haibin Zheng, Yue Yu and Shouling
Ji
- Abstract summary: We approach defense from inside the model and investigate neuron behaviors before and after attacks.
Based on this, we propose neuron-level inverse perturbation (NIP), the first neuron-level reactive defense method against adversarial attacks.
- Score: 22.668518687143244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep learning models have achieved unprecedented success, their
vulnerability to adversarial attacks has attracted increasing
attention, especially when deployed in security-critical domains. To address
the challenge, numerous defense strategies, including reactive and proactive
ones, have been proposed for robustness improvement. From the perspective of
the image feature space, some of them fail to achieve satisfactory results due to
the shift of features under attack. Moreover, the features learned by models are
not directly related to classification results. Unlike these approaches, we consider
defense from inside the model and investigate neuron behaviors before and after
attacks. We observe that attacks mislead the model by dramatically changing the
neurons that contribute most and least to the correct label. Motivated by this, we
introduce the concept of neuron influence and divide neurons into front, middle,
and tail parts. Based on this division, we propose neuron-level inverse perturbation
(NIP), the first neuron-level reactive defense method against adversarial attacks.
By strengthening front neurons and weakening those in the tail part, NIP can
eliminate nearly all adversarial perturbations while maintaining high benign
accuracy. It can also adapt to perturbations of different sizes, especially larger
ones. Comprehensive experiments conducted on three datasets and six models show
that NIP outperforms the state-of-the-art baselines against eleven adversarial
attacks. We further provide interpretable evidence via neuron activation analysis and
visualization for better understanding.
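Since the abstract describes the defense only at a high level, the following is a minimal, assumption-laden sketch of the idea in PyTorch: approximate each neuron's influence on a label with a simple activation-times-gradient score, rank neurons into front, middle, and tail groups, and apply an "inverse perturbation" that strengthens front neurons and weakens tail ones. The function names, the attribution choice, the group ratios, and the decision to rescale hidden activations directly are illustrative assumptions, not the paper's exact algorithm.

```python
import torch

def neuron_influence(feats, logits, label):
    # Approximate per-neuron influence on the given label as
    # activation * gradient of the label logit (an illustrative choice;
    # the paper's exact definition of neuron influence may differ).
    # feats is assumed to be a hidden feature map of shape [B, C, H, W].
    score = logits[0, label]
    grads = torch.autograd.grad(score, feats, retain_graph=True)[0]
    # Average spatially so each channel (treated as one "neuron") gets one value.
    return (feats * grads).mean(dim=(0, 2, 3))

def nip_like_rescale(feats, influence, front_ratio=0.1, tail_ratio=0.1, alpha=0.5):
    # Illustrative "inverse perturbation": strengthen the highest-influence
    # (front) neurons and weaken the lowest-influence (tail) ones, leaving
    # the middle part untouched. Ratios and alpha are made-up defaults.
    n = influence.numel()
    order = torch.argsort(influence, descending=True)
    front = order[: max(1, int(front_ratio * n))]
    tail = order[n - max(1, int(tail_ratio * n)):]
    scale = torch.ones(n, device=feats.device)
    scale[front] = 1.0 + alpha   # amplify neurons that support the label
    scale[tail] = 1.0 - alpha    # suppress neurons the attack tends to inflate
    return feats * scale.view(1, -1, 1, 1)
```

In a reactive setting the true label is unavailable, so a defender would plausibly score influence against the model's own predicted label, obtain the feature map from a forward hook on an intermediate layer, and re-run the remainder of the network on the rescaled features.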
Related papers
- DANAA: Towards transferable attacks with double adversarial neuron
attribution [37.33924432015966]
We propose a double adversarial neuron attribution attack method, termed DANAA, to obtain more accurate feature importance estimation.
The goal is to measure the weight of individual neurons and retain the features that matter more for transferability (a rough neuron-attribution sketch appears after this list).
arXiv Detail & Related papers (2023-10-16T14:11:32Z) - Investigating Human-Identifiable Features Hidden in Adversarial
Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z) - Visual Analytics of Neuron Vulnerability to Adversarial Attacks on
Convolutional Neural Networks [28.081328051535618]
Adversarial attacks on a convolutional neural network (CNN) could fool a high-performance CNN into making incorrect predictions.
Our work introduces a visual analytics approach to understanding adversarial attacks.
A visual analytics system is designed to incorporate visual reasoning for interpreting adversarial attacks.
arXiv Detail & Related papers (2023-03-06T01:01:56Z) - Adversarial Defense via Neural Oscillation inspired Gradient Masking [0.0]
Spiking neural networks (SNNs) attract great attention due to their low power consumption, low latency, and biological plausibility.
We propose a novel neural model that incorporates the bio-inspired oscillation mechanism to enhance the security of SNNs.
arXiv Detail & Related papers (2022-11-04T02:13:19Z) - Improving Adversarial Transferability via Neuron Attribution-Based
Attacks [35.02147088207232]
We propose the Neuron-based Attack (NAA), which conducts feature-level attacks with more accurate neuron importance estimations.
We derive an approximation scheme of neuron attribution to tremendously reduce the overhead.
Experiments confirm the superiority of our approach to the state-of-the-art benchmarks.
arXiv Detail & Related papers (2022-03-31T13:47:30Z) - Adversarial Robustness in Deep Learning: Attacks on Fragile Neurons [0.6899744489931016]
We identify fragile and robust neurons of deep learning architectures using nodal dropouts of the first convolutional layer.
We correlate these neurons with the distribution of adversarial attacks on the network.
arXiv Detail & Related papers (2022-01-31T14:34:07Z) - Few-shot Backdoor Defense Using Shapley Estimation [123.56934991060788]
We develop a new approach called Shapley Pruning to mitigate backdoor attacks on deep neural networks.
ShapPruning identifies the few infected neurons (under 1% of all neurons) and manages to protect the model's structure and accuracy.
Experiments demonstrate the effectiveness and robustness of our method against various attacks and tasks.
arXiv Detail & Related papers (2021-12-30T02:27:03Z) - Overcoming the Domain Gap in Contrastive Learning of Neural Action
Representations [60.47807856873544]
A fundamental goal in neuroscience is to understand the relationship between neural activity and behavior.
We generate a new multimodal dataset consisting of the spontaneous behaviors of fruit flies.
This dataset and our new set of augmentations promise to accelerate the application of self-supervised learning methods in neuroscience.
arXiv Detail & Related papers (2021-11-29T15:27:51Z) - And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z) - Fooling the primate brain with minimal, targeted image manipulation [67.78919304747498]
We propose an array of methods for creating minimal, targeted image perturbations that lead to changes in both neuronal activity and perception as reflected in behavior.
Our work shares the same goal as adversarial attacks, namely manipulating images with minimal, targeted noise so that ANN models misclassify them.
arXiv Detail & Related papers (2020-11-11T08:30:54Z) - Perceptual Adversarial Robustness: Defense Against Unseen Threat Models [58.47179090632039]
A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception.
Under the neural perceptual threat model, we develop novel perceptual adversarial attacks and defenses.
Because the NPTM is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks.
arXiv Detail & Related papers (2020-06-22T22:40:46Z)
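Several of the papers above (DANAA, NAA, the fragile-neuron study) share one ingredient: ranking neurons by how much they contribute to a prediction. The sketch below, referenced from the DANAA entry, makes that ingredient concrete with a plain gradient-times-activation score; the actual attribution schemes in NAA and DANAA are path-integrated and more accurate, and the split of the model into hypothetical model_front / model_back halves is only for illustration.

```python
import torch

def grad_times_activation(model_front, model_back, x, target):
    # Score each channel of a hidden layer by activation * gradient of the
    # target logit. This is a crude stand-in for the path-integrated neuron
    # attributions used by NAA/DANAA, shown only to make the shared idea concrete.
    feats = model_front(x)          # hidden features, assumed [B, C, H, W]
    feats.retain_grad()             # keep gradients for this non-leaf tensor
    logits = model_back(feats)
    logits[0, target].backward()
    # Positive scores: channels pushing toward the target class; negative: against it.
    return (feats * feats.grad).sum(dim=(0, 2, 3))
```

Roughly speaking, attribution-based transfer attacks then perturb the input to distort the highest-scoring neurons of a surrogate model, while defenses such as ShapPruning use importance scores (estimated with Shapley values rather than gradients) to locate and prune infected neurons.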