Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence
- URL: http://arxiv.org/abs/2112.13060v3
- Date: Mon, 19 Aug 2024 17:21:36 GMT
- Title: Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence
- Authors: Ruoxi Chen, Haibo Jin, Haibin Zheng, Jinyin Chen, Zhenguang Liu,
- Abstract summary: We propose Neuron-level Inverse Perturbation (NIP), a novel defense against general adversarial attacks.
It calculates neuron influence from benign examples and then modifies input examples by generating inverse perturbations.
- Score: 14.817015950058915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The vulnerability of deep learning models to adversarial attacks has attracted increasing attention, especially when models are deployed in security-critical domains. Numerous defense methods, both reactive and proactive, have been proposed to improve model robustness. Reactive defenses, such as applying transformations to remove perturbations, usually fail to handle large perturbations. Proactive defenses that involve retraining suffer from attack dependency and high computation cost. In this paper, we approach defense from the general effect that adversarial attacks have on neurons inside the model. We introduce the concept of neuron influence, which quantitatively measures a neuron's contribution to correct classification. We then observe that almost all attacks fool the model by suppressing neurons with larger influence and enhancing those with smaller influence. Based on this, we propose Neuron-level Inverse Perturbation (NIP), a novel defense against general adversarial attacks. It calculates neuron influence from benign examples and then modifies input examples by generating inverse perturbations that in turn strengthen neurons with larger influence and weaken those with smaller influence.
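The idea in the abstract can be illustrated with a minimal sketch on a toy two-layer ReLU network. Everything here is an assumption made for illustration and not the paper's actual formulation: neuron influence is approximated as activation times the gradient of the correct-class logit, the influence is mean-centered so that below-average neurons are weakened, and the inverse perturbation is a single signed-gradient step. The weights `W1`, `W2` and all function names are hypothetical.

```python
def matvec(W, x):
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def hidden(W1, x):
    # ReLU activations of the hidden layer
    return [max(0.0, p) for p in matvec(W1, x)]

def predict(W1, W2, x):
    logits = matvec(W2, hidden(W1, x))
    return max(range(len(logits)), key=logits.__getitem__)

def neuron_influence(W1, W2, x, label):
    # Assumed proxy: activation times d(logit[label]) / d(activation)
    return [h_j * W2[label][j] for j, h_j in enumerate(hidden(W1, x))]

def mean_influence(W1, W2, benign):
    # Average the per-example influence over benign (input, label) pairs
    per_example = [neuron_influence(W1, W2, x, y) for x, y in benign]
    return [sum(col) / len(per_example) for col in zip(*per_example)]

def inverse_perturbation(W1, influence, x, eps):
    # Center the influence so above-average neurons are strengthened
    # and below-average neurons are weakened (an assumption of this sketch).
    mean = sum(influence) / len(influence)
    weights = [v - mean for v in influence]
    pre = matvec(W1, x)
    grad = [0.0] * len(x)
    for j, (p, w) in enumerate(zip(pre, weights)):
        if p > 0:  # ReLU passes gradient only for active neurons
            for i in range(len(x)):
                grad[i] += w * W1[j][i]
    sign = lambda g: (g > 0) - (g < 0)
    # One signed-gradient step of size eps on the input
    return [a + eps * sign(g) for a, g in zip(x, grad)]

# Hypothetical toy weights: 2 inputs, 2 hidden neurons, 2 classes
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[1.0, 0.1], [0.1, 1.0]]

benign = [([1.0, 0.2], 0)]            # one benign example of class 0
influence = mean_influence(W1, W2, benign)
x_adv = [0.5, 0.6]                    # perturbed input, misclassified as class 1
repaired = inverse_perturbation(W1, influence, x_adv, eps=0.2)
print(predict(W1, W2, x_adv), predict(W1, W2, repaired))
```

On this toy data the perturbed input is classified as class 1, and the input after the inverse step is classified as class 0 again. A real defense would compute influence inside a trained deep network and would not know the true class of the input at test time, which this sketch sidesteps by using a single benign class.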
Related papers
- Neural cyberattacks applied to the vision under realistic visual stimuli [3.0748861313823]
Brain-Computer Interfaces (BCIs) are systems traditionally used in medicine and designed to interact with the brain to record or stimulate neurons.
Previous work validated neural cyberattacks able to disrupt spontaneous neural activity by performing neural overstimulation or inhibition.
This work analyzed the impact of two existing neural attacks, Neuronal Flooding (FLO) and Neuronal Jamming (JAM), on a complex neuronal topology of mice.
arXiv Detail & Related papers (2025-03-11T10:58:58Z)
- DANAA: Towards transferable attacks with double adversarial neuron attribution [37.33924432015966]
We propose a double adversarial neuron attribution attack method, termed DANAA, to obtain more accurate feature importance estimation.
The goal is to measure the weight of individual neurons and retain the features that are more important towards transferability.
arXiv Detail & Related papers (2023-10-16T14:11:32Z)
- Investigating Human-Identifiable Features Hidden in Adversarial Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- Visual Analytics of Neuron Vulnerability to Adversarial Attacks on Convolutional Neural Networks [28.081328051535618]
Adversarial attacks on a convolutional neural network (CNN) could fool a high-performance CNN into making incorrect predictions.
Our work introduces a visual analytics approach to understanding adversarial attacks.
A visual analytics system is designed to incorporate visual reasoning for interpreting adversarial attacks.
arXiv Detail & Related papers (2023-03-06T01:01:56Z)
- Adversarial Defense via Neural Oscillation inspired Gradient Masking [0.0]
Spiking neural networks (SNNs) attract great attention due to their low power consumption, low latency, and biological plausibility.
We propose a novel neural model that incorporates the bio-inspired oscillation mechanism to enhance the security of SNNs.
arXiv Detail & Related papers (2022-11-04T02:13:19Z)
- Improving Adversarial Transferability via Neuron Attribution-Based Attacks [35.02147088207232]
We propose the Neuron-based Attack (NAA), which conducts feature-level attacks with more accurate neuron importance estimations.
We derive an approximation scheme of neuron attribution to tremendously reduce the overhead.
Experiments confirm the superiority of our approach to the state-of-the-art benchmarks.
arXiv Detail & Related papers (2022-03-31T13:47:30Z)
- Adversarial Robustness in Deep Learning: Attacks on Fragile Neurons [0.6899744489931016]
We identify fragile and robust neurons of deep learning architectures using nodal dropouts of the first convolutional layer.
We correlate these neurons with the distribution of adversarial attacks on the network.
arXiv Detail & Related papers (2022-01-31T14:34:07Z)
- Few-shot Backdoor Defense Using Shapley Estimation [123.56934991060788]
We develop a new approach called Shapley Pruning (ShapPruning) to mitigate backdoor attacks on deep neural networks.
ShapPruning identifies the few infected neurons (under 1% of all neurons) and manages to protect the model's structure and accuracy.
Experiments demonstrate the effectiveness and robustness of our method against various attacks and tasks.
arXiv Detail & Related papers (2021-12-30T02:27:03Z)
- Removing Adversarial Noise in Class Activation Feature Space [160.78488162713498]
We propose to remove adversarial noise by implementing a self-supervised adversarial training mechanism in a class activation feature space.
We train a denoising model to minimize the distances between the adversarial examples and the natural examples in the class activation feature space.
Empirical evaluations demonstrate that our method could significantly enhance adversarial robustness in comparison to previous state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-19T10:42:24Z)
- And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z)
- Fooling the primate brain with minimal, targeted image manipulation [67.78919304747498]
We propose an array of methods for creating minimal, targeted image perturbations that lead to changes in both neuronal activity and perception as reflected in behavior.
Our work shares the same goal as adversarial attacks, namely the manipulation of images with minimal, targeted noise that leads ANN models to misclassify the images.
arXiv Detail & Related papers (2020-11-11T08:30:54Z)
- Perceptual Adversarial Robustness: Defense Against Unseen Threat Models [58.47179090632039]
A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception.
Under the neural perceptual threat model (NPTM), we develop novel perceptual adversarial attacks and defenses.
Because the NPTM is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks.
arXiv Detail & Related papers (2020-06-22T22:40:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.