NIP: Neuron-level Inverse Perturbation Against Adversarial Attacks
- URL: http://arxiv.org/abs/2112.13060v1
- Date: Fri, 24 Dec 2021 13:37:42 GMT
- Title: NIP: Neuron-level Inverse Perturbation Against Adversarial Attacks
- Authors: Ruoxi Chen, Haibo Jin, Jinyin Chen, Haibin Zheng, Yue Yu and Shouling
Ji
- Abstract summary: We approach defense from inside the model and investigate neuron behaviors before and after attacks.
Based on this, we propose neuron-level inverse perturbation (NIP), the first neuron-level reactive defense method against adversarial attacks.
- Score: 22.668518687143244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep learning models have achieved unprecedented success, their
vulnerability to adversarial attacks has attracted increasing
attention, especially when deployed in security-critical domains. To address
the challenge, numerous defense strategies, including reactive and proactive
ones, have been proposed for robustness improvement. From the perspective of
the image feature space, some of them fail to achieve satisfactory results due to
the shift of features under attack. Moreover, the features learned by models are
not directly related to classification results. Unlike these approaches, we consider
defense from inside the model and investigate neuron behaviors before and after
attacks. We observe that attacks mislead the model by dramatically changing the
neurons that contribute most and least to the correct label. Motivated by this, we
introduce the concept of neuron influence and divide neurons into front, middle,
and tail parts. Based on this division, we propose neuron-level inverse perturbation
(NIP), the first neuron-level reactive defense method against adversarial attacks.
By strengthening front neurons and weakening those in the tail part, NIP can
eliminate nearly all adversarial perturbations while maintaining high benign
accuracy. It can also adapt to perturbations of different sizes, especially larger
ones. Comprehensive experiments conducted on three datasets and six models show
that NIP outperforms the state-of-the-art baselines against eleven adversarial
attacks. We further provide interpretable evidence via neuron activation analysis and
visualization for better understanding.
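Since the abstract describes the defense only at a high level, the following is a minimal, assumption-laden sketch of the idea in PyTorch: approximate each neuron's influence on a label with a simple activation-times-gradient score, rank neurons into front, middle, and tail groups, and apply an "inverse perturbation" that strengthens front neurons and weakens tail ones. The function names, the attribution choice, the group ratios, and the decision to rescale hidden activations directly are illustrative assumptions, not the paper's exact algorithm.

```python
import torch

def neuron_influence(feats, logits, label):
    # Approximate per-neuron influence on the given label as
    # activation * gradient of the label logit (an illustrative choice;
    # the paper's exact definition of neuron influence may differ).
    # feats is assumed to be a hidden feature map of shape [B, C, H, W].
    score = logits[0, label]
    grads = torch.autograd.grad(score, feats, retain_graph=True)[0]
    # Average spatially so each channel (treated as one "neuron") gets one value.
    return (feats * grads).mean(dim=(0, 2, 3))

def nip_like_rescale(feats, influence, front_ratio=0.1, tail_ratio=0.1, alpha=0.5):
    # Illustrative "inverse perturbation": strengthen the highest-influence
    # (front) neurons and weaken the lowest-influence (tail) ones, leaving
    # the middle part untouched. Ratios and alpha are made-up defaults.
    n = influence.numel()
    order = torch.argsort(influence, descending=True)
    front = order[: max(1, int(front_ratio * n))]
    tail = order[n - max(1, int(tail_ratio * n)):]
    scale = torch.ones(n, device=feats.device)
    scale[front] = 1.0 + alpha   # amplify neurons that support the label
    scale[tail] = 1.0 - alpha    # suppress neurons the attack tends to inflate
    return feats * scale.view(1, -1, 1, 1)
```

In a reactive setting the true label is unavailable, so a defender would plausibly score influence against the model's own predicted label, obtain the feature map from a forward hook on an intermediate layer, and re-run the remainder of the network on the rescaled features.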
Related papers
- DANAA: Towards transferable attacks with double adversarial neuron
attribution [37.33924432015966]
We propose a double adversarial neuron attribution attack method, termed DANAA, to obtain more accurate feature importance estimation.
The goal is to measure the weight of individual neurons and retain the features that matter more for transferability (a rough neuron-attribution sketch appears after this list).
arXiv Detail & Related papers (2023-10-16T14:11:32Z) - Investigating Human-Identifiable Features Hidden in Adversarial
Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z) - Visual Analytics of Neuron Vulnerability to Adversarial Attacks on
Convolutional Neural Networks [28.081328051535618]
Adversarial attacks on a convolutional neural network (CNN) could fool a high-performance CNN into making incorrect predictions.
Our work introduces a visual analytics approach to understanding adversarial attacks.
A visual analytics system is designed to incorporate visual reasoning for interpreting adversarial attacks.
arXiv Detail & Related papers (2023-03-06T01:01:56Z) - Adversarial Defense via Neural Oscillation inspired Gradient Masking [0.0]
Spiking neural networks (SNNs) attract great attention due to their low power consumption, low latency, and biological plausibility.
We propose a novel neural model that incorporates the bio-inspired oscillation mechanism to enhance the security of SNNs.
arXiv Detail & Related papers (2022-11-04T02:13:19Z) - Improving Adversarial Transferability via Neuron Attribution-Based
Attacks [35.02147088207232]
We propose the Neuron-based Attack (NAA), which conducts feature-level attacks with more accurate neuron importance estimations.
We derive an approximation scheme of neuron attribution to tremendously reduce the overhead.
Experiments confirm the superiority of our approach to the state-of-the-art benchmarks.
arXiv Detail & Related papers (2022-03-31T13:47:30Z) - Adversarial Robustness in Deep Learning: Attacks on Fragile Neurons [0.6899744489931016]
We identify fragile and robust neurons of deep learning architectures using nodal dropouts of the first convolutional layer.
We correlate these neurons with the distribution of adversarial attacks on the network.
arXiv Detail & Related papers (2022-01-31T14:34:07Z) - Few-shot Backdoor Defense Using Shapley Estimation [123.56934991060788]
We develop a new approach called Shapley Pruning to mitigate backdoor attacks on deep neural networks.
ShapPruning identifies the few infected neurons (under 1% of all neurons) and manages to protect the model's structure and accuracy.
Experiments demonstrate the effectiveness and robustness of our method against various attacks and tasks.
arXiv Detail & Related papers (2021-12-30T02:27:03Z) - Overcoming the Domain Gap in Contrastive Learning of Neural Action
Representations [60.47807856873544]
A fundamental goal in neuroscience is to understand the relationship between neural activity and behavior.
We generate a new multimodal dataset consisting of the spontaneous behaviors of fruit flies.
This dataset and our new set of augmentations promise to accelerate the application of self-supervised learning methods in neuroscience.
arXiv Detail & Related papers (2021-11-29T15:27:51Z) - And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z) - Fooling the primate brain with minimal, targeted image manipulation [67.78919304747498]
We propose an array of methods for creating minimal, targeted image perturbations that lead to changes in both neuronal activity and perception as reflected in behavior.
Our work shares the same goal as adversarial attacks, namely manipulating images with minimal, targeted noise so that ANN models misclassify them.
arXiv Detail & Related papers (2020-11-11T08:30:54Z) - Perceptual Adversarial Robustness: Defense Against Unseen Threat Models [58.47179090632039]
A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception.
Under the neural perceptual threat model, we develop novel perceptual adversarial attacks and defenses.
Because the NPTM is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks.
arXiv Detail & Related papers (2020-06-22T22:40:46Z)
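Several of the papers above (DANAA, NAA, the fragile-neuron study) share one ingredient: ranking neurons by how much they contribute to a prediction. The sketch below, referenced from the DANAA entry, makes that ingredient concrete with a plain gradient-times-activation score; the actual attribution schemes in NAA and DANAA are path-integrated and more accurate, and the split of the model into hypothetical model_front / model_back halves is only for illustration.

```python
import torch

def grad_times_activation(model_front, model_back, x, target):
    # Score each channel of a hidden layer by activation * gradient of the
    # target logit. This is a crude stand-in for the path-integrated neuron
    # attributions used by NAA/DANAA, shown only to make the shared idea concrete.
    feats = model_front(x)          # hidden features, assumed [B, C, H, W]
    feats.retain_grad()             # keep gradients for this non-leaf tensor
    logits = model_back(feats)
    logits[0, target].backward()
    # Positive scores: channels pushing toward the target class; negative: against it.
    return (feats * feats.grad).sum(dim=(0, 2, 3))
```

Roughly speaking, attribution-based transfer attacks then perturb the input to distort the highest-scoring neurons of a surrogate model, while defenses such as ShapPruning use importance scores (estimated with Shapley values rather than gradients) to locate and prune infected neurons.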