Trace and Detect Adversarial Attacks on CNNs using Feature Response Maps
- URL: http://arxiv.org/abs/2208.11436v1
- Date: Wed, 24 Aug 2022 11:05:04 GMT
- Title: Trace and Detect Adversarial Attacks on CNNs using Feature Response Maps
- Authors: Mohammadreza Amirian, Friedhelm Schwenker and Thilo Stadelmann
- Abstract summary: The existence of adversarial attacks on convolutional neural networks (CNN) questions the fitness of such models for serious applications.
In this work, we propose a novel detection method for adversarial examples to prevent attacks.
We do so by tracking adversarial perturbations in feature responses, allowing for automatic detection using average local spatial entropy.
- Score: 0.3437656066916039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The existence of adversarial attacks on convolutional neural networks (CNN)
questions the fitness of such models for serious applications. The attacks
manipulate an input image such that misclassification is evoked while still
looking normal to a human observer -- they are thus not easily detectable. In a
different context, backpropagated activations of CNN hidden layers -- "feature
responses" to a given input -- have been helpful to visualize for a human
"debugger" what the CNN "looks at" while computing its output. In this work, we
propose a novel detection method for adversarial examples to prevent attacks.
We do so by tracking adversarial perturbations in feature responses, allowing
for automatic detection using average local spatial entropy. The method does
not alter the original network architecture and is fully human-interpretable.
Experiments confirm the validity of our approach for state-of-the-art attacks
on large-scale models trained on ImageNet.
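As a rough illustration of the detection statistic, the sketch below computes average local spatial entropy over a 2-D feature response map. The window size, bin count, normalization, and the calibration step in the usage comment are assumptions made for illustration, not the paper's exact settings.

```python
import numpy as np

def average_local_spatial_entropy(response_map, window=8, bins=32):
    """Average Shannon entropy over non-overlapping local windows of a
    2-D feature response map (illustrative parameter choices)."""
    h, w = response_map.shape
    # Normalize the map to [0, 1] so histogram bins are comparable across inputs.
    r = response_map - response_map.min()
    r = r / (r.max() + 1e-12)
    entropies = []
    for i in range(0, h - window + 1, window):
        for j in range(0, w - window + 1, window):
            patch = r[i:i + window, j:j + window].ravel()
            hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
            p = hist / (hist.sum() + 1e-12)
            p = p[p > 0]
            entropies.append(float(-(p * np.log2(p)).sum()))
    return float(np.mean(entropies))

# Hypothetical usage: compare the statistic of a test image's feature response
# map against the range observed on clean images; an out-of-range value flags
# a potential adversarial input.
```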
Related papers
- Unfolding Local Growth Rate Estimates for (Almost) Perfect Adversarial
Detection [22.99930028876662]
Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks.
Current CNN approaches largely remain vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system.
We propose a simple and light-weight detector, which leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks (a minimal LID estimator sketch follows this entry).
arXiv Detail & Related papers (2022-12-13T17:51:32Z)
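The LID-based detection above relies on nearest-neighbor distance statistics; below is a minimal sketch of the standard maximum-likelihood LID estimate, assuming raw activation vectors as the feature space and an arbitrary neighborhood size k (both illustrative choices, not the paper's setup).

```python
import numpy as np

def lid_mle(x, reference_activations, k=20):
    """Maximum-likelihood estimate of the local intrinsic dimensionality of x
    relative to a batch of reference activations (illustrative k and features)."""
    # Distances from x to every reference activation, keeping the k nearest.
    dists = np.sort(np.linalg.norm(reference_activations - x, axis=1))[:k]
    r_max = dists[-1] + 1e-12
    return -1.0 / np.mean(np.log(dists / r_max + 1e-12))

# Hypothetical usage: adversarial inputs tend to yield larger LID estimates
# than clean inputs, so a threshold or small classifier on per-layer LID
# values can serve as the detector.
```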
- DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU, making it compute-efficient and deployable without requiring specialized accelerators (a toy density-monitor sketch follows this entry).
arXiv Detail & Related papers (2021-05-30T22:07:13Z)
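DAAIN fits a normalizing flow to the activation distribution; the stand-in below replaces the flow with a diagonal Gaussian purely to illustrate the monitor-and-threshold idea. The class name, the Gaussian assumption, and the threshold are all illustrative, not DAAIN's actual design.

```python
import numpy as np

class ActivationDensityMonitor:
    """Toy density monitor: fit a diagonal Gaussian to activations of clean
    inputs, then score new inputs by log-likelihood under that density."""

    def fit(self, clean_activations):
        self.mu = clean_activations.mean(axis=0)
        self.var = clean_activations.var(axis=0) + 1e-6
        return self

    def log_likelihood(self, activation):
        return float(-0.5 * np.sum(
            np.log(2.0 * np.pi * self.var) + (activation - self.mu) ** 2 / self.var))

    def is_anomalous(self, activation, threshold):
        # Low likelihood under the clean-data density -> likely OOD or adversarial.
        return self.log_likelihood(activation) < threshold
```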
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of the training data (a toy poisoning sketch follows this entry).
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
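A toy sketch of object-level poisoning in this spirit is shown below; the trigger placement, victim/target classes, and relabeling rule are chosen for illustration and are not taken from the paper.

```python
import numpy as np

def poison_segmentation_sample(image, mask, trigger, victim_class, target_class):
    """Stamp a small trigger patch onto the image and relabel only the pixels
    of one object class (hypothetical object-level poisoning rule)."""
    poisoned_img = image.copy()
    poisoned_mask = mask.copy()
    th, tw = trigger.shape[:2]
    poisoned_img[:th, :tw] = trigger                      # trigger in the corner
    poisoned_mask[mask == victim_class] = target_class    # object-level relabel
    return poisoned_img, poisoned_mask
```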
- Adversarial Profiles: Detecting Out-Distribution & Adversarial Samples in Pre-trained CNNs [4.52308938611108]
We propose a method to detect adversarial and out-distribution examples against a pre-trained CNN.
To this end, we create adversarial profiles for each class using only one adversarial attack generation technique.
Our initial evaluation of this approach on the MNIST dataset shows that adversarial-profile-based detection detects at least 92% of out-distribution examples and 59% of adversarial examples.
arXiv Detail & Related papers (2020-11-18T07:10:13Z)
- Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z)
- Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation-learning power of CNNs and learns better features for the fPAD task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z)
- Miss the Point: Targeted Adversarial Attack on Multiple Landmark Detection [29.83857022733448]
This paper is the first to study how fragile a CNN-based model for multiple landmark detection is to adversarial perturbations.
We propose a novel Adaptive Targeted Iterative FGSM attack against the state-of-the-art models in multiple landmark detection (a plain targeted I-FGSM sketch follows this entry).
arXiv Detail & Related papers (2020-07-10T07:58:35Z)
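For reference, plain targeted iterative FGSM is sketched below; the paper's Adaptive Targeted Iterative FGSM adds landmark-specific losses and adaptations that are not reproduced here, and the loss, step size, and budget are illustrative (the sketch uses the standard classification form rather than heatmap regression).

```python
import torch
import torch.nn.functional as F

def targeted_iterative_fgsm(model, x, target, eps=8/255, alpha=1/255, steps=10):
    """Targeted I-FGSM: repeatedly step against the gradient of the loss
    toward the attacker-chosen target, projecting back into an eps-ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Descend on the target loss so the prediction moves toward `target`.
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv
```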
- Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises [87.53808756910452]
A cooling-shrinking attack method is proposed to deceive state-of-the-art SiameseRPN-based trackers.
Our method has good transferability and is able to deceive other top-performance trackers such as DaSiamRPN, DaSiamRPN-UpdateNet, and DiMP.
arXiv Detail & Related papers (2020-03-21T07:13:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.