Trace and Detect Adversarial Attacks on CNNs using Feature Response Maps
- URL: http://arxiv.org/abs/2208.11436v1
- Date: Wed, 24 Aug 2022 11:05:04 GMT
- Title: Trace and Detect Adversarial Attacks on CNNs using Feature Response Maps
- Authors: Mohammadreza Amirian, Friedhelm Schwenker and Thilo Stadelmann
- Abstract summary: The existence of adversarial attacks on convolutional neural networks (CNN) questions the fitness of such models for serious applications.
In this work, we propose a novel detection method for adversarial examples to prevent attacks.
We do so by tracking adversarial perturbations in feature responses, allowing for automatic detection using average local spatial entropy.
- Score: 0.3437656066916039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The existence of adversarial attacks on convolutional neural networks (CNN)
questions the fitness of such models for serious applications. The attacks
manipulate an input image such that misclassification is evoked while still
looking normal to a human observer -- they are thus not easily detectable. In a
different context, backpropagated activations of CNN hidden layers -- "feature
responses" to a given input -- have been helpful to visualize for a human
"debugger" what the CNN "looks at" while computing its output. In this work, we
propose a novel detection method for adversarial examples to prevent attacks.
We do so by tracking adversarial perturbations in feature responses, allowing
for automatic detection using average local spatial entropy. The method does
not alter the original network architecture and is fully human-interpretable.
Experiments confirm the validity of our approach for state-of-the-art attacks
on large-scale models trained on ImageNet.
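As a rough illustration of the detection statistic, the sketch below computes average local spatial entropy over a 2-D feature response map. The window size, bin count, normalization, and the calibration step in the usage comment are assumptions made for illustration, not the paper's exact settings.

```python
import numpy as np

def average_local_spatial_entropy(response_map, window=8, bins=32):
    """Average Shannon entropy over non-overlapping local windows of a
    2-D feature response map (illustrative parameter choices)."""
    h, w = response_map.shape
    # Normalize the map to [0, 1] so histogram bins are comparable across inputs.
    r = response_map - response_map.min()
    r = r / (r.max() + 1e-12)
    entropies = []
    for i in range(0, h - window + 1, window):
        for j in range(0, w - window + 1, window):
            patch = r[i:i + window, j:j + window].ravel()
            hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
            p = hist / (hist.sum() + 1e-12)
            p = p[p > 0]
            entropies.append(float(-(p * np.log2(p)).sum()))
    return float(np.mean(entropies))

# Hypothetical usage: compare the statistic of a test image's feature response
# map against the range observed on clean images; an out-of-range value flags
# a potential adversarial input.
```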
Related papers
- Unfolding Local Growth Rate Estimates for (Almost) Perfect Adversarial
Detection [22.99930028876662]
Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks.
Current CNN approaches largely remain vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system.
We propose a simple and light-weight detector, which leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks (a minimal LID estimator sketch follows this entry).
arXiv Detail & Related papers (2022-12-13T17:51:32Z)
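The LID-based detection above relies on nearest-neighbor distance statistics; below is a minimal sketch of the standard maximum-likelihood LID estimate, assuming raw activation vectors as the feature space and an arbitrary neighborhood size k (both illustrative choices, not the paper's setup).

```python
import numpy as np

def lid_mle(x, reference_activations, k=20):
    """Maximum-likelihood estimate of the local intrinsic dimensionality of x
    relative to a batch of reference activations (illustrative k and features)."""
    # Distances from x to every reference activation, keeping the k nearest.
    dists = np.sort(np.linalg.norm(reference_activations - x, axis=1))[:k]
    r_max = dists[-1] + 1e-12
    return -1.0 / np.mean(np.log(dists / r_max + 1e-12))

# Hypothetical usage: adversarial inputs tend to yield larger LID estimates
# than clean inputs, so a threshold or small classifier on per-layer LID
# values can serve as the detector.
```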
- DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU, making it compute-efficient and deployable without requiring specialized accelerators (a toy density-monitor sketch follows this entry).
arXiv Detail & Related papers (2021-05-30T22:07:13Z)
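DAAIN fits a normalizing flow to the activation distribution; the stand-in below replaces the flow with a diagonal Gaussian purely to illustrate the monitor-and-threshold idea. The class name, the Gaussian assumption, and the threshold are all illustrative, not DAAIN's actual design.

```python
import numpy as np

class ActivationDensityMonitor:
    """Toy density monitor: fit a diagonal Gaussian to activations of clean
    inputs, then score new inputs by log-likelihood under that density."""

    def fit(self, clean_activations):
        self.mu = clean_activations.mean(axis=0)
        self.var = clean_activations.var(axis=0) + 1e-6
        return self

    def log_likelihood(self, activation):
        return float(-0.5 * np.sum(
            np.log(2.0 * np.pi * self.var) + (activation - self.mu) ** 2 / self.var))

    def is_anomalous(self, activation, threshold):
        # Low likelihood under the clean-data density -> likely OOD or adversarial.
        return self.log_likelihood(activation) < threshold
```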
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of the training data (a toy poisoning sketch follows this entry).
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
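A toy sketch of object-level poisoning in this spirit is shown below; the trigger placement, victim/target classes, and relabeling rule are chosen for illustration and are not taken from the paper.

```python
import numpy as np

def poison_segmentation_sample(image, mask, trigger, victim_class, target_class):
    """Stamp a small trigger patch onto the image and relabel only the pixels
    of one object class (hypothetical object-level poisoning rule)."""
    poisoned_img = image.copy()
    poisoned_mask = mask.copy()
    th, tw = trigger.shape[:2]
    poisoned_img[:th, :tw] = trigger                      # trigger in the corner
    poisoned_mask[mask == victim_class] = target_class    # object-level relabel
    return poisoned_img, poisoned_mask
```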
- Adversarial Profiles: Detecting Out-Distribution & Adversarial Samples in Pre-trained CNNs [4.52308938611108]
We propose a method to detect adversarial and out-distribution examples against a pre-trained CNN.
To this end, we create adversarial profiles for each class using only one adversarial attack generation technique.
Our initial evaluation of this approach on the MNIST dataset shows that adversarial-profile-based detection detects at least 92% of out-distribution examples and 59% of adversarial examples.
arXiv Detail & Related papers (2020-11-18T07:10:13Z)
- Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z)
- Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation-learning power of CNNs and learns better features for the fPAD task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z)
- Miss the Point: Targeted Adversarial Attack on Multiple Landmark Detection [29.83857022733448]
This paper is the first to study how fragile a CNN-based model for multiple landmark detection is to adversarial perturbations.
We propose a novel Adaptive Targeted Iterative FGSM attack against the state-of-the-art models in multiple landmark detection (a plain targeted I-FGSM sketch follows this entry).
arXiv Detail & Related papers (2020-07-10T07:58:35Z)
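For reference, plain targeted iterative FGSM is sketched below; the paper's Adaptive Targeted Iterative FGSM adds landmark-specific losses and adaptations that are not reproduced here, and the loss, step size, and budget are illustrative (the sketch uses the standard classification form rather than heatmap regression).

```python
import torch
import torch.nn.functional as F

def targeted_iterative_fgsm(model, x, target, eps=8/255, alpha=1/255, steps=10):
    """Targeted I-FGSM: repeatedly step against the gradient of the loss
    toward the attacker-chosen target, projecting back into an eps-ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Descend on the target loss so the prediction moves toward `target`.
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv
```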
- Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises [87.53808756910452]
A cooling-shrinking attack method is proposed to deceive state-of-the-art SiameseRPN-based trackers.
Our method has good transferability and is able to deceive other top-performance trackers such as DaSiamRPN, DaSiamRPN-UpdateNet, and DiMP.
arXiv Detail & Related papers (2020-03-21T07:13:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.