Non-Intrusive Detection of Adversarial Deep Learning Attacks via
Observer Networks
- URL: http://arxiv.org/abs/2002.09772v1
- Date: Sat, 22 Feb 2020 21:13:00 GMT
- Title: Non-Intrusive Detection of Adversarial Deep Learning Attacks via
Observer Networks
- Authors: Kirthi Shankar Sivamani, Rajeev Sahay, Aly El Gamal
- Abstract summary: Recent studies have shown that deep learning models are vulnerable to crafted adversarial inputs.
We propose a novel method to detect adversarial inputs by augmenting the main classification network with multiple binary detectors.
We achieve a 99.5% detection accuracy on the MNIST dataset and 97.5% on the CIFAR-10 dataset.
- Score: 5.4572790062292125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have shown that deep learning models are vulnerable to
specifically crafted adversarial inputs that are quasi-imperceptible to humans.
In this letter, we propose a novel method to detect adversarial inputs, by
augmenting the main classification network with multiple binary detectors
(observer networks) which take inputs from the hidden layers of the original
network (convolutional kernel outputs) and classify the input as clean or
adversarial. During inference, the detectors are treated as a part of an
ensemble network and the input is deemed adversarial if at least half of the
detectors classify it as such. The proposed method addresses the trade-off
between accuracy of classification on clean and adversarial samples, as the
original classification network is not modified during the detection process.
The use of multiple observer networks makes attacking the detection mechanism
non-trivial even when the attacker is aware of the victim classifier. We
achieve a 99.5% detection accuracy on the MNIST dataset and 97.5% on the
CIFAR-10 dataset using the Fast Gradient Sign Attack in a semi-white box setup.
The number of false positive detections is a mere 0.12% in the worst-case scenario.
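Read as a recipe, the abstract describes a frozen classifier, per-layer binary observers, and a majority vote at inference. Below is a minimal PyTorch sketch, assuming a small MNIST-style CNN backbone with two observer taps; the architectures and tap points are illustrative assumptions, and training of the observers (on clean versus FGSM-perturbed activations) is omitted.

```python
# Minimal sketch of the observer-network scheme: a frozen victim classifier
# exposes its hidden feature maps, each observer scores one map as clean (0)
# or adversarial (1), and the input is flagged when at least half agree.
# The backbone, observer architectures, and tap points are illustrative
# assumptions, not the exact models from the paper.
import torch
import torch.nn as nn


class VictimCNN(nn.Module):
    """Main classifier; its weights are left untouched by the detectors."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.conv2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Linear(64 * 7 * 7, num_classes)  # assumes 28x28 inputs (e.g. MNIST)

    def forward(self, x):
        h1 = self.conv1(x)
        h2 = self.conv2(h1)
        return self.head(h2.flatten(1)), [h1, h2]  # logits plus hidden feature maps


class Observer(nn.Module):
    """Binary clean-vs-adversarial detector attached to one hidden feature map."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
        )

    def forward(self, feature_map):
        return self.net(feature_map)


def is_adversarial(x, victim, observers):
    """Majority vote: return True where at least half of the observers vote 'adversarial'."""
    with torch.no_grad():
        _, feats = victim(x)
        votes = [obs(f).argmax(dim=1) for obs, f in zip(observers, feats)]  # 1 = adversarial
    return torch.stack(votes).float().mean(dim=0) >= 0.5


victim = VictimCNN().eval()
observers = [Observer(32).eval(), Observer(64).eval()]
flags = is_adversarial(torch.randn(8, 1, 28, 28), victim, observers)  # one flag per input
```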
Related papers
- Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks.
It is quite beneficial and challenging to detect poisoned samples from a mixed dataset.
We propose an Iterative Filtering approach for UEs identification.
arXiv Detail & Related papers (2024-08-15T13:26:13Z) - How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic, and generalisable framework for which key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z) - Deep Neural Networks based Meta-Learning for Network Intrusion Detection [0.24466725954625884]
The digitization of different components of industry and inter-connectivity among indigenous networks have increased the risk of network attacks.
Data used to construct a predictive model for computer networks has a skewed class distribution and limited representation of attack types.
We propose a novel deep neural network based Meta-Learning framework; INformation FUsion and Stacking Ensemble (INFUSE) for network intrusion detection.
arXiv Detail & Related papers (2023-02-18T18:00:05Z) - DOC-NAD: A Hybrid Deep One-class Classifier for Network Anomaly
Detection [0.0]
Machine Learning approaches have been used to enhance the detection capabilities of Network Intrusion Detection Systems (NIDSs).
Recent work has achieved near-perfect performance on binary- and multi-class network anomaly detection tasks.
This paper proposes a Deep One-Class (DOC) classifier for network intrusion detection by only training on benign network data samples.
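As background for the one-class idea above (train on benign traffic only, flag deviations at test time), here is a hedged sketch using scikit-learn's OneClassSVM; the paper's DOC classifier is a deep model, and the flow features below are synthetic placeholders.

```python
# Illustrative one-class anomaly detection trained only on benign samples.
# OneClassSVM stands in for the paper's deep one-class classifier, and the
# 20-dimensional "flow features" are synthetic placeholders.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
benign_flows = rng.normal(0.0, 1.0, size=(5000, 20))   # benign training traffic
test_flows = rng.normal(0.5, 1.5, size=(200, 20))      # unseen traffic, possibly malicious

scaler = StandardScaler().fit(benign_flows)
detector = OneClassSVM(kernel="rbf", nu=0.01).fit(scaler.transform(benign_flows))

# +1 = consistent with benign training data, -1 = anomalous (potential intrusion)
predictions = detector.predict(scaler.transform(test_flows))
```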
arXiv Detail & Related papers (2022-12-15T00:08:05Z) - Unfolding Local Growth Rate Estimates for (Almost) Perfect Adversarial
Detection [22.99930028876662]
Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks.
Current CNN approaches largely remain vulnerable to adversarial perturbations of the input that have been crafted specifically to fool the system.
We propose a simple and light-weight detector, which leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks.
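For context, earlier LID-based detectors use a maximum-likelihood LID estimate computed from k-nearest-neighbour distances; the sketch below shows that baseline estimator (not the paper's refined growth-rate estimates), with a random batch standing in for real hidden-layer activations.

```python
# Baseline maximum-likelihood LID estimator from k-nearest-neighbour distances.
# Adversarial inputs tend to yield higher LID estimates than clean ones, so a
# simple detector thresholds (or trains a classifier on) per-layer LID features.
import numpy as np


def lid_mle(query: np.ndarray, reference: np.ndarray, k: int = 20) -> float:
    """LID estimate of `query` with respect to a batch of reference activations."""
    dists = np.linalg.norm(reference - query, axis=1)
    dists = np.sort(dists[dists > 0])[:k]               # k nearest non-zero distances
    return -1.0 / np.mean(np.log(dists / dists[-1]))    # MLE of local intrinsic dimensionality


rng = np.random.default_rng(0)
activations = rng.normal(size=(256, 64))                # placeholder hidden-layer activations
print(lid_mle(activations[0], activations[1:]))
```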
arXiv Detail & Related papers (2022-12-13T17:51:32Z) - AntidoteRT: Run-time Detection and Correction of Poison Attacks on
Neural Networks [18.461079157949698]
The paper considers backdoor poisoning attacks against image classification networks.
We propose lightweight automated detection and correction techniques against poisoning attacks.
Our technique outperforms existing defenses such as NeuralCleanse and STRIP on popular benchmarks.
arXiv Detail & Related papers (2022-01-31T23:42:32Z) - Learning to Detect Adversarial Examples Based on Class Scores [0.8411385346896413]
We take a closer look at adversarial attack detection based on the class scores of an already trained classification model.
We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples.
We show that our approach yields an improved detection rate compared to an existing method, whilst being easy to implement.
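A minimal sketch of this class-score approach is given below, with synthetic softmax vectors standing in for the outputs of a real, already trained classifier (the separability of the placeholder data is artificial):

```python
# Train an SVM to separate class-score vectors of clean vs. adversarial inputs.
# The Dirichlet-sampled score vectors are synthetic placeholders for softmax
# outputs of an already trained classifier on clean and attacked examples.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
clean_scores = rng.dirichlet(np.full(10, 0.3), size=1000)  # peaked, confident predictions
adv_scores = rng.dirichlet(np.full(10, 2.0), size=1000)    # flatter, less confident predictions

X = np.vstack([clean_scores, adv_scores])
y = np.concatenate([np.zeros(1000), np.ones(1000)])        # 1 = adversarial
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

detector = SVC(kernel="rbf").fit(X_tr, y_tr)
print("detection accuracy:", detector.score(X_te, y_te))
```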
arXiv Detail & Related papers (2021-07-09T13:29:54Z) - DAAIN: Detection of Anomalous and Adversarial Input using Normalizing
Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
arXiv Detail & Related papers (2021-05-30T22:07:13Z) - Learning to Separate Clusters of Adversarial Representations for Robust
Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
In this paper, we consider non-robust features to be a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster and to leverage that distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z) - Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z) - Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation learning power of CNNs and learns better features for the fPAD task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.