Adversarially Robust One-class Novelty Detection
- URL: http://arxiv.org/abs/2108.11168v1
- Date: Wed, 25 Aug 2021 10:41:29 GMT
- Title: Adversarially Robust One-class Novelty Detection
- Authors: Shao-Yuan Lo, Poojan Oza, Vishal M. Patel
- Abstract summary: We show that existing novelty detectors are susceptible to adversarial examples.
We propose a defense strategy that manipulates the latent space of novelty detectors to improve the robustness against adversarial examples.
- Score: 83.1570537254877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One-class novelty detectors are trained with examples of a particular class
and are tasked with identifying whether a query example belongs to the same
known class. Most recent advances adopt a deep auto-encoder style architecture
to compute novelty scores for detecting novel class data. Deep networks have
been shown to be vulnerable to adversarial attacks, yet little focus has been devoted to
studying the adversarial robustness of deep novelty detectors. In this paper,
we first show that existing novelty detectors are susceptible to adversarial
examples. We further demonstrate that commonly-used defense approaches for
classification tasks have limited effectiveness in one-class novelty detection.
Hence, we need a defense specifically designed for novelty detection. To this
end, we propose a defense strategy that manipulates the latent space of novelty
detectors to improve the robustness against adversarial examples. The proposed
method, referred to as Principal Latent Space (PLS), learns the
incrementally-trained cascade principal components in the latent space to
robustify novelty detectors. PLS can purify the latent space against adversarial
examples and constrain the latent space to exclusively model the known class
distribution. We conduct extensive experiments on multiple attacks, datasets
and novelty detectors, showing that PLS consistently enhances the adversarial
robustness of novelty detection models.
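As a rough illustration of the latent-space idea (not the paper's exact incrementally-trained cascade principal components), the sketch below projects latent codes onto a principal subspace estimated from clean known-class codes before they are handed to the decoder; the latent codes and dimensions are hypothetical placeholders.

```python
# Minimal sketch: purify latent codes by projecting them onto the principal
# subspace of known-class (clean) latent codes. The actual PLS method uses
# incrementally-trained cascade principal components; plain batch PCA is
# used here only for illustration.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical latent codes of the known class, e.g. produced by an
# auto-encoder's encoder on clean training data: (num_samples, latent_dim).
clean_latents = rng.normal(size=(1000, 64)) @ rng.normal(size=(64, 64))

# Estimate the principal subspace of the known-class latent distribution.
pca = PCA(n_components=16).fit(clean_latents)

def purify(z):
    """Project latent codes onto the known-class principal subspace,
    discarding directions in which adversarial perturbations can hide."""
    return pca.inverse_transform(pca.transform(z))

# A query latent code with a hypothetical adversarial perturbation added.
z_clean = clean_latents[:1]
z_adv = z_clean + 0.5 * rng.normal(size=(1, 64))

# The purified code would then be decoded, and the novelty score computed
# from the reconstruction error as usual.
print("distance to clean code before purification:",
      float(np.linalg.norm(z_adv - z_clean)))
print("distance to clean code after purification:",
      float(np.linalg.norm(purify(z_adv) - purify(z_clean))))
```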
Related papers
- HOLMES: to Detect Adversarial Examples with Multiple Detectors [1.455585466338228]
HOLMES is able to distinguish unseen adversarial examples from multiple attacks with high accuracy and low false positive rates.
Our effective and inexpensive strategies neither modify the original DNN models nor require their internal parameters.
arXiv Detail & Related papers (2024-05-30T11:22:55Z) - Spatial-Frequency Discriminability for Revealing Adversarial Perturbations [53.279716307171604]
The vulnerability of deep neural networks to adversarial perturbations has been widely recognized in the computer vision community.
Current algorithms typically detect adversarial patterns through discriminative decomposition of natural and adversarial data.
We propose a discriminative detector relying on a spatial-frequency Krawtchouk decomposition.
arXiv Detail & Related papers (2023-05-18T10:18:59Z) - TextShield: Beyond Successfully Detecting Adversarial Sentences in Text
Classification [6.781100829062443]
Adversarial attacks pose a major challenge for neural network models in NLP and preclude their deployment in safety-critical applications.
Previous detection methods are incapable of giving correct predictions on adversarial sentences.
We propose a saliency-based detector, which can effectively detect whether an input sentence is adversarial or not.
arXiv Detail & Related papers (2023-02-03T22:58:07Z) - Adversarial Detector with Robust Classifier [14.586106862913553]
We propose a novel adversarial detector, consisting of a robust classifier and a plain one, to detect adversarial examples with high accuracy.
In an experiment, the proposed detector is demonstrated to outperform a state-of-the-art detector that does not use a robust classifier.
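One plausible reading of this two-classifier design is that an input is flagged when the robust and plain classifiers disagree on the predicted label. The sketch below illustrates that reading with hypothetical stand-in classifiers; it is not the paper's exact detector.

```python
# Hedged sketch of a disagreement-based detector built from a plain
# classifier and a robust one; both models are hypothetical stand-ins.
import numpy as np

def plain_classifier(x):
    # Placeholder for a standard (non-robust) model's predicted label.
    return int(np.argmax(x.sum(axis=(0, 1))))

def robust_classifier(x):
    # Placeholder for an adversarially trained model's predicted label.
    return int(np.argmax(np.median(x, axis=(0, 1))))

def is_flagged(x):
    """Flag the input as adversarial when the two classifiers disagree."""
    return plain_classifier(x) != robust_classifier(x)

x = np.random.default_rng(1).random((32, 32, 3))  # dummy image
print("flagged as adversarial:", is_flagged(x))
```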
arXiv Detail & Related papers (2022-02-05T07:21:05Z) - Towards A Conceptually Simple Defensive Approach for Few-shot
classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - TREATED: Towards Universal Defense against Textual Adversarial Attacks [28.454310179377302]
We propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions.
Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines.
arXiv Detail & Related papers (2021-09-13T03:31:20Z) - Learning to Detect Adversarial Examples Based on Class Scores [0.8411385346896413]
We take a closer look at adversarial attack detection based on the class scores of an already trained classification model.
We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples.
We show that our approach yields an improved detection rate compared to an existing method, whilst being easy to implement.
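A minimal sketch of this idea, assuming synthetic softmax score vectors in place of a real classifier's outputs: fit an SVM on clean versus adversarial class-score vectors and use it to flag queries.

```python
# Train an SVM to separate class-score vectors of clean inputs from those of
# adversarial inputs. The score vectors below are synthetic placeholders;
# in practice they would come from an already trained classifier.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
num_classes = 10

def fake_scores(peak, n):
    """Synthetic softmax-like score vectors; larger `peak` means more
    confident (clean-looking) predictions."""
    logits = rng.normal(size=(n, num_classes))
    logits[:, 0] += peak
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

clean_scores = fake_scores(peak=4.0, n=500)   # confident predictions
adv_scores = fake_scores(peak=1.0, n=500)     # flatter score profiles

X = np.vstack([clean_scores, adv_scores])
y = np.concatenate([np.zeros(500), np.ones(500)])  # 1 = adversarial

detector = SVC(kernel="rbf").fit(X, y)

query = fake_scores(peak=1.0, n=1)
print("flagged as adversarial:", bool(detector.predict(query)[0]))
```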
arXiv Detail & Related papers (2021-07-09T13:29:54Z) - Detection of Adversarial Supports in Few-shot Classifiers Using Feature
Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature-preserving autoencoder filtering and the concept of self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore detection for few-shot classifiers.
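A minimal sketch of the self-similarity component, assuming hypothetical support-set features and omitting the feature-preserving autoencoder filtering step: features of a clean support set should be mutually similar, so a low mean pairwise cosine similarity flags a suspicious set.

```python
# Flag a support set whose feature self-similarity is unusually low.
# Features and the threshold are hypothetical placeholders.
import numpy as np

def self_similarity(features):
    """Mean pairwise cosine similarity of a (num_shots, dim) feature matrix."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = normed @ normed.T
    iu = np.triu_indices(len(features), k=1)  # exclude self-pairs
    return float(sims[iu].mean())

rng = np.random.default_rng(0)
prototype = rng.normal(size=64)
clean_support = prototype + 0.1 * rng.normal(size=(5, 64))   # tight cluster
poisoned_support = rng.normal(size=(5, 64))                  # scattered

threshold = 0.5  # would be tuned on clean support sets in practice
for name, feats in [("clean", clean_support), ("poisoned", poisoned_support)]:
    s = self_similarity(feats)
    print(f"{name}: self-similarity={s:.2f}, flagged={s < threshold}")
```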
arXiv Detail & Related papers (2020-12-09T14:13:41Z) - Learning to Separate Clusters of Adversarial Representations for Robust
Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
We consider non-robust features to be a common property of adversarial examples, and deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster and to leverage that distribution for a likelihood-based adversarial detector.
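A minimal sketch of the likelihood-based step, assuming synthetic representations and simple Gaussian fits rather than the paper's learned cluster separation: a query is flagged when its likelihood is higher under the adversarial cluster than under the clean one.

```python
# Likelihood-ratio detector over two representation clusters. The clean and
# adversarial representations below are synthetic placeholders.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
clean_reps = rng.normal(loc=0.0, scale=1.0, size=(500, 8))
adv_reps = rng.normal(loc=2.0, scale=1.0, size=(500, 8))

# Fit one Gaussian per cluster (mean and covariance from the samples).
clean_dist = multivariate_normal(clean_reps.mean(axis=0), np.cov(clean_reps.T))
adv_dist = multivariate_normal(adv_reps.mean(axis=0), np.cov(adv_reps.T))

def is_flagged(rep):
    """Flag a representation whose likelihood is higher under the
    adversarial cluster than under the clean one."""
    return adv_dist.logpdf(rep) > clean_dist.logpdf(rep)

query = rng.normal(loc=2.0, scale=1.0, size=8)
print("flagged as adversarial:", bool(is_flagged(query)))
```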
arXiv Detail & Related papers (2020-12-07T07:21:18Z) - Any-Shot Object Detection [81.88153407655334]
'Any-shot detection' is the setting in which entirely unseen and few-shot categories co-occur during inference.
We propose a unified any-shot detection model that can concurrently learn to detect both zero-shot and few-shot object classes.
Our framework can also be used solely for zero-shot detection or few-shot detection tasks.
arXiv Detail & Related papers (2020-03-16T03:43:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.