Random Projections for Adversarial Attack Detection
- URL: http://arxiv.org/abs/2012.06405v1
- Date: Fri, 11 Dec 2020 15:02:28 GMT
- Title: Random Projections for Adversarial Attack Detection
- Authors: Nathan Drenkow, Neil Fendley, Philippe Burlina
- Abstract summary: Adversarial attack detection remains a fundamentally challenging problem from two perspectives.
We present a technique that makes use of special properties of random projections, whereby we can characterize the behavior of clean and adversarial examples.
Performance evaluation demonstrates that our technique outperforms ($>0.92$ AUC) competing state-of-the-art (SOTA) detection strategies.
- Score: 8.684378639046644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whilst adversarial attack detection has received considerable attention, it
remains a fundamentally challenging problem from two perspectives. First, while
threat models can be well-defined, attacker strategies may still vary widely
within those constraints. Therefore, detection should be considered as an
open-set problem, standing in contrast to most current detection strategies.
These methods take a closed-set view and train binary detectors, thus biasing
detection toward attacks seen during detector training. Second, information is
limited at test time and confounded by nuisance factors including the label and
underlying content of the image. Many of the current high-performing techniques
use training sets for dealing with some of these issues, but can be limited by
the overall size and diversity of those sets during the detection step. We
address these challenges via a novel strategy based on random subspace
analysis. We present a technique that makes use of special properties of random
projections, whereby we can characterize the behavior of clean and adversarial
examples across a diverse set of subspaces. We then leverage the
self-consistency (or inconsistency) of model activations to discern clean from
adversarial examples. Performance evaluation demonstrates that our technique
outperforms ($>0.92$ AUC) competing state-of-the-art (SOTA) detection strategies,
while remaining truly agnostic to the attack method itself. It also requires
significantly less training data, composed only of clean examples, when
compared to competing SOTA methods, which achieve only chance performance when
evaluated in a more rigorous testing scenario.
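The abstract describes characterizing clean versus adversarial behavior across random subspaces and scoring the self-consistency of the model's responses. The sketch below is a minimal illustration of that idea under our own assumptions, not the authors' implementation: `predict_fn`, the subspace dimension, and the agreement-based score are placeholders, and the sketch operates in input space rather than on internal activations.

```python
import numpy as np

def subspace_consistency_score(x, predict_fn, subspace_dim=32, num_projections=64, seed=0):
    """Score how self-consistent a model's prediction is across random subspaces.

    x          -- flattened input or feature vector of shape (d,)
    predict_fn -- callable mapping a shape-(d,) vector to class probabilities
                  (placeholder for the defended model; an assumption here)
    Returns a value in [0, 1]; lower values indicate inconsistent behavior,
    which this heuristic treats as evidence of an adversarial example.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).ravel()
    d = x.shape[0]
    baseline = np.argmax(predict_fn(x))  # prediction on the original input
    agreements = []
    for _ in range(num_projections):
        # Orthonormal basis of a random low-dimensional subspace, used to
        # project the input and re-query the model on the projected version.
        Q, _ = np.linalg.qr(rng.standard_normal((d, subspace_dim)))
        x_sub = Q @ (Q.T @ x)
        agreements.append(float(np.argmax(predict_fn(x_sub)) == baseline))
    return float(np.mean(agreements))
```

A detection threshold on this score could then be calibrated using clean examples only, in line with the abstract's claim that no adversarial training data is required; since the paper leverages model activations across subspaces, this input-space version should be read only as an illustration.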
Related papers
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with a $>99\%$ detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
arXiv Detail & Related papers (2024-08-04T09:53:50Z)
- The Adversarial Implications of Variable-Time Inference [47.44631666803983]
We present an approach that exploits a novel side channel in which the adversary simply measures the execution time of the algorithm used to post-process the predictions of the ML model under attack.
We investigate leakage from the non-maximum suppression (NMS) algorithm, which plays a crucial role in the operation of object detectors.
We demonstrate attacks against the YOLOv3 detector, leveraging the timing leakage to successfully evade object detection using adversarial examples, and perform dataset inference.
arXiv Detail & Related papers (2023-09-05T11:53:17Z)
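As a rough, hedged illustration of the timing side channel described in the entry above (function names are placeholders, not the paper's attack code): the adversary repeatedly times the post-processing routine, e.g. NMS, and treats the measured latency as a signal about the number of candidate detections.

```python
import time
import statistics

def time_postprocessing(postprocess_fn, raw_predictions, trials=50):
    """Estimate the latency of a post-processing step such as NMS.

    postprocess_fn  -- callable applied to the raw model output (placeholder).
    raw_predictions -- raw detector output whose processing time is measured.
    Returns the median wall-clock time over several trials; under the threat
    model sketched here, this latency leaks information about how many
    candidate boxes the detector produced.
    """
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        postprocess_fn(raw_predictions)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```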
- Uncertainty-based Detection of Adversarial Attacks in Semantic Segmentation [16.109860499330562]
We introduce an uncertainty-based approach for the detection of adversarial attacks in semantic segmentation.
We demonstrate the ability of our approach to detect perturbed images across multiple types of adversarial attacks.
arXiv Detail & Related papers (2023-05-22T08:36:35Z)
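One plausible instantiation of the uncertainty-based detection summarized in the entry above (a sketch under our own assumptions, not the authors' code): compute per-pixel predictive entropy of the segmentation softmax and flag images whose aggregate entropy exceeds a threshold calibrated on clean data.

```python
import numpy as np

def entropy_score(softmax_map, eps=1e-12):
    """Aggregate per-pixel predictive entropy of a segmentation output.

    softmax_map -- array of shape (num_classes, H, W) with per-pixel class
                   probabilities (assumed to sum to 1 over the class axis).
    Returns the mean per-pixel entropy; higher values indicate more uncertain
    (and, under this heuristic, possibly perturbed) inputs.
    """
    per_pixel = -np.sum(softmax_map * np.log(softmax_map + eps), axis=0)
    return float(per_pixel.mean())

def is_suspicious(softmax_map, threshold):
    """Flag an image whose mean entropy exceeds a threshold set on clean data."""
    return entropy_score(softmax_map) > threshold
```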
- MEAD: A Multi-Armed Approach for Evaluation of Adversarial Examples Detectors [24.296350262025552]
We propose a novel framework, called MEAD, for evaluating detectors based on several attack strategies.
Among them, we make use of three new objectives to generate attacks.
The proposed performance metric is based on the worst-case scenario.
arXiv Detail & Related papers (2022-06-30T17:05:45Z)
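The MEAD entry above bases its performance metric on the worst-case scenario across several attack strategies. A hedged reading of that idea (the array layout and aggregation below are our assumptions, not MEAD's exact definition) is to keep, for each example, the detection score of the attack the detector handles worst.

```python
import numpy as np

def worst_case_scores(scores_per_attack):
    """Collapse per-attack detection scores into a worst-case score per example.

    scores_per_attack -- array of shape (num_attacks, num_examples), where
    higher values mean "more likely adversarial" according to the detector
    under test. Taking the minimum over attacks keeps, per example, the attack
    the detector found hardest to flag; computing AUC on the collapsed scores
    then reflects a worst-case evaluation.
    """
    return np.min(np.asarray(scores_per_attack, dtype=float), axis=0)
```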
- Adversarial Robustness of Deep Reinforcement Learning based Dynamic Recommender Systems [50.758281304737444]
We propose to explore adversarial examples and attack detection on reinforcement learning-based interactive recommendation systems.
We first craft different types of adversarial examples by adding perturbations to the input and intervening on the causal factors.
Then, we augment recommendation systems by detecting potential attacks with a deep learning-based classifier based on the crafted data.
arXiv Detail & Related papers (2021-12-02T04:12:24Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
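One way to read the "self-similarity" used by the few-shot defense summarized above (an assumption on our part, not the authors' formulation): embed the support samples of a class and check how tightly they cluster, since a poisoned support sample tends to lower the within-class similarity. A minimal cosine-similarity sketch:

```python
import numpy as np

def support_self_similarity(embeddings):
    """Per-sample mean cosine similarity to the rest of a class's support set.

    embeddings -- array of shape (num_support, dim). A support sample with a
    noticeably lower score than its peers is, under this heuristic, a candidate
    adversarial (poisoned) support example to filter out.
    """
    e = np.asarray(embeddings, dtype=float)
    e = e / (np.linalg.norm(e, axis=1, keepdims=True) + 1e-12)
    sim = e @ e.T               # pairwise cosine similarities
    np.fill_diagonal(sim, 0.0)  # ignore self-comparisons
    return sim.sum(axis=1) / (len(e) - 1)
```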
- ADC: Adversarial attacks against object Detection that evade Context consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that how to robustly model context and check its consistency is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z)
- TREATED: Towards Universal Defense against Textual Adversarial Attacks [28.454310179377302]
We propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions.
Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines.
arXiv Detail & Related papers (2021-09-13T03:31:20Z)
- Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z)
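A compact way to write the sublinear-budget condition mentioned in the entry above (our paraphrase of the summary, not the paper's exact theorem statement): with $m$ training samples and a poisoning budget of $b(m)$ instances,

```latex
% Paraphrase of the summary's condition: PAC learnability and certification are
% achievable when the adversary's budget grows sublinearly in the sample size m.
b(m) \in o(m), \qquad \text{i.e.} \qquad \lim_{m \to \infty} \frac{b(m)}{m} = 0 .
```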
- Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature preserving autoencoder filtering and also the concept of self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore detection for few-shot classifiers.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.