MEAD: A Multi-Armed Approach for Evaluation of Adversarial Examples Detectors
- URL: http://arxiv.org/abs/2206.15415v1
- Date: Thu, 30 Jun 2022 17:05:45 GMT
- Title: MEAD: A Multi-Armed Approach for Evaluation of Adversarial Examples Detectors
- Authors: Federica Granese, Marine Picot, Marco Romanelli, Francisco Messina,
Pablo Piantanida
- Abstract summary: We propose a novel framework, called MEAD, for evaluating detectors based on several attack strategies.
Among them, we make use of three new objectives to generate attacks.
The proposed performance metric is based on the worst-case scenario.
- Score: 24.296350262025552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detection of adversarial examples has been a hot topic in recent years due
to its importance for safely deploying machine learning algorithms in critical
applications. However, the detection methods are generally validated by
assuming a single implicitly known attack strategy, which does not necessarily
account for real-life threats. Indeed, this can lead to an overoptimistic
assessment of the detectors' performance and may induce some bias in the
comparison between competing detection schemes. We propose a novel multi-armed
framework, called MEAD, for evaluating detectors based on several attack
strategies to overcome this limitation. Among them, we make use of three new
objectives to generate attacks. The proposed performance metric is based on the
worst-case scenario: detection is successful if and only if all different
attacks are correctly recognized. Empirically, we show the effectiveness of our
approach. Moreover, the poor performance obtained by state-of-the-art detectors
opens up an exciting new line of research.
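As a concrete illustration of the worst-case rule stated in the abstract, the sketch below aggregates per-attack detection outcomes so that a sample counts as detected only if every attack against it was recognized. It is a minimal sketch: the function name, the attack names, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mead_worst_case_detection(detector_flags: np.ndarray) -> np.ndarray:
    """Worst-case (multi-armed) aggregation over attacks.

    detector_flags: boolean array of shape (n_samples, n_attacks), where
    entry (i, k) is True iff the detector correctly recognized the
    adversarial example crafted from sample i by attack strategy k.
    Detection succeeds for a sample only if *all* attacks were recognized.
    """
    return detector_flags.all(axis=1)

# Illustrative usage: 4 samples, 3 attack strategies (e.g. PGD, FGSM, CW).
flags = np.array([
    [True, True, True],    # detected under every attack -> success
    [True, False, True],   # missed one attack -> failure
    [True, True, False],
    [True, True, True],
])
per_sample = mead_worst_case_detection(flags)
worst_case_detection_rate = per_sample.mean()
print(per_sample, worst_case_detection_rate)  # [ True False False  True] 0.5
```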
Related papers
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with $>99\%$ detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
arXiv Detail & Related papers (2024-08-04T09:53:50Z)
- When Measures are Unreliable: Imperceptible Adversarial Perturbations toward Top-$k$ Multi-Label Learning [83.8758881342346]
A novel loss function is devised to generate adversarial perturbations that could achieve both visual and measure imperceptibility.
Experiments on large-scale benchmark datasets demonstrate the superiority of our proposed method in attacking the top-$k$ multi-label systems.
arXiv Detail & Related papers (2023-07-27T13:18:47Z)
- A Minimax Approach Against Multi-Armed Adversarial Attacks Detection [31.971443221041174]
Multi-armed adversarial attacks have been shown to be highly successful in fooling state-of-the-art detectors.
We propose a solution that aggregates the soft-probability outputs of multiple pre-trained detectors according to a minimax approach.
We show that our aggregation consistently outperforms individual state-of-the-art detectors against multi-armed adversarial attacks.
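As a rough illustration of this kind of aggregation (not the paper's actual objective or optimizer), the sketch below picks convex weights over the detectors' soft-probability outputs so as to maximize the worst-case detection accuracy over attacks. The grid search, the 0.5 decision threshold, and the random toy data are assumptions made purely for illustration.

```python
import numpy as np

def minimax_aggregate(weights_grid, detector_scores, labels_by_attack):
    """Pick convex weights over detectors that maximize the worst-case
    (minimum over attacks) detection accuracy.

    detector_scores: dict attack -> array (n_samples, n_detectors) of
        soft probabilities that each sample is adversarial.
    labels_by_attack: dict attack -> boolean array (n_samples,) with the
        ground truth (True = adversarial).
    """
    best_w, best_worst = None, -np.inf
    for w in weights_grid:
        worst = np.inf
        for attack, scores in detector_scores.items():
            agg = scores @ w                       # aggregated soft score
            acc = ((agg > 0.5) == labels_by_attack[attack]).mean()
            worst = min(worst, acc)                # worst case over attacks
        if worst > best_worst:
            best_w, best_worst = w, worst
    return best_w, best_worst

# Illustrative usage: 2 detectors, 2 attacks, synthetic scores and labels.
rng = np.random.default_rng(0)
grid = [np.array([a, 1 - a]) for a in np.linspace(0, 1, 11)]
scores = {"pgd": rng.random((100, 2)), "fgsm": rng.random((100, 2))}
labels = {"pgd": rng.random(100) > 0.5, "fgsm": rng.random(100) > 0.5}
print(minimax_aggregate(grid, scores, labels))
```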
arXiv Detail & Related papers (2023-02-04T18:21:22Z)
- Adversarial Robustness of Deep Reinforcement Learning based Dynamic Recommender Systems [50.758281304737444]
We propose to explore adversarial examples and attack detection on reinforcement learning-based interactive recommendation systems.
We first craft different types of adversarial examples by adding perturbations to the input and intervening on the causal factors.
Then, we augment recommendation systems by detecting potential attacks with a deep learning-based classifier based on the crafted data.
arXiv Detail & Related papers (2021-12-02T04:12:24Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
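A generic sketch of a self-similarity check of this kind is given below; the cosine-similarity statistic, the placeholder feature inputs, and the threshold are assumptions for illustration, not the paper's filtering procedure.

```python
import numpy as np

def self_similarity_score(support_features: np.ndarray) -> float:
    """Average pairwise cosine similarity within one class's support set.

    support_features: array (n_shots, feature_dim) of embeddings.
    A low score suggests the support set may contain adversarial samples.
    """
    normed = support_features / np.linalg.norm(support_features, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(support_features)
    off_diag = sims[~np.eye(n, dtype=bool)]   # drop self-similarities
    return off_diag.mean()

def flag_support_set(support_features: np.ndarray, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: flag a support set as suspicious when its
    average self-similarity falls below a threshold."""
    return self_similarity_score(support_features) < threshold

# Illustrative usage with random 5-shot, 64-dimensional embeddings.
rng = np.random.default_rng(0)
print(flag_support_set(rng.random((5, 64))))
```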
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- TREATED: Towards Universal Defense against Textual Adversarial Attacks [28.454310179377302]
We propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions.
Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines.
arXiv Detail & Related papers (2021-09-13T03:31:20Z)
- Adversarially Robust One-class Novelty Detection [83.1570537254877]
We show that existing novelty detectors are susceptible to adversarial examples.
We propose a defense strategy that manipulates the latent space of novelty detectors to improve the robustness against adversarial examples.
arXiv Detail & Related papers (2021-08-25T10:41:29Z)
- Learning to Detect Adversarial Examples Based on Class Scores [0.8411385346896413]
We take a closer look at adversarial attack detection based on the class scores of an already trained classification model.
We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples.
We show that our approach yields an improved detection rate compared to an existing method, whilst being easy to implement.
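A minimal sketch of this kind of detector follows, assuming scikit-learn and synthetic class-score vectors in place of a real classifier's outputs; it illustrates the idea, not the authors' exact training setup.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: class-score vectors produced by an already trained
# classifier for clean and adversarial inputs (10 classes each).
rng = np.random.default_rng(0)
clean_scores = rng.normal(loc=2.0, size=(500, 10))
adv_scores = rng.normal(loc=0.0, size=(500, 10))

X = np.vstack([clean_scores, adv_scores])
y = np.concatenate([np.zeros(500), np.ones(500)])   # 1 = adversarial

# Fit a binary SVM on the class scores to separate clean from adversarial.
svm = SVC(kernel="rbf", probability=True).fit(X, y)

# At test time, pass a new class-score vector through the SVM.
print(svm.predict(clean_scores[:3]), svm.predict(adv_scores[:3]))
```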
arXiv Detail & Related papers (2021-07-09T13:29:54Z)
- Random Projections for Adversarial Attack Detection [8.684378639046644]
Adversarial attack detection remains a fundamentally challenging problem from two perspectives.
We present a technique that makes use of special properties of random projections, whereby we can characterize the behavior of clean and adversarial examples.
Performance evaluation demonstrates that our technique outperforms ($>0.92$ AUC) competing state-of-the-art (SOTA) detection strategies.
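A minimal sketch of detection via random projections is given below, under assumed image-shaped inputs and a simple deviation-from-reference rule; the summary statistics and the threshold are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def projection_stats(x: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Project flattened inputs onto random directions and summarize them."""
    z = x.reshape(len(x), -1) @ proj              # (n_samples, n_projections)
    return np.stack([z.mean(axis=1), z.std(axis=1)], axis=1)

rng = np.random.default_rng(0)
proj = rng.standard_normal((32 * 32 * 3, 64))     # 64 random projection directions

# Reference statistics computed on (synthetic) clean data.
clean = rng.random((200, 32, 32, 3))
reference = projection_stats(clean, proj)
mu, sigma = reference.mean(axis=0), reference.std(axis=0)

def is_suspicious(x_batch: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Hypothetical rule: flag inputs whose projection statistics deviate
    more than k standard deviations from the clean reference."""
    stats = projection_stats(x_batch, proj)
    return (np.abs(stats - mu) > k * sigma).any(axis=1)

print(is_suspicious(clean[:5]))
```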
arXiv Detail & Related papers (2020-12-11T15:02:28Z)
- Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature preserving autoencoder filtering and also the concept of self-similarity of a support set to perform this detection.
Our method is attack-agnostic and also the first to explore detection for few-shot classifiers to the best of our knowledge.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
- Unknown Presentation Attack Detection against Rational Attackers [6.351869353952288]
Presentation attack detection and multimedia forensics are still vulnerable to attacks in real-life settings.
Some of the challenges for existing solutions are the detection of unknown attacks, the ability to perform in adversarial settings, few-shot learning, and explainability.
A new optimization criterion is proposed and a set of requirements is defined for improving the performance of these systems in real-life settings.
arXiv Detail & Related papers (2020-10-04T14:37:10Z)