Towards Black-box Adversarial Example Detection: A Data
Reconstruction-based Method
- URL: http://arxiv.org/abs/2306.02021v1
- Date: Sat, 3 Jun 2023 06:34:17 GMT
- Title: Towards Black-box Adversarial Example Detection: A Data
Reconstruction-based Method
- Authors: Yifei Gao, Zhiyu Lin, Yunfan Yang, Jitao Sang
- Abstract summary: Black-box attacks are a more realistic threat and have motivated various black-box adversarial training-based defense methods.
To tackle the BAD problem, we propose a data reconstruction-based adversarial example detection method.
- Score: 9.857570123016213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial example detection is known to be an effective adversarial defense
method. The black-box attack, however, which is a more realistic threat and has
motivated various black-box adversarial training-based defense methods, has
attracted little attention in adversarial example detection. In this paper, we
fill this gap by positioning the problem of black-box adversarial example
detection (BAD). Data analysis under the introduced BAD settings demonstrates
(1) the inability of existing detectors to address the black-box scenario and
(2) the potential of exploring BAD solutions from a data perspective. To tackle
the BAD problem, we propose a data reconstruction-based adversarial example
detection method. Specifically, we use a variational auto-encoder (VAE) to
capture both the pixel and frequency representations of normal examples. We
then use the reconstruction error to detect adversarial examples. Compared with
existing detection methods, the proposed method achieves substantially better
detection performance in BAD, which helps promote the deployment of adversarial
example detection-based defense solutions in real-world models.
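The abstract above only sketches the detection pipeline, so the following is a
minimal PyTorch sketch of a reconstruction-error detector in that spirit. It is
not the authors' released implementation: the network architecture, the
FFT-magnitude frequency term, the freq_weight combination, the assumed 32x32 RGB
inputs (e.g., CIFAR-10), and the threshold calibration are all illustrative
assumptions.

```python
# Illustrative sketch (not the paper's exact architecture): a small convolutional
# VAE fit on normal examples only. Reconstruction error is measured in both pixel
# space and frequency space (FFT magnitude); inputs whose combined error exceeds
# a threshold calibrated on clean data are flagged as adversarial.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallVAE(nn.Module):
    def __init__(self, in_ch=3, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32x32 -> 16x16
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),     # 16x16 -> 8x8
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(64 * 8 * 8, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, 64 * 8 * 8)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),       # 8x8 -> 16x16
            nn.ConvTranspose2d(32, in_ch, 4, stride=2, padding=1), nn.Sigmoid(),  # 16x16 -> 32x32
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        recon = self.dec(self.fc_dec(z).view(-1, 64, 8, 8))
        return recon, mu, logvar


def vae_loss(x, recon, mu, logvar):
    """Standard ELBO objective used to fit the VAE on normal (clean) examples only."""
    rec = F.mse_loss(recon, x, reduction="sum") / x.size(0)
    kld = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    return rec + kld


def recon_error(x, recon, freq_weight=1.0):
    """Per-example reconstruction error combining pixel- and frequency-domain terms."""
    pixel = F.mse_loss(recon, x, reduction="none").flatten(1).mean(dim=1)
    # Frequency representation: magnitude of the 2-D FFT of each channel.
    freq = F.mse_loss(torch.fft.fft2(recon).abs(),
                      torch.fft.fft2(x).abs(),
                      reduction="none").flatten(1).mean(dim=1)
    return pixel + freq_weight * freq


def detect(model, x, threshold):
    """Flag inputs whose reconstruction error exceeds a threshold calibrated on
    held-out clean examples (e.g., a high percentile of their errors)."""
    model.eval()
    with torch.no_grad():
        recon, _, _ = model(x)
        return recon_error(x, recon) > threshold
```

In practice the VAE would be trained on normal examples with the ELBO above, and
the detection threshold set from the distribution of reconstruction errors on a
clean validation split; the paper's actual training and calibration details may
differ.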
Related papers
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with a >99% detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
arXiv Detail & Related papers (2024-08-04T09:53:50Z) - AED-PADA: Improving Generalizability of Adversarial Example Detection via Principal Adversarial Domain Adaptation [38.55694348512267]
We propose a novel method named Adversarial Example Detection via Principal Adversarial Domain Adaptation (AED-PADA).
Specifically, our approach identifies the Principal Adversarial Domains (PADs).
Then, we are the first to exploit multi-source domain adaptation in adversarial example detection, with the PADs as source domains.
arXiv Detail & Related papers (2024-04-19T05:32:37Z) - DALA: A Distribution-Aware LoRA-Based Adversarial Attack against
Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z) - Adversarial Examples Detection with Enhanced Image Difference Features
based on Local Histogram Equalization [20.132066800052712]
We propose an adversarial example detection framework based on a high-frequency information enhancement strategy.
This framework can effectively extract and amplify the feature differences between adversarial examples and normal examples.
arXiv Detail & Related papers (2023-05-08T03:14:01Z) - ADDMU: Detection of Far-Boundary Adversarial Examples with Data and
Model Uncertainty Estimation [125.52743832477404]
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks.
We propose a new technique, ADDMU, which combines two types of uncertainty estimation for both regular and far-boundary (FB) adversarial example detection.
Our new method outperforms previous methods by 3.6 and 6.0 AUC points under each scenario.
arXiv Detail & Related papers (2022-10-22T09:11:12Z) - ADC: Adversarial attacks against object Detection that evade Context
consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that how to robustly model context and check its consistency is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z) - TREATED: Towards Universal Defense against Textual Adversarial Attacks [28.454310179377302]
We propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions.
Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines.
arXiv Detail & Related papers (2021-09-13T03:31:20Z) - Adversarial Examples Detection beyond Image Space [88.7651422751216]
We find that perturbations and prediction confidence vary in accordance with each other, which guides us to detect few-perturbation attacks from the perspective of prediction confidence.
We propose a method beyond image space by a two-stream architecture, in which the image stream focuses on the pixel artifacts and the gradient stream copes with the confidence artifacts.
arXiv Detail & Related papers (2021-02-23T09:55:03Z) - Beating Attackers At Their Own Games: Adversarial Example Detection
Using Adversarial Gradient Directions [16.993439721743478]
The proposed method is based on the observation that the directions of adversarial gradients play a key role in characterizing the adversarial space.
Experiments conducted on two different databases, CIFAR-10 and ImageNet, show that the proposed detection method achieves 97.9% and 98.6% AUC-ROC on five different adversarial attacks.
arXiv Detail & Related papers (2020-12-31T01:12:24Z) - FADER: Fast Adversarial Example Rejection [19.305796826768425]
Recent defenses have been shown to improve adversarial robustness by detecting anomalous deviations from legitimate training samples at different layer representations.
We introduce FADER, a novel technique for speeding up detection-based methods.
Our experiments show up to a 73x reduction in prototypes compared to the analyzed detectors on MNIST, and up to 50x on CIFAR10.
arXiv Detail & Related papers (2020-10-18T22:00:11Z) - A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack
and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples.
Both quantitative and qualitative analyses on several natural image datasets and practical systems confirm the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.