Output Randomization: A Novel Defense for both White-box and Black-box
Adversarial Models
- URL: http://arxiv.org/abs/2107.03806v1
- Date: Thu, 8 Jul 2021 12:27:19 GMT
- Title: Output Randomization: A Novel Defense for both White-box and Black-box
Adversarial Models
- Authors: Daniel Park, Haidar Khan, Azer Khan, Alex Gittens, Bülent Yener
- Abstract summary: Adversarial examples pose a threat to deep neural network models in a variety of scenarios.
We explore the use of output randomization as a defense against attacks in both the black box and white box models.
- Score: 8.189696720657247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial examples pose a threat to deep neural network models in a variety
of scenarios, ranging from a "white box" setting, in which the adversary has complete
knowledge of the model, to the opposite extreme of a "black box" setting. In
this paper, we explore the use of output randomization as a defense against
attacks under both the black box and white box threat models and propose two defenses. In
the first defense, we propose output randomization at test time to thwart
finite difference attacks in black box settings. Since this type of attack
relies on repeated queries to the model to estimate gradients, we investigate
the use of randomization to thwart such adversaries from successfully creating
adversarial examples. We empirically show that this defense can limit the
success rate of a black box adversary using the Zeroth Order Optimization
attack to 0%. Secondly, we propose output randomization training as a defense
against white box adversaries. Unlike prior approaches that use randomization,
our defense does not require its use at test time, eliminating the Backward
Pass Differentiable Approximation attack, which was shown to be effective
against other randomization defenses. Additionally, this defense has low
overhead and is easily implemented, allowing it to be used together with other
defenses across various model architectures. We evaluate output randomization
training against the Projected Gradient Descent (PGD) attacker and show that the
defense can reduce the PGD attack's success rate to 12% when using
cross-entropy loss.
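The abstract does not give implementation details, but the test-time defense can be pictured as perturbing the classifier's output probabilities on every query, so a finite-difference attacker such as ZOO estimates gradients from corrupted responses. The Python sketch below is a minimal illustration under that assumption; the function names, the Gaussian noise, and the `noise_scale` parameter are illustrative choices, not the authors' exact formulation.

```python
import numpy as np

def randomized_predict(model_fn, x, noise_scale=0.1, rng=None):
    """Hedged sketch of a test-time output randomization defense: perturb the
    model's output distribution so that repeated queries used for
    finite-difference gradient estimation return unreliable values."""
    rng = np.random.default_rng() if rng is None else rng
    probs = model_fn(x)                        # clean softmax output, shape (num_classes,)
    noisy = probs + rng.normal(0.0, noise_scale, size=probs.shape)
    noisy = np.clip(noisy, 1e-12, None)        # keep probabilities positive
    return noisy / noisy.sum()                 # renormalize to a valid distribution

if __name__ == "__main__":
    # Toy stand-in for a trained classifier over 10 classes (ignores the input).
    def toy_model(x):
        logits = np.linspace(0.0, 1.0, 10)
        e = np.exp(logits - logits.max())
        return e / e.sum()

    x = np.zeros((28, 28))                     # dummy image-shaped input
    print(randomized_predict(toy_model, x))
```

The second defense, output randomization training, presumably applies similar randomization during training rather than at inference, which (per the abstract) removes the need for test-time randomness; the sketch above covers only the inference-time variant.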
Related papers
- Counter-Samples: A Stateless Strategy to Neutralize Black Box Adversarial Attacks [2.9815109163161204]
Our paper presents a novel defence against black box attacks, where attackers use the victim model as an oracle to craft their adversarial examples.
Unlike traditional preprocessing defences that rely on sanitizing input samples, our strategy counters the attack process itself.
We demonstrate that our approach is remarkably effective against state-of-the-art black box attacks and outperforms existing defences for both the CIFAR-10 and ImageNet datasets.
arXiv Detail & Related papers (2024-03-14T10:59:54Z) - Understanding the Robustness of Randomized Feature Defense Against
Query-Based Adversarial Attacks [23.010308600769545]
Deep neural networks are vulnerable to adversarial examples: samples that stay close to the original image yet cause the model to misclassify.
We propose a simple and lightweight defense against black-box attacks by adding random noise to hidden features at intermediate layers of the model at inference time.
Our method effectively enhances the model's resilience against both score-based and decision-based black-box attacks.
arXiv Detail & Related papers (2023-10-01T03:53:23Z) - Query Efficient Cross-Dataset Transferable Black-Box Attack on Action
Recognition [99.29804193431823]
Black-box adversarial attacks present a realistic threat to action recognition systems.
We propose a new attack on action recognition that addresses the shortcomings of existing attacks by generating perturbations.
Our method achieves deception rates 8% and 12% higher than state-of-the-art query-based and transfer-based attacks, respectively.
arXiv Detail & Related papers (2022-11-23T17:47:49Z) - On the Limitations of Stochastic Pre-processing Defenses [42.80542472276451]
Defending against adversarial examples remains an open problem.
A common belief is that randomness at inference increases the cost of finding adversarial inputs.
In this paper, we investigate such pre-processing defenses and demonstrate that they are flawed.
arXiv Detail & Related papers (2022-06-19T21:54:42Z) - Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
arXiv Detail & Related papers (2022-03-29T04:33:06Z) - Parallel Rectangle Flip Attack: A Query-based Black-box Attack against
Object Detection [89.08832589750003]
We propose a Parallel Rectangle Flip Attack (PRFA) via random search to avoid sub-optimal detection near the attacked region.
Our method can effectively and efficiently attack various popular object detectors, including anchor-based and anchor-free, and generate transferable adversarial examples.
arXiv Detail & Related papers (2022-01-22T06:00:17Z) - Theoretical Study of Random Noise Defense against Query-Based Black-Box
Attacks [72.8152874114382]
In this work, we study a simple but promising defense technique, dubbed Random Noise Defense (RND) against query-based black-box attacks.
It is lightweight and can be directly combined with any off-the-shelf models and other defense strategies.
We present solid theoretical analyses demonstrating that the defense effect of RND against query-based black-box attacks and the corresponding adaptive attacks depends heavily on the magnitude ratio. A minimal illustrative sketch of this style of per-query noise defense follows the related-papers list.
arXiv Detail & Related papers (2021-04-23T08:39:41Z) - Local Black-box Adversarial Attacks: A Query Efficient Approach [64.98246858117476]
Adversarial attacks have threatened the application of deep neural networks in security-sensitive scenarios.
We propose a novel framework to perturb the discriminative areas of clean examples only within limited queries in black-box attacks.
We conduct extensive experiments showing that our framework can significantly improve query efficiency in black-box attacks while maintaining a high attack success rate.
arXiv Detail & Related papers (2021-01-04T15:32:16Z) - A Game Theoretic Analysis of Additive Adversarial Attacks and Defenses [4.94950858749529]
We propose a game-theoretic framework for studying attacks and defenses which exist in equilibrium.
We show how this equilibrium defense can be approximated given finitely many samples from a data-generating distribution.
arXiv Detail & Related papers (2020-09-14T15:51:15Z) - Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little as possible knowledge of attacked models.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z) - Using an ensemble color space model to tackle adversarial examples [22.732023268348787]
We propose a three-step method for defending against such attacks.
First, we denoise the image using statistical methods.
Second, we show that adopting multiple color spaces in the same model can further help defend against these adversarial attacks.
arXiv Detail & Related papers (2020-03-10T21:20:53Z)
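Two of the entries above (the randomized feature defense and the Random Noise Defense) describe inference-time noise defenses closely related to the main paper. As a rough, non-authoritative sketch of that shared idea, the snippet below adds small Gaussian noise to each incoming query before the model evaluates it; the function name and the noise parameter `nu` are assumptions for illustration only, not code from those papers.

```python
import numpy as np

def noisy_query_defense(model_fn, x, nu=0.02, rng=None):
    """Illustrative per-query noise defense: perturb each query slightly so a
    query-based black-box attacker sees inconsistent responses for
    near-identical inputs."""
    rng = np.random.default_rng() if rng is None else rng
    x_noisy = x + rng.normal(0.0, nu, size=np.shape(x))
    return model_fn(x_noisy)
```

In the randomized feature defense the noise is instead injected at hidden layers, but the query-facing effect described in its summary is the same: gradient estimates built from repeated queries become noisy.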
This list is automatically generated from the titles and abstracts of the papers on this site.