Adversarially Robust Classification by Conditional Generative Model Inversion
- URL: http://arxiv.org/abs/2201.04733v1
- Date: Wed, 12 Jan 2022 23:11:16 GMT
- Title: Adversarially Robust Classification by Conditional Generative Model Inversion
- Authors: Mitra Alirezaei, Tolga Tasdizen
- Abstract summary: We propose a classification model that does not obfuscate gradients and is robust by construction without assuming prior knowledge about the attack.
Our method casts classification as an optimization problem where we "invert" a conditional generator trained on unperturbed, natural images.
We demonstrate that our model is extremely robust against black-box attacks and has improved robustness against white-box attacks.
- Score: 4.913248451323163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most adversarial attack defense methods rely on obfuscating gradients. These
methods are successful in defending against gradient-based attacks; however,
they are easily circumvented by attacks that either do not use the gradient or that
approximate and use the corrected gradient. Defenses that do
not obfuscate gradients such as adversarial training exist, but these
approaches generally make assumptions about the attack such as its magnitude.
We propose a classification model that does not obfuscate gradients and is
robust by construction without assuming prior knowledge about the attack. Our
method casts classification as an optimization problem where we "invert" a
conditional generator trained on unperturbed, natural images to find the class
that generates the closest sample to the query image. We hypothesize that a
potential source of brittleness against adversarial attacks is the
high-to-low-dimensional nature of feed-forward classifiers which allows an
adversary to find small perturbations in the input space that lead to large
changes in the output space. On the other hand, a generative model is typically
a low-to-high-dimensional mapping. While the method is related to Defense-GAN,
the use of a conditional generative model and inversion in our model instead of
the feed-forward classifier is a critical difference. Unlike Defense-GAN, which
was shown to generate obfuscated gradients that are easily circumvented, we
show that our method does not obfuscate gradients. We demonstrate that our
model is extremely robust against black-box attacks and has improved robustness
against white-box attacks compared to naturally trained, feed-forward
classifiers.
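Below is a minimal sketch of the inversion-based classification idea described in the abstract, assuming a conditional generator G(z, y) pretrained on clean images (e.g., a conditional GAN generator); the latent dimension, optimizer, step count, and random restarts are illustrative assumptions rather than the authors' exact implementation.

```python
import torch

def classify_by_inversion(G, x, num_classes, latent_dim=128,
                          steps=200, lr=0.05, restarts=4):
    """Assign x to the class whose conditional generator output can be
    optimized to lie closest to x. G(z, y) -> image is assumed pretrained
    on unperturbed data; all hyperparameters here are illustrative."""
    best_err = torch.full((num_classes,), float("inf"))
    for y in range(num_classes):
        label = torch.tensor([y])
        for _ in range(restarts):          # random restarts reduce sensitivity to poor local minima
            z = torch.randn(1, latent_dim, requires_grad=True)
            opt = torch.optim.Adam([z], lr=lr)
            for _ in range(steps):
                opt.zero_grad()
                err = torch.sum((G(z, label) - x) ** 2)   # reconstruction error for class y
                err.backward()
                opt.step()
            with torch.no_grad():
                final_err = torch.sum((G(z, label) - x) ** 2)
            best_err[y] = torch.minimum(best_err[y], final_err)
    return int(torch.argmin(best_err))     # class whose generator reconstructs x best
```

The decision is reached by optimizing a low-dimensional latent code against the generator's output rather than by a single forward pass through a high-to-low-dimensional classifier, which mirrors the low-to-high-dimensional argument made in the abstract.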
Related papers
- Defense Against Model Extraction Attacks on Recommender Systems [53.127820987326295]
We introduce Gradient-based Ranking Optimization (GRO) to defend against model extraction attacks on recommender systems.
GRO aims to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model.
Results show GRO's superior effectiveness in defending against model extraction attacks.
arXiv Detail & Related papers (2023-10-25T03:30:42Z)
- Learning to Invert: Simple Adaptive Attacks for Gradient Inversion in Federated Learning [31.374376311614675]
Gradient inversion attacks enable the recovery of training samples from model gradients in federated learning.
We show that existing defenses can be broken by a simple adaptive attack.
arXiv Detail & Related papers (2022-10-19T20:41:30Z)
- Gradient Obfuscation Checklist Test Gives a False Sense of Security [85.8719866710494]
The main source of robustness of such defenses is often gradient obfuscation, which offers a false sense of security.
Five characteristics have been identified that are commonly observed when the improvement in robustness is mainly caused by gradient obfuscation.
It has since become a trend to use these five characteristics as a sufficient test to determine whether gradient obfuscation is the main source of robustness.
arXiv Detail & Related papers (2022-06-03T17:27:10Z)
- Query-Efficient Black-box Adversarial Attacks Guided by a Transfer-based Prior [50.393092185611536]
We consider the black-box adversarial setting, where the adversary needs to craft adversarial examples without access to the gradients of a target model.
Previous methods attempted to approximate the true gradient either by using the transfer gradient of a surrogate white-box model or by relying on the feedback of model queries.
We propose two prior-guided random gradient-free (PRGF) algorithms based on biased sampling and gradient averaging; a minimal sketch of such an estimator appears after this list.
arXiv Detail & Related papers (2022-03-13T04:06:27Z)
- RamBoAttack: A Robust Query Efficient Deep Neural Network Decision Exploit [9.93052896330371]
We develop a robust, query-efficient attack capable of avoiding entrapment in a local minimum and misdirection from noisy gradients.
RamBoAttack is more robust to the different sample inputs available to an adversary and to the targeted class.
arXiv Detail & Related papers (2021-12-10T01:25:24Z)
- Improving the Transferability of Adversarial Examples with New Iteration Framework and Input Dropout [8.24029748310858]
We propose a new gradient iteration framework, which redefines the relationship between the iteration step size, the number of perturbations, and the maximum iterations.
Under this framework, we easily improve the attack success rate of DI-TI-MIM.
In addition, we propose a gradient iterative attack method based on input dropout, which can be well combined with our framework.
arXiv Detail & Related papers (2021-06-03T06:36:38Z)
- Transferable Sparse Adversarial Attack [62.134905824604104]
We introduce a generator architecture to alleviate the overfitting issue and thus efficiently craft transferable sparse adversarial examples.
Our method achieves superior inference speed, 700$\times$ faster than other optimization-based methods.
arXiv Detail & Related papers (2021-05-31T06:44:58Z)
- Staircase Sign Method for Boosting Adversarial Attacks [123.19227129979943]
Crafting adversarial examples for transfer-based attacks is challenging and remains a research hotspot.
We propose a novel Staircase Sign Method (S$^2$M) to alleviate this issue, thus boosting transfer-based attacks.
Our method can be generally integrated into any transfer-based attacks, and the computational overhead is negligible.
arXiv Detail & Related papers (2021-04-20T02:31:55Z)
- Gradient-based Adversarial Attacks against Text Transformers [96.73493433809419]
We propose the first general-purpose gradient-based attack against transformer models.
We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks.
arXiv Detail & Related papers (2021-04-15T17:43:43Z)
- Adversarial example generation with AdaBelief Optimizer and Crop Invariance [8.404340557720436]
Adversarial attacks can be an important method to evaluate and select robust models in safety-critical applications.
We propose AdaBelief Iterative Fast Gradient Method (ABI-FGM) and Crop-Invariant attack Method (CIM) to improve the transferability of adversarial examples.
Our method has higher success rates than state-of-the-art gradient-based attack methods.
arXiv Detail & Related papers (2021-02-07T06:00:36Z)
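As referenced in the PRGF entry above, here is a minimal sketch of a prior-guided random gradient-free estimator in that spirit: finite-difference queries along random directions that are biased toward a surrogate model's transfer gradient and then averaged. The mixing coefficient lam, smoothing parameter sigma, and query budget are illustrative assumptions, not the paper's settings.

```python
import torch

def prgf_gradient_estimate(loss_fn, x, transfer_grad, num_queries=20,
                           sigma=1e-3, lam=0.5):
    """Estimate the gradient of a black-box loss at x by averaging
    finite differences along sampled directions biased toward a
    surrogate (transfer) gradient. Hyperparameters are illustrative."""
    v = transfer_grad / transfer_grad.norm()         # prior direction from a white-box surrogate
    base = loss_fn(x)                                # one query for the baseline loss value
    grad_est = torch.zeros_like(x)
    for _ in range(num_queries):
        u = torch.randn_like(x)
        u = u / u.norm()
        d = lam ** 0.5 * v + (1.0 - lam) ** 0.5 * u  # biased sampling: mix prior with a random direction
        d = d / d.norm()
        grad_est += (loss_fn(x + sigma * d) - base) / sigma * d   # finite difference along d
    return grad_est / num_queries                    # averaged gradient estimate
```

Such an estimate can then be plugged into a standard iterative attack in place of the true gradient.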
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.