A Curious Case of Remarkable Resilience to Gradient Attacks via Fully
Convolutional and Differentiable Front End with a Skip Connection
- URL: http://arxiv.org/abs/2402.17018v1
- Date: Mon, 26 Feb 2024 20:55:47 GMT
- Title: A Curious Case of Remarkable Resilience to Gradient Attacks via Fully
Convolutional and Differentiable Front End with a Skip Connection
- Authors: Leonid Boytsov, Ameya Joshi, Filipe Condessa
- Abstract summary: The gradient masking phenomenon is not new, but the degree of masking was quite remarkable for fully differentiable models.
Black box attacks can be partially effective against gradient masking, but they are easily defeated by combining models into randomized ensembles.
- Score: 5.030787492485122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We tested front-end enhanced neural models where a frozen classifier was
prepended by a differentiable and fully convolutional model with a skip
connection. By training them using a small learning rate for about one epoch,
we obtained models that retained the accuracy of the backbone classifier while
being unusually resistant to gradient attacks including APGD and FAB-T attacks
from the AutoAttack package, which we attributed to gradient masking. The
gradient masking phenomenon is not new, but the degree of masking was quite
remarkable for fully differentiable models that did not have
gradient-shattering components such as JPEG compression or components that are
expected to cause diminishing gradients.
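To make the setup concrete, below is a minimal PyTorch sketch of a front-end enhanced model: a small fully convolutional front end with a skip connection prepended to a frozen backbone classifier. The layer widths, the names `ConvFrontEnd` and `FrontEndEnhancedModel`, and the training hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ConvFrontEnd(nn.Module):
    """Fully convolutional, differentiable front end with a skip connection.
    Depth and width here are placeholders."""
    def __init__(self, channels: int = 3, width: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(width, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection: the front end learns a residual correction of the input.
        return x + self.body(x)

class FrontEndEnhancedModel(nn.Module):
    """A frozen backbone classifier preceded by the trainable front end."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.front_end = ConvFrontEnd()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # only the front end is trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(self.front_end(x))

# Per the abstract, the front end is trained with a small learning rate for
# roughly one epoch; the optimizer and rate below are illustrative.
# model = FrontEndEnhancedModel(pretrained_backbone)
# optimizer = torch.optim.SGD(model.front_end.parameters(), lr=1e-4)
```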
Though black box attacks can be partially effective against gradient masking,
they are easily defeated by combining models into randomized ensembles. We
estimate that such ensembles achieve near-SOTA AutoAttack accuracy on CIFAR10,
CIFAR100, and ImageNet despite having virtually zero accuracy under adaptive
attacks. Adversarial training of the backbone classifier can further increase
resistance of the front-end enhanced model to gradient attacks. On CIFAR10, the
respective randomized ensemble achieved 90.8$\pm 2.5$% (99% CI) accuracy under
AutoAttack while having only 18.2$\pm 3.6$% accuracy under the adaptive attack.
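The randomized ensemble mentioned above can be sketched as follows: each forward pass routes the input through one member model chosen uniformly at random, so repeated queries do not see a fixed deterministic function. This is a hedged illustration of the idea, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RandomizedEnsemble(nn.Module):
    """Route every forward pass through a randomly chosen member model."""
    def __init__(self, members: list[nn.Module]):
        super().__init__()
        self.members = nn.ModuleList(members)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        idx = int(torch.randint(len(self.members), (1,)))  # pick a member at random
        return self.members[idx](x)

# Illustrative usage: combine several front-end enhanced models.
# ensemble = RandomizedEnsemble([FrontEndEnhancedModel(b) for b in backbones])
```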
We do not establish SOTA in adversarial robustness. Instead, we make
methodological contributions and further support the thesis that adaptive
attacks designed with the complete knowledge of model architecture are crucial
in demonstrating model robustness and that even the so-called white-box
gradient attacks can have limited applicability. Although gradient attacks can
be complemented with black-box attacks such as the SQUARE attack or the
zero-order PGD, black-box attacks can be weak against randomized ensembles,
e.g., when ensemble models mask gradients.
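The zero-order PGD mentioned above replaces backpropagated gradients with query-based estimates. The sketch below shows a generic two-sided finite-difference gradient estimator of this kind; it is an assumption-laden illustration of the principle rather than a reproduction of any specific attack.

```python
import torch

def zeroth_order_gradient(loss_fn, x: torch.Tensor, sigma: float = 1e-3,
                          num_samples: int = 50) -> torch.Tensor:
    """Estimate d loss / d x from model queries alone (no backpropagation)."""
    grad = torch.zeros_like(x)
    for _ in range(num_samples):
        u = torch.randn_like(x)  # random probe direction
        # Two-sided finite difference along the probe direction.
        delta = (loss_fn(x + sigma * u) - loss_fn(x - sigma * u)) / (2.0 * sigma)
        grad += delta * u
    return grad / num_samples
```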
Related papers
- DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification [63.65630243675792]
Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations of adversarial examples.
Recent studies show that even advanced attacks cannot break such defenses effectively.
We propose a unified framework DiffAttack to perform effective and efficient attacks against diffusion-based purification defenses.
arXiv Detail & Related papers (2023-10-27T15:17:50Z)
- White-box Membership Inference Attacks against Diffusion Models [13.425726946466423]
Diffusion models have begun to overshadow GANs in industrial applications due to their superior image generation performance.
We aim to design membership inference attacks (MIAs) catered to diffusion models.
We first conduct an exhaustive analysis of existing MIAs on diffusion models, taking into account factors such as black-box/white-box models and the selection of attack features.
We found that white-box attacks are highly applicable in real-world scenarios, and the most effective attacks presently are white-box.
arXiv Detail & Related papers (2023-08-11T22:03:36Z)
- Carefully Blending Adversarial Training and Purification Improves Adversarial Robustness [1.2289361708127877]
CARSO is able to defend itself against adaptive end-to-end white-box attacks devised for defences.
Our method improves the state of the art for CIFAR-10, CIFAR-100, and TinyImageNet-200 by a significant margin.
arXiv Detail & Related papers (2023-05-25T09:04:31Z)
- Adversarially Robust Classification by Conditional Generative Model Inversion [4.913248451323163]
We propose a classification model that does not obfuscate gradients and is robust by construction without assuming prior knowledge about the attack.
Our method casts classification as an optimization problem where we "invert" a conditional generator trained on unperturbed, natural images.
We demonstrate that our model is extremely robust against black-box attacks and has improved robustness against white-box attacks.
arXiv Detail & Related papers (2022-01-12T23:11:16Z)
- Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability [20.255708227671573]
Black-box adversarial attacks can be transferred from one model to another.
In this work, we propose a novel ensemble attack method called the variance reduced ensemble attack.
Empirical results on the standard ImageNet demonstrate that the proposed method could boost the adversarial transferability and outperforms existing ensemble attacks significantly.
arXiv Detail & Related papers (2021-11-21T06:33:27Z)
- Meta Gradient Adversarial Attack [64.5070788261061]
This paper proposes a novel architecture called Meta Gradient Adversarial Attack (MGAA), which is plug-and-play and can be integrated with any existing gradient-based attack method.
Specifically, we randomly sample multiple models from a model zoo to compose different tasks and iteratively simulate a white-box attack and a black-box attack in each task.
By narrowing the gap between the gradient directions in white-box and black-box attacks, the transferability of adversarial examples on the black-box setting can be improved.
arXiv Detail & Related papers (2021-08-09T17:44:19Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Adversarial Robustness by Design through Analog Computing and Synthetic Gradients [80.60080084042666]
We propose a new defense mechanism against adversarial attacks inspired by an optical co-processor.
In the white-box setting, our defense works by obfuscating the parameters of the random projection.
We find the combination of a random projection and binarization in the optical system also improves robustness against various types of black-box attacks.
arXiv Detail & Related papers (2021-01-06T16:15:29Z)
- Orthogonal Deep Models As Defense Against Black-Box Attacks [71.23669614195195]
We study the inherent weakness of deep models in black-box settings where the attacker may develop the attack using a model similar to the targeted model.
We introduce a novel gradient regularization scheme that encourages the internal representation of a deep model to be orthogonal to that of another model.
We verify the effectiveness of our technique on a variety of large-scale models.
arXiv Detail & Related papers (2020-06-26T08:29:05Z)
- Towards Query-Efficient Black-Box Adversary with Zeroth-Order Natural Gradient Descent [92.4348499398224]
Black-box adversarial attack methods have received special attention owing to their practicality and simplicity.
We propose a zeroth-order natural gradient descent (ZO-NGD) method to design the adversarial attacks.
ZO-NGD can obtain significantly lower model query complexities compared with state-of-the-art attack methods.
arXiv Detail & Related papers (2020-02-18T21:48:54Z)