A Curious Case of Remarkable Resilience to Gradient Attacks via Fully
Convolutional and Differentiable Front End with a Skip Connection
- URL: http://arxiv.org/abs/2402.17018v1
- Date: Mon, 26 Feb 2024 20:55:47 GMT
- Title: A Curious Case of Remarkable Resilience to Gradient Attacks via Fully
Convolutional and Differentiable Front End with a Skip Connection
- Authors: Leonid Boytsov, Ameya Joshi, Filipe Condessa
- Abstract summary: The gradient masking phenomenon is not new, but the degree of masking was quite remarkable for fully differentiable models.
Black box attacks can be partially effective against gradient masking, but they are easily defeated by combining models into randomized ensembles.
- Score: 5.030787492485122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We tested front-end enhanced neural models where a frozen classifier was
prepended by a differentiable and fully convolutional model with a skip
connection. By training them using a small learning rate for about one epoch,
we obtained models that retained the accuracy of the backbone classifier while
being unusually resistant to gradient attacks including APGD and FAB-T attacks
from the AutoAttack package, which we attributed to gradient masking. The
gradient masking phenomenon is not new, but the degree of masking was quite
remarkable for fully differentiable models that did not have
gradient-shattering components such as JPEG compression or components that are
expected to cause diminishing gradients.
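To make the setup concrete, below is a minimal PyTorch sketch of a front-end enhanced model: a small fully convolutional front end with a skip connection prepended to a frozen backbone classifier. The layer widths, the names `ConvFrontEnd` and `FrontEndEnhancedModel`, and the training hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ConvFrontEnd(nn.Module):
    """Fully convolutional, differentiable front end with a skip connection.
    Depth and width here are placeholders."""
    def __init__(self, channels: int = 3, width: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(width, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection: the front end learns a residual correction of the input.
        return x + self.body(x)

class FrontEndEnhancedModel(nn.Module):
    """A frozen backbone classifier preceded by the trainable front end."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.front_end = ConvFrontEnd()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # only the front end is trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(self.front_end(x))

# Per the abstract, the front end is trained with a small learning rate for
# roughly one epoch; the optimizer and rate below are illustrative.
# model = FrontEndEnhancedModel(pretrained_backbone)
# optimizer = torch.optim.SGD(model.front_end.parameters(), lr=1e-4)
```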
Though black box attacks can be partially effective against gradient masking,
they are easily defeated by combining models into randomized ensembles. We
estimate that such ensembles achieve near-SOTA AutoAttack accuracy on CIFAR10,
CIFAR100, and ImageNet despite having virtually zero accuracy under adaptive
attacks. Adversarial training of the backbone classifier can further increase
resistance of the front-end enhanced model to gradient attacks. On CIFAR10, the
respective randomized ensemble achieved 90.8$\pm 2.5$% (99% CI) accuracy under
AutoAttack while having only 18.2$\pm 3.6$% accuracy under the adaptive attack.
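The randomized ensemble mentioned above can be sketched as follows: each forward pass routes the input through one member model chosen uniformly at random, so repeated queries do not see a fixed deterministic function. This is a hedged illustration of the idea, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RandomizedEnsemble(nn.Module):
    """Route every forward pass through a randomly chosen member model."""
    def __init__(self, members: list[nn.Module]):
        super().__init__()
        self.members = nn.ModuleList(members)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        idx = int(torch.randint(len(self.members), (1,)))  # pick a member at random
        return self.members[idx](x)

# Illustrative usage: combine several front-end enhanced models.
# ensemble = RandomizedEnsemble([FrontEndEnhancedModel(b) for b in backbones])
```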
We do not establish SOTA in adversarial robustness. Instead, we make
methodological contributions and further support the thesis that adaptive
attacks designed with the complete knowledge of model architecture are crucial
in demonstrating model robustness and that even the so-called white-box
gradient attacks can have limited applicability. Although gradient attacks can
be complemented with black-box attacks such as the SQUARE attack or the
zero-order PGD, black-box attacks can be weak against randomized ensembles,
e.g., when ensemble models mask gradients.
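The zero-order PGD mentioned above replaces backpropagated gradients with query-based estimates. The sketch below shows a generic two-sided finite-difference gradient estimator of this kind; it is an assumption-laden illustration of the principle rather than a reproduction of any specific attack.

```python
import torch

def zeroth_order_gradient(loss_fn, x: torch.Tensor, sigma: float = 1e-3,
                          num_samples: int = 50) -> torch.Tensor:
    """Estimate d loss / d x from model queries alone (no backpropagation)."""
    grad = torch.zeros_like(x)
    for _ in range(num_samples):
        u = torch.randn_like(x)  # random probe direction
        # Two-sided finite difference along the probe direction.
        delta = (loss_fn(x + sigma * u) - loss_fn(x - sigma * u)) / (2.0 * sigma)
        grad += delta * u
    return grad / num_samples
```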
Related papers
- DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification [63.65630243675792]
Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations of adversarial examples.
Recent studies show that even advanced attacks cannot break such defenses effectively.
We propose a unified framework DiffAttack to perform effective and efficient attacks against diffusion-based purification defenses.
arXiv Detail & Related papers (2023-10-27T15:17:50Z)
- White-box Membership Inference Attacks against Diffusion Models [13.425726946466423]
Diffusion models have begun to overshadow GANs in industrial applications due to their superior image generation performance.
We aim to design membership inference attacks (MIAs) catered to diffusion models.
We first conduct an exhaustive analysis of existing MIAs on diffusion models, taking into account factors such as black-box/white-box models and the selection of attack features.
We found that white-box attacks are highly applicable in real-world scenarios, and the most effective attacks presently are white-box.
arXiv Detail & Related papers (2023-08-11T22:03:36Z)
- Carefully Blending Adversarial Training and Purification Improves Adversarial Robustness [1.2289361708127877]
CARSO is able to defend itself against adaptive end-to-end white-box attacks devised for defences.
Our method improves the state of the art for CIFAR-10, CIFAR-100, and TinyImageNet-200 by a significant margin.
arXiv Detail & Related papers (2023-05-25T09:04:31Z)
- Adversarially Robust Classification by Conditional Generative Model Inversion [4.913248451323163]
We propose a classification model that does not obfuscate gradients and is robust by construction without assuming prior knowledge about the attack.
Our method casts classification as an optimization problem where we "invert" a conditional generator trained on unperturbed, natural images.
We demonstrate that our model is extremely robust against black-box attacks and has improved robustness against white-box attacks.
arXiv Detail & Related papers (2022-01-12T23:11:16Z)
- Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability [20.255708227671573]
Black-box adversarial attacks can be transferred from one model to another.
In this work, we propose a novel ensemble attack method called the variance reduced ensemble attack.
Empirical results on the standard ImageNet demonstrate that the proposed method could boost the adversarial transferability and outperforms existing ensemble attacks significantly.
arXiv Detail & Related papers (2021-11-21T06:33:27Z)
- Meta Gradient Adversarial Attack [64.5070788261061]
This paper proposes a novel architecture called Meta Gradient Adversarial Attack (MGAA), which is plug-and-play and can be integrated with any existing gradient-based attack method.
Specifically, we randomly sample multiple models from a model zoo to compose different tasks and iteratively simulate a white-box attack and a black-box attack in each task.
By narrowing the gap between the gradient directions in white-box and black-box attacks, the transferability of adversarial examples on the black-box setting can be improved.
arXiv Detail & Related papers (2021-08-09T17:44:19Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Adversarial Robustness by Design through Analog Computing and Synthetic Gradients [80.60080084042666]
We propose a new defense mechanism against adversarial attacks inspired by an optical co-processor.
In the white-box setting, our defense works by obfuscating the parameters of the random projection.
We find the combination of a random projection and binarization in the optical system also improves robustness against various types of black-box attacks.
arXiv Detail & Related papers (2021-01-06T16:15:29Z)
- Orthogonal Deep Models As Defense Against Black-Box Attacks [71.23669614195195]
We study the inherent weakness of deep models in black-box settings where the attacker may develop the attack using a model similar to the targeted model.
We introduce a novel gradient regularization scheme that encourages the internal representation of a deep model to be orthogonal to that of another model.
We verify the effectiveness of our technique on a variety of large-scale models.
arXiv Detail & Related papers (2020-06-26T08:29:05Z)
- Towards Query-Efficient Black-Box Adversary with Zeroth-Order Natural Gradient Descent [92.4348499398224]
Black-box adversarial attack methods have received special attention owing to their practicality and simplicity.
We propose a zeroth-order natural gradient descent (ZO-NGD) method to design the adversarial attacks.
ZO-NGD can obtain significantly lower model query complexities compared with state-of-the-art attack methods.
arXiv Detail & Related papers (2020-02-18T21:48:54Z)