Attacking Adversarial Defences by Smoothing the Loss Landscape
- URL: http://arxiv.org/abs/2208.00862v1
- Date: Mon, 1 Aug 2022 13:45:47 GMT
- Title: Attacking Adversarial Defences by Smoothing the Loss Landscape
- Authors: Panagiotis Eustratiadis, Henry Gouk, Da Li and Timothy Hospedales
- Abstract summary: A common, but not universal, way to achieve this effect is via the use of stochastic neural networks.
We show that this is a form of gradient obfuscation, and propose a general extension to gradient-based adversaries.
We demonstrate the efficacy of our loss-smoothing method against both stochastic and non-stochastic adversarial defences.
- Score: 15.11530043291188
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates a family of methods for defending against adversarial
attacks that owe part of their success to creating a noisy, discontinuous, or
otherwise rugged loss landscape that adversaries find difficult to navigate. A
common, but not universal, way to achieve this effect is via the use of
stochastic neural networks. We show that this is a form of gradient
obfuscation, and propose a general extension to gradient-based adversaries
based on the Weierstrass transform, which smooths the surface of the loss
function and provides more reliable gradient estimates. We further show that
the same principle can strengthen gradient-free adversaries. We demonstrate the
efficacy of our loss-smoothing method against both stochastic and
non-stochastic adversarial defences that exhibit robustness due to this type of
obfuscation. Furthermore, we provide analysis of how it interacts with
Expectation over Transformation, a popular gradient-sampling method currently
used to attack stochastic defences.
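
To make the smoothing idea concrete, the sketch below approximates the gradient of the Weierstrass-transformed (Gaussian-smoothed) loss by Monte Carlo averaging of input gradients and feeds it into a PGD-style update. This is an illustrative PyTorch sketch, not the authors' implementation; the model, loss function, noise scale, and step sizes are placeholders.

```python
import torch

def smoothed_loss_grad(model, x, y, loss_fn, sigma=0.05, n_samples=32):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed loss.

    The Weierstrass transform convolves the loss with a Gaussian kernel,
        L_sigma(x) = E_{d ~ N(0, sigma^2 I)}[ L(x + d) ],
    so its gradient can be approximated by averaging input gradients at
    Gaussian-perturbed copies of x. For a stochastic defence, each sample
    also draws fresh model randomness, which is the effect EoT relies on.
    """
    grad_sum = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = (x + sigma * torch.randn_like(x)).detach().requires_grad_(True)
        loss = loss_fn(model(noisy), y)
        grad_sum += torch.autograd.grad(loss, noisy)[0]
    return grad_sum / n_samples

def pgd_on_smoothed_loss(model, x, y, loss_fn, eps=8 / 255, alpha=2 / 255,
                         steps=10, sigma=0.05):
    """PGD that follows the smoothed-loss gradient instead of the raw one."""
    x_adv = x.clone()
    for _ in range(steps):
        g = smoothed_loss_grad(model, x_adv, y, loss_fn, sigma=sigma)
        x_adv = x_adv + alpha * g.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # L-inf projection
        x_adv = x_adv.clamp(0.0, 1.0)                          # valid pixel range
    return x_adv
```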
Related papers
- Distributional Adversarial Loss [15.258476329309044]
A major challenge in defending against adversarial attacks is the enormous space of possible attacks that even a simple adversary might perform.
Defences against this include randomized smoothing methods, which add noise to the input to take away some of the adversary's impact.
Another approach is input discretization, which limits the number of actions available to the adversary.
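
For orientation, the two defence families named in this summary can be sketched generically as follows. This is an illustrative PyTorch sketch of randomized smoothing and input discretization in general, not the constructions used in the paper; `model`, `sigma`, and `levels` are placeholder choices.

```python
import torch

def smoothed_predict(model, x, sigma=0.25, n_samples=100):
    """Randomized smoothing: classify many Gaussian-noised copies of a single
    image x (shape C x H x W) and return the majority-vote label."""
    with torch.no_grad():
        noisy = x.unsqueeze(0) + sigma * torch.randn(n_samples, *x.shape)
        votes = model(noisy).argmax(dim=1)
    return torch.mode(votes).values.item()

def discretize(x, levels=8):
    """Input discretization: snap pixel values in [0, 1] to a small set of
    levels, shrinking the space of perturbations that survive preprocessing."""
    return torch.round(x * (levels - 1)) / (levels - 1)
```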
arXiv Detail & Related papers (2024-06-05T17:03:47Z)
- Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement [61.048842737581865]
Adversarial fine-tuning methods aim to enhance adversarial robustness through fine-tuning the naturally pre-trained model in an adversarial training manner.
We propose a disentanglement-based approach to explicitly model and remove the latent features that cause the feature gap.
Empirical evaluations on three benchmark datasets demonstrate that our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
arXiv Detail & Related papers (2024-01-26T08:38:57Z)
- IRAD: Implicit Representation-driven Image Resampling against Adversarial Attacks [16.577595936609665]
We introduce a novel approach to counter adversarial attacks, namely, image resampling.
Image resampling transforms a discrete image into a new one, simulating the process of scene recapturing or rerendering as specified by a geometrical transformation.
We show that our method significantly enhances the adversarial robustness of diverse deep models against various attacks while maintaining high accuracy on clean images.
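
As a rough stand-in for the resampling step, the sketch below bilinearly resamples an image batch on a slightly jittered sampling grid. IRAD's actual implicit-representation machinery is not reproduced here; `max_shift` and the plain bilinear interpolation are assumed placeholders.

```python
import torch
import torch.nn.functional as F

def resample_image(x, max_shift=0.02):
    """Resample an image batch (N, C, H, W), values in [0, 1], on a jittered
    grid -- a generic stand-in for geometric rerendering of the scene."""
    n, _, h, w = x.shape
    # Identity sampling grid in normalized coordinates [-1, 1].
    ys = torch.linspace(-1, 1, h)
    xs = torch.linspace(-1, 1, w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack((gx, gy), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    # Perturb sampling locations slightly to simulate re-capturing the scene.
    grid = grid + max_shift * (2 * torch.rand_like(grid) - 1)
    return F.grid_sample(x, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```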
arXiv Detail & Related papers (2023-10-18T11:19:32Z)
- Sampling-based Fast Gradient Rescaling Method for Highly Transferable Adversarial Attacks [18.05924632169541]
We propose a Sampling-based Fast Gradient Rescaling Method (S-FGRM).
Specifically, we use data rescaling to substitute for the sign function without extra computational cost.
Our method could significantly boost the transferability of gradient-based attacks and outperform the state-of-the-art baselines.
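
One illustrative way to substitute a rescaled gradient for the sign function in an FGSM-style update is sketched below; the rescaling rule shown is a hypothetical example, not the exact S-FGRM formula.

```python
import torch

def rescaled_step(grad, alpha=2 / 255, eps=1e-12):
    """Replace the coarse sign(grad) of FGSM-style attacks with a rescaled
    gradient: normalize each example by its mean absolute gradient so the
    update keeps directional detail but has a comparable magnitude."""
    flat = grad.flatten(start_dim=1)
    norm = flat.abs().mean(dim=1).clamp_min(eps)   # per-example mean |g|
    return alpha * grad / norm.view(-1, *([1] * (grad.dim() - 1)))
```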
arXiv Detail & Related papers (2023-07-06T07:52:42Z)
- AdvART: Adversarial Art for Camouflaged Object Detection Attacks [7.7889972735711925]
We propose a novel approach to generate naturalistic and inconspicuous adversarial patches.
Our technique is based on directly manipulating the pixel values in the patch, which gives higher flexibility and larger space.
Our attack achieves superior success rates of up to 91.19% in the digital world and 72% when deployed in smart cameras at the edge.
arXiv Detail & Related papers (2023-03-03T06:28:05Z)
- Guidance Through Surrogate: Towards a Generic Diagnostic Attack [101.36906370355435]
We develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA).
Our modified attack does not require random restarts, a large number of attack iterations, or a search for an optimal step size.
More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
arXiv Detail & Related papers (2022-12-30T18:45:23Z)
- Adversarially Robust Classification by Conditional Generative Model Inversion [4.913248451323163]
We propose a classification model that does not obfuscate gradients and is robust by construction without assuming prior knowledge about the attack.
Our method casts classification as an optimization problem where we "invert" a conditional generator trained on unperturbed, natural images.
We demonstrate that our model is extremely robust against black-box attacks and has improved robustness against white-box attacks.
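
Schematically, inversion-based classification of this kind can be sketched as follows; the `generator(z, y)` interface, latent size, and optimizer settings are hypothetical placeholders rather than the paper's exact setup.

```python
import torch

def classify_by_inversion(generator, x, num_classes, z_dim=100,
                          steps=200, lr=0.05):
    """Classify x by inverting a conditional generator: for every class label,
    search for a latent code whose generated image best matches x, then return
    the label with the smallest reconstruction error."""
    errors = []
    for y in range(num_classes):
        z = torch.zeros(1, z_dim, requires_grad=True)
        label = torch.tensor([y])
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = torch.mean((generator(z, label) - x) ** 2)
            loss.backward()
            opt.step()
        errors.append(loss.item())
    return int(torch.tensor(errors).argmin())
```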
arXiv Detail & Related papers (2022-01-12T23:11:16Z)
- Adaptive Perturbation for Adversarial Attack [50.77612889697216]
We propose a new gradient-based attack method for adversarial examples.
We use the exact gradient direction with a scaling factor for generating adversarial perturbations.
Our method exhibits higher transferability and outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2021-11-27T07:57:41Z)
- Staircase Sign Method for Boosting Adversarial Attacks [123.19227129979943]
Crafting adversarial examples for transfer-based attacks is challenging and remains a research hotspot.
We propose a novel Staircase Sign Method (S$^2$M) to alleviate this issue, thus boosting transfer-based attacks.
Our method can be generally integrated into any transfer-based attacks, and the computational overhead is negligible.
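
The summary does not spell out the construction, but the general "staircase" idea of replacing a single sign value with a few ascending magnitude levels can be illustrated roughly as follows; the percentile-based weighting here is an assumption for illustration, not the paper's exact S$^2$M scheme.

```python
import torch

def staircase_sign(grad, k=4):
    """Map each gradient entry to one of k ascending 'staircase' levels based
    on the percentile of its magnitude, preserving the sign (a hypothetical
    illustration of a staircase-style sign replacement)."""
    flat = grad.abs().flatten(start_dim=1)           # (N, D)
    # Per-example quantile thresholds at k evenly spaced cut points.
    qs = torch.quantile(flat, torch.linspace(0, 1, k + 1)[1:-1], dim=1)  # (k-1, N)
    levels = torch.zeros_like(flat)
    for i in range(k - 1):
        levels += (flat > qs[i].unsqueeze(1)).float()
    # Scale levels into (0, 1] so the largest magnitudes get full weight.
    weights = (levels + 1) / k
    return grad.sign() * weights.view_as(grad)
```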
arXiv Detail & Related papers (2021-04-20T02:31:55Z)
- Gradient-based Adversarial Attacks against Text Transformers [96.73493433809419]
We propose the first general-purpose gradient-based attack against transformer models.
We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks.
arXiv Detail & Related papers (2021-04-15T17:43:43Z)
- Temporal Sparse Adversarial Attack on Sequence-based Gait Recognition [56.844587127848854]
We demonstrate that the state-of-the-art gait recognition model is vulnerable to such attacks.
We employ a generative adversarial network based architecture to semantically generate adversarial high-quality gait silhouettes or video frames.
The experimental results show that if only one-fortieth of the frames are attacked, the accuracy of the target model drops dramatically.
arXiv Detail & Related papers (2020-02-22T10:08:42Z)