An Eye for an Eye: Defending against Gradient-based Attacks with Gradients
- URL: http://arxiv.org/abs/2202.01117v1
- Date: Wed, 2 Feb 2022 16:22:28 GMT
- Title: An Eye for an Eye: Defending against Gradient-based Attacks with Gradients
- Authors: Hanbin Hong, Yuan Hong, and Yu Kong
- Abstract summary: Gradient-based adversarial attacks have demonstrated high success rates.
We show that the gradients can also be exploited as a powerful weapon to defend against adversarial attacks.
By using both gradient maps and adversarial images as inputs, we propose a Two-stream Restoration Network (TRN) to restore the adversarial images.
- Score: 24.845539113785552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models have been shown to be vulnerable to adversarial attacks.
In particular, gradient-based attacks have demonstrated high success rates
recently. The gradient measures how each image pixel affects the model output,
which contains critical information for generating malicious perturbations. In
this paper, we show that the gradients can also be exploited as a powerful
weapon to defend against adversarial attacks. By using both gradient maps and
adversarial images as inputs, we propose a Two-stream Restoration Network (TRN)
to restore the adversarial images. To optimally restore the perturbed images
with two streams of inputs, a Gradient Map Estimation Mechanism is proposed to
estimate the gradients of adversarial images, and a Fusion Block is designed in
TRN to explore and fuse the information in two streams. Once trained, our TRN
can defend against a wide range of attack methods without significantly
degrading the performance of benign inputs. Also, our method is generalizable,
scalable, and hard to bypass. Experimental results on CIFAR10, SVHN, and
Fashion MNIST demonstrate that our method outperforms state-of-the-art defense
methods.
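The abstract describes the defense only at the architecture level: two input streams (the adversarial image and an estimated gradient map), a fusion block, and a restored image. The PyTorch sketch below mirrors that data flow under stated assumptions; the layer sizes, the concatenation-plus-1x1-convolution fusion, and the surrogate estimate_gradient_map helper are illustrative placeholders, not the paper's Gradient Map Estimation Mechanism or Fusion Block.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionBlock(nn.Module):
    """Toy fusion block: concatenates image-stream and gradient-stream features
    and merges them with a 1x1 convolution (an assumption, not the paper's design)."""
    def __init__(self, channels):
        super().__init__()
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, img_feat, grad_feat):
        return F.relu(self.merge(torch.cat([img_feat, grad_feat], dim=1)))

class TwoStreamRestorationNet(nn.Module):
    """Minimal two-stream restorer: one stream encodes the (possibly adversarial)
    image, the other encodes an estimated gradient map; fused features predict a
    residual correction of the input."""
    def __init__(self, channels=32):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.img_stream, self.grad_stream = stream(), stream()
        self.fusion = FusionBlock(channels)
        self.decoder = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, adv_image, grad_map):
        fused = self.fusion(self.img_stream(adv_image), self.grad_stream(grad_map))
        return torch.clamp(adv_image + self.decoder(fused), 0.0, 1.0)

def estimate_gradient_map(classifier, image):
    """Stand-in for the Gradient Map Estimation Mechanism: the input gradient of a
    surrogate classifier's loss, taken at its own predicted label."""
    image = image.detach().clone().requires_grad_(True)
    logits = classifier(image)
    loss = F.cross_entropy(logits, logits.argmax(dim=1))
    grad, = torch.autograd.grad(loss, image)
    return grad.detach()
```
At inference time one would compute grad_map = estimate_gradient_map(classifier, adv_image) and pass both tensors to the network; how TRN actually estimates and fuses gradients is specified in the paper itself.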
Related papers
- Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts [23.279652897139286]
Highly realistic AI-generated face forgeries, known as deepfakes, have raised serious social concerns.
We provide counterfactual explanations for face forgery detection from an artifact removal perspective.
Our method achieves over 90% attack success rate and superior attack transferability.
arXiv Detail & Related papers (2024-04-12T09:13:37Z)
- Dual Adversarial Resilience for Collaborating Robust Underwater Image Enhancement and Perception [54.672052775549]
In this work, we introduce a collaborative adversarial resilience network, dubbed CARNet, for underwater image enhancement and subsequent detection tasks.
We propose a synchronized attack training strategy with both visual-driven and perception-driven attacks, enabling the network to discern and remove various types of attacks.
Experiments demonstrate that the proposed method produces visually appealing enhanced images and achieves, on average, 6.71% higher detection mAP than state-of-the-art methods.
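"Synchronized attack training" suggests generating a visual-driven attack (against the enhancement objective) and a perception-driven attack (against the downstream detector) and training on both. The sketch below is only one hedged reading of that idea; enhancer, detector, det_loss_fn, and the one-step FGSM-style attack generator are placeholders, not CARNet's components.
```python
import torch
import torch.nn.functional as F

def fgsm_perturb(x, loss, eps):
    """One-step perturbation of x in the direction that increases `loss`
    (an FGSM-style stand-in for the paper's attack generators)."""
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def synchronized_attack_step(enhancer, detector, raw, clean_target, det_loss_fn, eps=4 / 255):
    """One training step on a visual-driven and a perception-driven attack;
    returns the enhancement loss to back-propagate through `enhancer`."""
    # Visual-driven attack: degrade the enhancement (reconstruction) quality.
    x = raw.detach().clone().requires_grad_(True)
    raw_vis = fgsm_perturb(x, F.l1_loss(enhancer(x), clean_target), eps)

    # Perception-driven attack: degrade the downstream detector instead.
    x = raw.detach().clone().requires_grad_(True)
    raw_per = fgsm_perturb(x, det_loss_fn(detector(enhancer(x))), eps)

    # Train the enhancer to pull both attacked views back to the clean target.
    return F.l1_loss(enhancer(raw_vis), clean_target) + F.l1_loss(enhancer(raw_per), clean_target)
```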
arXiv Detail & Related papers (2023-09-03T06:52:05Z)
- Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
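The purification recipe can be summarized as: diffuse the adversarial image forward with Gaussian noise up to a moderate timestep, then run the reverse denoising process back to an image. The sketch below assumes a pretrained DDPM-style noise predictor eps_model(x_t, t) and a 1-D tensor betas for the noise schedule; it illustrates the general recipe rather than DiffPure's released implementation.
```python
import torch

def purify(x_adv, eps_model, betas, t_star=100):
    """Forward-diffuse x_adv to timestep t_star, then denoise back with DDPM
    ancestral sampling. eps_model(x_t, t) predicts the injected noise."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    # Forward diffusion q(x_t | x_0) in closed form.
    noise = torch.randn_like(x_adv)
    x = alpha_bar[t_star].sqrt() * x_adv + (1 - alpha_bar[t_star]).sqrt() * noise

    # Reverse process from t_star down to 0 (0-indexed timesteps).
    for t in range(t_star, -1, -1):
        t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
        eps_hat = eps_model(x, t_batch)
        mean = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x
```
The choice of t_star trades off how much of the adversarial perturbation is drowned out against how much image content is destroyed, which is the central tuning knob in this family of defenses.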
arXiv Detail & Related papers (2022-05-16T06:03:00Z)
- Adaptive Perturbation for Adversarial Attack [50.77612889697216]
We propose a new gradient-based attack method for adversarial examples.
We use the exact gradient direction with a scaling factor for generating adversarial perturbations.
Our method exhibits higher transferability and outperforms the state-of-the-art methods.
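The distinguishing detail here is using the gradient's actual direction with a scaling factor rather than its element-wise sign. A minimal single-step version is shown below; the plain per-sample L2 rescaling is an assumption, since the paper defines its own adaptive scaling.
```python
import torch
import torch.nn.functional as F

def gradient_direction_attack(model, x, y, eps=8 / 255):
    """One-step attack that perturbs along the exact gradient direction,
    rescaled to an eps-sized L2 step, instead of taking sign(gradient)."""
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)

    # Per-sample L2 normalization so every example takes a step of length eps.
    flat = grad.flatten(start_dim=1)
    unit = flat / (flat.norm(dim=1, keepdim=True) + 1e-12)
    return (x + eps * unit.view_as(grad)).clamp(0, 1).detach()
```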
arXiv Detail & Related papers (2021-11-27T07:57:41Z)
- Adversarial examples by perturbing high-level features in intermediate decoder layers [0.0]
Instead of perturbing pixels, we use an encoder-decoder representation of the input image and perturb intermediate layers in the decoder.
Our perturbation possesses semantic meaning, such as a longer beak or green tints.
We show that our method modifies key features such as edges and that defence techniques based on adversarial training are vulnerable to our attacks.
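The attack optimizes a perturbation on an intermediate decoder activation rather than on pixels, so the decoded change tends to be semantic (e.g., a longer beak) instead of high-frequency noise. A heavily simplified sketch follows; encoder, decoder_head, and decoder_tail are hypothetical names for a split of some pretrained autoencoder.
```python
import torch
import torch.nn.functional as F

def feature_space_attack(encoder, decoder_head, decoder_tail, classifier,
                         x, y_true, steps=50, lr=0.05):
    """Perturb an intermediate decoder activation (not pixels) so that the
    decoded image is misclassified. Module names are placeholders."""
    with torch.no_grad():
        h = decoder_head(encoder(x))              # intermediate decoder feature
    delta = torch.zeros_like(h, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        x_adv = decoder_tail(h + delta).clamp(0, 1)
        loss = -F.cross_entropy(classifier(x_adv), y_true)  # push away from y_true
        opt.zero_grad()
        loss.backward()
        opt.step()

    return decoder_tail(h + delta.detach()).clamp(0, 1)
```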
arXiv Detail & Related papers (2021-10-14T07:08:15Z)
- Adversarial Examples Detection beyond Image Space [88.7651422751216]
We find that perturbations and prediction confidence are closely linked, which guides us to detect few-perturbation attacks from the perspective of prediction confidence.
We propose a method beyond image space by a two-stream architecture, in which the image stream focuses on the pixel artifacts and the gradient stream copes with the confidence artifacts.
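Read literally, the detector pairs a stream that inspects the image itself (pixel artifacts) with a stream fed by gradient/confidence information. A toy binary detector along those lines, with every layer choice invented for illustration, could look like this:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStreamDetector(nn.Module):
    """Binary detector (clean vs. adversarial) fusing an image stream with a
    gradient stream. Architecture details are placeholders, not the paper's."""
    def __init__(self, channels=16):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.image_stream, self.gradient_stream = branch(), branch()
        self.head = nn.Linear(2 * channels, 2)

    def forward(self, image, input_grad):
        feats = torch.cat([self.image_stream(image),
                           self.gradient_stream(input_grad)], dim=1)
        return self.head(feats)

def confidence_gradient(classifier, image):
    """Gradient of the classifier's confidence (max log-probability) w.r.t. the
    input; used here as the gradient-stream input."""
    image = image.detach().clone().requires_grad_(True)
    conf = F.log_softmax(classifier(image), dim=1).max(dim=1).values.sum()
    grad, = torch.autograd.grad(conf, image)
    return grad.detach()
```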
arXiv Detail & Related papers (2021-02-23T09:55:03Z)
- Adversarial example generation with AdaBelief Optimizer and Crop Invariance [8.404340557720436]
Adversarial attacks can be an important method to evaluate and select robust models in safety-critical applications.
We propose AdaBelief Iterative Fast Gradient Method (ABI-FGM) and Crop-Invariant attack Method (CIM) to improve the transferability of adversarial examples.
Our method has higher success rates than state-of-the-art gradient-based attack methods.
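ABI-FGM replaces the plain momentum accumulator of iterative FGSM with AdaBelief-style first- and second-moment estimates of the gradient. The sketch below captures that shape of update under our reading of the summary; the exact step rule, hyperparameters, and the crop-invariance (CIM) averaging are not reproduced here.
```python
import torch
import torch.nn.functional as F

def abi_fgm(model, x, y, eps=8 / 255, steps=10, beta1=0.9, beta2=0.999, tiny=1e-8):
    """Iterative FGSM whose step direction is shaped by AdaBelief-style moment
    estimates of the gradient (a sketch of the idea, not the authors' exact update)."""
    alpha = eps / steps
    x_adv = x.detach().clone()
    m = torch.zeros_like(x)
    s = torch.zeros_like(x)

    for t in range(1, steps + 1):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        g, = torch.autograd.grad(loss, x_adv)

        m = beta1 * m + (1 - beta1) * g
        s = beta2 * s + (1 - beta2) * (g - m) ** 2          # AdaBelief: deviation from "belief"
        m_hat, s_hat = m / (1 - beta1 ** t), s / (1 - beta2 ** t)
        direction = m_hat / (s_hat.sqrt() + tiny)

        x_adv = (x_adv + alpha * direction.sign()).detach()
        x_adv = (x.detach() + (x_adv - x.detach()).clamp(-eps, eps)).clamp(0, 1)  # eps-ball projection
    return x_adv
```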
arXiv Detail & Related papers (2021-02-07T06:00:36Z)
- Boosting Gradient for White-Box Adversarial Attacks [60.422511092730026]
We propose a universal adversarial example generation method, called ADV-ReLU, to enhance the performance of gradient based white-box attack algorithms.
Our approach calculates the gradient of the loss function with respect to the network input, maps the values to scores, and selects a subset of them to update the misleading gradients.
arXiv Detail & Related papers (2020-10-21T02:13:26Z)
- Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks [154.31827097264264]
Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms.
We propose Dual Manifold Adversarial Training (DMAT) where adversarial perturbations in both latent and image spaces are used in robustifying the model.
Our DMAT improves performance on normal images and achieves robustness comparable to standard adversarial training against Lp attacks.
arXiv Detail & Related papers (2020-09-05T06:00:28Z)
- Encryption Inspired Adversarial Defense for Visual Classification [17.551718914117917]
We propose a new adversarial defense inspired by image encryption methods.
The proposed method utilizes a block-wise pixel shuffling with a secret key.
It achieves high accuracy (91.55%) on clean images and (89.66%) on adversarial examples with a noise distance of 8/255 on the CIFAR-10 dataset.
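Block-wise pixel shuffling with a secret key is simple to sketch: split the image into small non-overlapping blocks and apply the same key-derived permutation inside every block. The version below is a minimal illustration; block size, key handling, and the surrounding training pipeline are simplifications.
```python
import torch

def keyed_block_shuffle(images, key, block=4):
    """Shuffle pixels inside every (block x block) patch with a permutation
    derived from a secret integer key. images: (N, C, H, W) in [0, 1]."""
    n, c, h, w = images.shape
    assert h % block == 0 and w % block == 0
    g = torch.Generator().manual_seed(key)
    perm = torch.randperm(block * block, generator=g)   # key-dependent permutation

    # Reshape so each block's pixels end up on the last axis.
    x = images.reshape(n, c, h // block, block, w // block, block)
    x = x.permute(0, 1, 2, 4, 3, 5).reshape(n, c, h // block, w // block, block * block)
    x = x[..., perm]                                    # apply the keyed permutation
    # Undo the reshaping to recover an (N, C, H, W) image.
    x = x.reshape(n, c, h // block, w // block, block, block)
    return x.permute(0, 1, 2, 4, 3, 5).reshape(n, c, h, w)
```
The premise of this style of defense is that the classifier is trained on images transformed with the secret key, so clean accuracy is preserved while an attacker without the key crafts perturbations against the wrong input representation.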
arXiv Detail & Related papers (2020-05-16T14:18:07Z)
- Towards Sharper First-Order Adversary with Quantized Gradients [43.02047596005796]
Adversarial training has been the most successful defense against adversarial attacks.
In state-of-the-art first-order attacks, adversarial examples built with sign gradients retain the sign of each gradient component but discard the relative magnitude between components.
Gradient quantization not only preserves the sign information, but also keeps the relative magnitude between components.
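A gradient-quantization step sits between sign gradients (one bit per component) and exact gradients: each component keeps its sign but its magnitude is snapped to one of a few levels, so relative magnitudes partially survive. The single-step sketch below uses a uniform per-sample quantizer, which is an assumption; the paper's quantizer may differ.
```python
import torch
import torch.nn.functional as F

def quantized_gradient_step(model, x, y, eps=8 / 255, levels=4):
    """One attack step driven by a quantized gradient instead of sign(gradient)."""
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    g, = torch.autograd.grad(loss, x)

    # Per-sample uniform quantization of |g| into `levels` bins, keeping the sign.
    flat = g.abs().flatten(start_dim=1)
    gmax = flat.max(dim=1, keepdim=True).values + 1e-12
    q = torch.ceil(flat / gmax * levels) / levels        # values in {0, 1/levels, ..., 1}
    step = g.sign() * q.view_as(g)

    return (x + eps * step).clamp(0, 1).detach()
```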
arXiv Detail & Related papers (2020-02-01T14:33:51Z)