DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial
Purification
- URL: http://arxiv.org/abs/2311.16124v2
- Date: Thu, 4 Jan 2024 03:19:54 GMT
- Title: DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial
Purification
- Authors: Mintong Kang, Dawn Song, Bo Li
- Abstract summary: Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations of adversarial examples.
Recent studies show that even advanced attacks cannot break such defenses effectively.
We propose a unified framework DiffAttack to perform effective and efficient attacks against diffusion-based purification defenses.
- Score: 63.65630243675792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion-based purification defenses leverage diffusion models to remove
crafted perturbations of adversarial examples and achieve state-of-the-art
robustness. Recent studies show that even advanced attacks cannot break such
defenses effectively, since the purification process induces an extremely deep
computational graph which poses the potential problems of gradient obfuscation,
high memory cost, and unbounded randomness. In this paper, we propose a unified
framework DiffAttack to perform effective and efficient attacks against
diffusion-based purification defenses, including both DDPM and score-based
approaches. In particular, we propose a deviated-reconstruction loss at
intermediate diffusion steps to induce inaccurate density gradient estimation
to tackle the problem of vanishing/exploding gradients. We also provide a
segment-wise forwarding-backwarding algorithm, which leads to memory-efficient
gradient backpropagation. We validate the attack effectiveness of DiffAttack
compared with existing adaptive attacks on CIFAR-10 and ImageNet. We show that
DiffAttack decreases the robust accuracy of models compared with SOTA attacks
by over 20% on CIFAR-10 under $\ell_\infty$ attack $(\epsilon=8/255)$, and over
10% on ImageNet under $\ell_\infty$ attack $(\epsilon=4/255)$. We conduct a
series of ablation studies and find that 1) DiffAttack with the
deviated-reconstruction loss added over uniformly sampled time steps is more
effective than that added over only initial/final steps, and 2) diffusion-based
purification with a moderate diffusion length is more robust under DiffAttack.
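The two components named in the abstract lend themselves to a short sketch. The PyTorch-style code below is a minimal illustration, not the authors' implementation: `purifier_step`, `classifier`, the L2 distance used for the deviation term, and the choice of the clean input as the reference it deviates from are all assumptions. Only the overall structure follows the abstract: a classification loss plus a deviated-reconstruction term evaluated at (roughly uniformly spaced) intermediate states, with gradients obtained through segment-wise checkpointed backpropagation.

```python
# Hedged sketch of the two ideas in the abstract (NOT the authors' code).
# Assumptions: a DDPM-style reverse step `purifier_step(x, t)`, a downstream
# `classifier`, an L2 deviation measure, and the unperturbed input as the
# reference for the deviated-reconstruction term.
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint


def segment_purify(x, purifier_step, timesteps, segment_size=10):
    """Run the reverse diffusion chain in segments with gradient
    checkpointing, so only one segment's activations are kept in memory
    at a time (the segment-wise forwarding-backwarding idea)."""
    intermediates = []
    for start in range(0, len(timesteps), segment_size):
        seg = timesteps[start:start + segment_size]

        def run_segment(x_in, seg=seg):
            for t in seg:
                x_in = purifier_step(x_in, t)  # one reverse-diffusion step
            return x_in

        x = checkpoint(run_segment, x, use_reentrant=False)
        intermediates.append(x)  # per-segment states reused by the loss below
    return x, intermediates


def diffattack_objective(x_adv, x_ref, y, purifier_step, timesteps, classifier,
                         lam=1.0):
    """Attack objective: cross-entropy on the purified input plus a
    deviated-reconstruction term that pushes intermediate purified states
    away from a reference image (here, the clean input -- an assumption)."""
    x_pur, inters = segment_purify(x_adv, purifier_step, timesteps)
    cls_loss = F.cross_entropy(classifier(x_pur), y)
    dev_loss = torch.stack([F.mse_loss(h, x_ref) for h in inters]).mean()
    return cls_loss + lam * dev_loss  # a PGD attacker ascends this w.r.t. x_adv
```

A standard $\ell_\infty$ PGD loop would ascend the gradient of this objective with respect to `x_adv`, take its sign, and project back onto the $\epsilon$-ball. Because each segment is re-run during the backward pass, peak memory scales with the segment length rather than with the full diffusion depth.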
Related papers
- Unlocking The Potential of Adaptive Attacks on Diffusion-Based Purification [20.15955997832192]
Diffusion-based purification (DBP) is a defense against adversarial examples (AEs).
We revisit DBP's claimed robustness, focusing on gradient-based strategies that back-propagate the loss gradients through the defense.
We show that such an optimization method invalidates DBP's core foundations and restricts the purified outputs to a distribution over malicious samples instead.
arXiv Detail & Related papers (2024-11-25T17:30:32Z) - Struggle with Adversarial Defense? Try Diffusion [8.274506117450628]
Adversarial attacks induce misclassification by introducing subtle perturbations.
Diffusion-based adversarial training often encounters convergence challenges and high computational expense.
We propose the Truth Maximization Diffusion Classifier (TMDC) to overcome these issues.
arXiv Detail & Related papers (2024-04-12T06:52:40Z) - Enhancing Adversarial Robustness via Score-Based Optimization [22.87882885963586]
Adversarial attacks have the potential to mislead deep neural network classifiers by introducing slight perturbations.
We introduce a novel adversarial defense scheme named ScoreOpt, which optimizes adversarial samples at test time.
Our experimental results demonstrate that our approach outperforms existing adversarial defenses in terms of both robustness performance and inference speed.
arXiv Detail & Related papers (2023-07-10T03:59:42Z) - Diffusion-Based Adversarial Sample Generation for Improved Stealthiness
and Controllability [62.105715985563656]
We propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples.
Our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks.
arXiv Detail & Related papers (2023-05-25T21:51:23Z) - Robust Classification via a Single Diffusion Model [37.46217654590878]
Robust Diffusion Classifier (RDC) is a generative classifier constructed from a pre-trained diffusion model to be adversarially robust.
RDC achieves $75.67\%$ robust accuracy against various $\ell_\infty$ norm-bounded adaptive attacks with $\epsilon_\infty=8/255$ on CIFAR-10.
arXiv Detail & Related papers (2023-05-24T15:25:19Z) - Guided Diffusion Model for Adversarial Purification [103.4596751105955]
Adversarial attacks disturb deep neural networks (DNNs) across various algorithms and frameworks.
We propose a novel purification approach, referred to as the guided diffusion model for purification (GDMP).
In comprehensive experiments across various datasets, the proposed GDMP is shown to reduce the perturbations introduced by adversarial attacks to a shallow range.
arXiv Detail & Related papers (2022-05-30T10:11:15Z) - Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure, which uses diffusion models for adversarial purification.
Our method achieves state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z) - Sparse and Imperceptible Adversarial Attack via a Homotopy Algorithm [93.80082636284922]
Sparse adversarial attacks can fool deep neural networks (DNNs) by perturbing only a few pixels.
Recent efforts combine this sparsity constraint with an additional $\ell_\infty$ bound on the perturbation magnitudes.
We propose a homotopy algorithm that jointly tackles the sparsity and perturbation-bound constraints in one unified framework.
arXiv Detail & Related papers (2021-06-10T20:11:36Z) - Towards Adversarial Patch Analysis and Certified Defense against Crowd
Counting [61.99564267735242]
Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems.
Recent studies have demonstrated that deep neural network (DNN) methods are vulnerable to adversarial attacks.
We propose a robust attack strategy called Adversarial Patch Attack with Momentum to evaluate the robustness of crowd counting models.
arXiv Detail & Related papers (2021-04-22T05:10:55Z) - Dynamically Sampled Nonlocal Gradients for Stronger Adversarial Attacks [3.055601224691843]
The vulnerability of deep neural networks to small and even imperceptible perturbations has become a central topic in deep learning research.
We propose Dynamically Sampled Nonlocal Gradient Descent (DSNGD) to craft stronger adversarial attacks for robustness evaluation.
We show that DSNGD-based attacks are on average 35% faster while achieving 0.9% to 27.1% higher success rates compared to their gradient descent-based counterparts.
arXiv Detail & Related papers (2020-11-05T08:55:24Z)