Diffusion Models for Adversarial Purification
- URL: http://arxiv.org/abs/2205.07460v1
- Date: Mon, 16 May 2022 06:03:00 GMT
- Title: Diffusion Models for Adversarial Purification
- Authors: Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, Anima Anandkumar
- Abstract summary: Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure, which uses diffusion models for adversarial purification.
Our method achieves state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
- Score: 69.1882221038846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial purification refers to a class of defense methods that remove
adversarial perturbations using a generative model. These methods do not make
assumptions on the form of attack and the classification model, and thus can
defend pre-existing classifiers against unseen threats. However, their
performance currently falls behind adversarial training methods. In this work,
we propose DiffPure that uses diffusion models for adversarial purification:
Given an adversarial example, we first diffuse it with a small amount of noise
following a forward diffusion process, and then recover the clean image through
a reverse generative process. To evaluate our method against strong adaptive
attacks in an efficient and scalable way, we propose to use the adjoint method
to compute full gradients of the reverse generative process. Extensive
experiments on three image datasets including CIFAR-10, ImageNet and CelebA-HQ
with three classifier architectures including ResNet, WideResNet and ViT
demonstrate that our method achieves the state-of-the-art results,
outperforming current adversarial training and adversarial purification
methods, often by a large margin. Project page: https://diffpure.github.io.
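Conceptually, the defense is a two-stage preprocessing step placed in front of an unchanged classifier: diffuse the input with a small amount of noise, then denoise it with the reverse generative process. Below is a minimal sketch of that diffuse-then-denoise loop, assuming a DDPM-style linear noise schedule and a hypothetical pretrained noise-prediction network eps_model(x, t); the function names and the choice of t_star are illustrative and not taken from the released code.

```python
# Minimal sketch of the diffuse-then-denoise purification idea, assuming a DDPM-style
# noise schedule and a hypothetical pretrained noise-prediction network eps_model(x, t).
# This illustrates the general technique, not the authors' implementation.
import torch

def make_schedule(T=1000, beta_min=1e-4, beta_max=2e-2, device="cpu"):
    # Linear beta schedule and the cumulative products used by DDPM.
    betas = torch.linspace(beta_min, beta_max, T, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    return betas, alphas, alpha_bars

@torch.no_grad()
def purify(x_adv, eps_model, t_star=100, T=1000):
    """Diffuse x_adv for t_star forward steps, then denoise back to a clean estimate."""
    betas, alphas, alpha_bars = make_schedule(T, device=x_adv.device)
    # Forward diffusion: add a small amount of Gaussian noise in closed form.
    a_bar = alpha_bars[t_star - 1]
    x = a_bar.sqrt() * x_adv + (1.0 - a_bar).sqrt() * torch.randn_like(x_adv)
    # Reverse generative process: ancestral DDPM sampling from t_star back to 0.
    for t in reversed(range(t_star)):
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
        eps = eps_model(x, t_batch)                      # predicted noise at step t
        coef = betas[t] / (1.0 - alpha_bars[t]).sqrt()
        x = (x - coef * eps) / alphas[t].sqrt() + betas[t].sqrt() * z
    return x  # purified image, fed unchanged to the downstream classifier
```

The sketch runs under torch.no_grad(); per the abstract, evaluating against strong adaptive attacks additionally requires full gradients of this reverse process, which the paper obtains with the adjoint method rather than backpropagating through every sampling step.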
Related papers
- Classifier Guidance Enhances Diffusion-based Adversarial Purification by Preserving Predictive Information [75.36597470578724]
Adversarial purification is one of the promising approaches to defend neural networks against adversarial attacks.
We propose the gUided Purification (COUP) algorithm, which purifies while keeping away from the classifier's decision boundary.
Experimental results show that COUP can achieve better adversarial robustness under strong attack methods.
arXiv Detail & Related papers (2024-08-12T02:48:00Z) - ZeroPur: Succinct Training-Free Adversarial Purification [52.963392510839284]
Adversarial purification is a class of defense techniques that can defend against various unseen adversarial attacks.
We present ZeroPur, a simple adversarial purification method that purifies adversarial images without any further training.
arXiv Detail & Related papers (2024-06-05T10:58:15Z) - Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models! [52.0855711767075]
EvoSeed is an evolutionary strategy-based algorithmic framework for generating photo-realistic natural adversarial samples.
We employ CMA-ES to optimize the search for an initial seed vector which, when processed by the Conditional Diffusion Model, results in a natural adversarial sample that the model misclassifies.
Experiments show that generated adversarial images are of high image quality, raising concerns about generating harmful content bypassing safety classifiers.
arXiv Detail & Related papers (2024-02-07T09:39:29Z) - Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent
Diffusion Model [61.53213964333474]
We propose Adv-Diffusion, a unified framework that can generate imperceptible adversarial identity perturbations in the latent space rather than the raw pixel space.
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z) - MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean
Diffusion Model [8.695439655048634]
Diffusion-based adversarial purification focuses on using the diffusion model to generate a clean image against adversarial attacks.
We propose MimicDiffusion, a new diffusion-based adversarial purification technique that directly approximates the generative process of the diffusion model with the clean image as input.
Experiments on three image datasets demonstrate that MimicDiffusion performs significantly better than the state-of-the-art baselines.
arXiv Detail & Related papers (2023-12-08T02:32:47Z) - AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models [7.406040859734522]
Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques.
Previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models.
We propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models.
arXiv Detail & Related papers (2023-07-24T03:10:02Z) - Threat Model-Agnostic Adversarial Defense using Diffusion Models [14.603209216642034]
Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks.
arXiv Detail & Related papers (2022-07-17T06:50:48Z) - Deblurring via Stochastic Refinement [85.42730934561101]
We present an alternative framework for blind deblurring based on conditional diffusion models.
Our method is competitive in terms of distortion metrics such as PSNR.
arXiv Detail & Related papers (2021-12-05T04:36:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.