Diffusion Models for Adversarial Purification
- URL: http://arxiv.org/abs/2205.07460v1
- Date: Mon, 16 May 2022 06:03:00 GMT
- Title: Diffusion Models for Adversarial Purification
- Authors: Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, Anima Anandkumar
- Abstract summary: Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure, which uses diffusion models for adversarial purification.
Our method achieves state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
- Score: 69.1882221038846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial purification refers to a class of defense methods that remove
adversarial perturbations using a generative model. These methods do not make
assumptions on the form of attack and the classification model, and thus can
defend pre-existing classifiers against unseen threats. However, their
performance currently falls behind adversarial training methods. In this work,
we propose DiffPure that uses diffusion models for adversarial purification:
Given an adversarial example, we first diffuse it with a small amount of noise
following a forward diffusion process, and then recover the clean image through
a reverse generative process. To evaluate our method against strong adaptive
attacks in an efficient and scalable way, we propose to use the adjoint method
to compute full gradients of the reverse generative process. Extensive
experiments on three image datasets including CIFAR-10, ImageNet and CelebA-HQ
with three classifier architectures including ResNet, WideResNet and ViT
demonstrate that our method achieves the state-of-the-art results,
outperforming current adversarial training and adversarial purification
methods, often by a large margin. Project page: https://diffpure.github.io.
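Conceptually, the defense is a two-stage preprocessing step placed in front of an unchanged classifier: diffuse the input with a small amount of noise, then denoise it with the reverse generative process. Below is a minimal sketch of that diffuse-then-denoise loop, assuming a DDPM-style linear noise schedule and a hypothetical pretrained noise-prediction network eps_model(x, t); the function names and the choice of t_star are illustrative and not taken from the released code.

```python
# Minimal sketch of the diffuse-then-denoise purification idea, assuming a DDPM-style
# noise schedule and a hypothetical pretrained noise-prediction network eps_model(x, t).
# This illustrates the general technique, not the authors' implementation.
import torch

def make_schedule(T=1000, beta_min=1e-4, beta_max=2e-2, device="cpu"):
    # Linear beta schedule and the cumulative products used by DDPM.
    betas = torch.linspace(beta_min, beta_max, T, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    return betas, alphas, alpha_bars

@torch.no_grad()
def purify(x_adv, eps_model, t_star=100, T=1000):
    """Diffuse x_adv for t_star forward steps, then denoise back to a clean estimate."""
    betas, alphas, alpha_bars = make_schedule(T, device=x_adv.device)
    # Forward diffusion: add a small amount of Gaussian noise in closed form.
    a_bar = alpha_bars[t_star - 1]
    x = a_bar.sqrt() * x_adv + (1.0 - a_bar).sqrt() * torch.randn_like(x_adv)
    # Reverse generative process: ancestral DDPM sampling from t_star back to 0.
    for t in reversed(range(t_star)):
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
        eps = eps_model(x, t_batch)                      # predicted noise at step t
        coef = betas[t] / (1.0 - alpha_bars[t]).sqrt()
        x = (x - coef * eps) / alphas[t].sqrt() + betas[t].sqrt() * z
    return x  # purified image, fed unchanged to the downstream classifier
```

The sketch runs under torch.no_grad(); per the abstract, evaluating against strong adaptive attacks additionally requires full gradients of this reverse process, which the paper obtains with the adjoint method rather than backpropagating through every sampling step.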
Related papers
- Classifier Guidance Enhances Diffusion-based Adversarial Purification by Preserving Predictive Information [75.36597470578724]
Adversarial purification is one of the promising approaches to defend neural networks against adversarial attacks.
We propose the gUided Purification (COUP) algorithm, which purifies while keeping away from the classifier's decision boundary.
Experimental results show that COUP can achieve better adversarial robustness under strong attack methods.
arXiv Detail & Related papers (2024-08-12T02:48:00Z) - ZeroPur: Succinct Training-Free Adversarial Purification [52.963392510839284]
Adversarial purification is a class of defense techniques that can defend against various unseen adversarial attacks.
We present ZeroPur, a simple adversarial purification method that purifies adversarial images without any further training.
arXiv Detail & Related papers (2024-06-05T10:58:15Z) - Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models! [52.0855711767075]
EvoSeed is an evolutionary strategy-based algorithmic framework for generating photo-realistic natural adversarial samples.
We employ CMA-ES to optimize the search for an initial seed vector which, when processed by the Conditional Diffusion Model, results in a natural adversarial sample that the model misclassifies.
Experiments show that generated adversarial images are of high image quality, raising concerns about generating harmful content bypassing safety classifiers.
arXiv Detail & Related papers (2024-02-07T09:39:29Z) - Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent
Diffusion Model [61.53213964333474]
We propose Adv-Diffusion, a unified framework that can generate imperceptible adversarial identity perturbations in the latent space rather than the raw pixel space.
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z) - MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean
Diffusion Model [8.695439655048634]
Diffusion-based adversarial purification focuses on using the diffusion model to generate a clean image against adversarial attacks.
We propose MimicDiffusion, a new diffusion-based adversarial purification technique that directly approximates the generative process of the diffusion model with the clean image as input.
Experiments on three image datasets demonstrate that MimicDiffusion performs significantly better than the state-of-the-art baselines.
arXiv Detail & Related papers (2023-12-08T02:32:47Z) - AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models [7.406040859734522]
Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques.
Previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models.
We propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models.
arXiv Detail & Related papers (2023-07-24T03:10:02Z) - Threat Model-Agnostic Adversarial Defense using Diffusion Models [14.603209216642034]
Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks.
arXiv Detail & Related papers (2022-07-17T06:50:48Z) - Deblurring via Stochastic Refinement [85.42730934561101]
We present an alternative framework for blind deblurring based on conditional diffusion models.
Our method is competitive in terms of distortion metrics such as PSNR.
arXiv Detail & Related papers (2021-12-05T04:36:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.