MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean
Diffusion Model
- URL: http://arxiv.org/abs/2312.04802v1
- Date: Fri, 8 Dec 2023 02:32:47 GMT
- Title: MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean
Diffusion Model
- Authors: Kaiyu Song, Hanjiang Lai
- Abstract summary: Diffusion-based adversarial purification focuses on using the diffusion model to generate a clean image against adversarial attacks.
We propose MimicDiffusion, a new diffusion-based adversarial purification technique, that directly approximates the generative process of the diffusion model with the clean image as input.
Experiments on three image datasets demonstrate that MimicDiffusion performs significantly better than state-of-the-art baselines.
- Score: 8.695439655048634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) are vulnerable to adversarial perturbation, where
an imperceptible perturbation added to an image can fool the network.
Diffusion-based adversarial purification uses a diffusion model to generate a
clean image from such an adversarial input. Unfortunately, the generative
process of the diffusion model is itself inevitably affected by the adversarial
perturbation, since the diffusion model is also a deep network whose input
carries that perturbation. In this work, we propose MimicDiffusion, a new
diffusion-based adversarial purification technique that directly approximates
the generative process of the diffusion model with the clean image as input.
Concretely, we analyze the difference between the guidance terms computed from
the clean image and from the adversarial sample. Based on this analysis, we
first implement MimicDiffusion using the Manhattan distance, and then propose
two guidance terms to purify the adversarial perturbation and approximate the
clean diffusion model. Extensive experiments on three image datasets (CIFAR-10,
CIFAR-100, and ImageNet) with three classifier backbones (WideResNet-70-16,
WideResNet-28-10, and ResNet50) demonstrate that MimicDiffusion performs
significantly better than state-of-the-art baselines. On CIFAR-10, CIFAR-100,
and ImageNet it achieves 92.67%, 61.35%, and 61.53% average robust accuracy,
which is 18.49%, 13.23%, and 17.64% higher, respectively. The code is available
in the supplementary material.
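To make the mechanism concrete, here is a minimal sketch of Manhattan-distance-guided reverse diffusion. It is not the authors' released code; eps_theta, the noise schedule, and guidance_scale are placeholder assumptions.

```python
import torch

# Placeholder for a pretrained noise-prediction network eps_theta(x_t, t);
# a real purifier would load a DDPM U-Net here.
def eps_theta(x_t, t):
    return torch.zeros_like(x_t)

def purify(x_adv, betas, guidance_scale=1.0):
    """Reverse DDPM sampling from pure noise, steered toward x_adv.

    The gradient of the Manhattan distance ||x0_hat - x_adv||_1 w.r.t. x0_hat
    is sign(x0_hat - x_adv), so the guidance step is a cheap sign update.
    """
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn_like(x_adv)  # start the trajectory from pure noise
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_theta(x, t)
        ab_t = alpha_bars[t]
        # estimate of the clean image implied by the current sample
        x0_hat = (x - torch.sqrt(1 - ab_t) * eps) / torch.sqrt(ab_t)
        # standard DDPM posterior mean
        mean = (x - betas[t] / torch.sqrt(1 - ab_t) * eps) / torch.sqrt(alphas[t])
        # Manhattan-distance guidance toward the input image
        mean = mean - guidance_scale * betas[t] * torch.sign(x0_hat - x_adv)
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x

# Toy usage with a short 50-step schedule
x_adv = torch.rand(1, 3, 32, 32)
x_hat = purify(x_adv, betas=torch.linspace(1e-4, 0.02, 50))
```

The sign(.) update is plausibly why the Manhattan distance is attractive here: its gradient has constant magnitude, so the guidance strength does not depend on how far the current estimate is from the input.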
Related papers
- Instant Adversarial Purification with Adversarial Consistency Distillation [1.224954637705144]
We propose One Step Control Purification (OSCP), a diffusion-based purification model that purifies an adversarial image in a single neural function evaluation (NFE).
It achieves a defense success rate of 74.19% on ImageNet while requiring only 0.1 s per purification.
arXiv Detail & Related papers (2024-08-30T07:49:35Z)
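A minimal sketch of the single-NFE idea, assuming a consistency-distilled denoiser is available; consistency_fn, alpha_bar_t, and t below are toy placeholders, not OSCP's actual components.

```python
import torch

# Toy stand-in for a consistency-distilled denoiser f(x_t, t) -> x0;
# OSCP distills such a map from a diffusion model so one call suffices.
def consistency_fn(x_t, t):
    return x_t

def one_step_purify(x_adv, alpha_bar_t, t):
    """Diffuse the adversarial image to noise level t, then map it back to a
    clean estimate with exactly one neural function evaluation."""
    noise = torch.randn_like(x_adv)
    x_t = torch.sqrt(alpha_bar_t) * x_adv + torch.sqrt(1 - alpha_bar_t) * noise
    return consistency_fn(x_t, t)  # the single NFE

x_adv = torch.rand(1, 3, 224, 224)
x_hat = one_step_purify(x_adv, alpha_bar_t=torch.tensor(0.7), t=300)
```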
- Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment [56.609042046176555]
Suboptimal noise-data mapping leads to slow training of diffusion models.
Drawing inspiration from the immiscibility phenomenon in physics, we propose Immiscible Diffusion.
Our approach is remarkably simple, requiring only one line of code to restrict the diffuse-able area for each image.
arXiv Detail & Related papers (2024-06-18T06:20:42Z)
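The batch-wise noise re-pairing can be sketched as a min-cost matching; this is an illustrative reading, and the metric, batch size, and function name are assumptions.

```python
import torch
from scipy.optimize import linear_sum_assignment

def assign_noise(images, noise):
    """Pair each image in a batch with a nearby noise sample via min-cost
    matching, so each image only diffuses toward a restricted region of
    noise space (the 'immiscible' assignment)."""
    b = images.shape[0]
    cost = torch.cdist(images.view(b, -1), noise.view(b, -1))  # pairwise L2
    _, cols = linear_sum_assignment(cost.cpu().numpy())        # Hungarian matching
    return noise[torch.as_tensor(cols)]  # the one-line batch reordering

images = torch.rand(8, 3, 32, 32)
noise = torch.randn_like(images)
noise = assign_noise(images, noise)  # then train the diffusion model as usual
```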
- Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space.
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z)
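The abstract leaves the conditioning details out, but the latent-space attack principle can be sketched generically; the encoder, decoder, identity embedder, step count, and step size below are all toy assumptions, not Adv-Diffusion's components.

```python
import torch

# Toy stand-ins for a latent encoder/decoder and a face-identity embedder.
encode = lambda x: x.flatten(1)
decode = lambda z: z.view(-1, 3, 32, 32)
identity_embed = lambda x: x.mean(dim=(2, 3))

def latent_attack(x, steps=10, lr=0.05):
    """Perturb the latent code so the decoded image's identity embedding
    drifts away from the original, keeping pixel-space changes subtle."""
    z = encode(x)
    delta = torch.zeros_like(z, requires_grad=True)
    target = identity_embed(x).detach()
    for _ in range(steps):
        x_adv = decode(z + delta)
        loss = -torch.norm(identity_embed(x_adv) - target)  # push identity away
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()  # signed gradient step
            delta.grad.zero_()
    return decode(z + delta).detach()

x = torch.rand(1, 3, 32, 32)
x_adv = latent_attack(x)
```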
- Your Diffusion Model is Secretly a Zero-Shot Classifier [90.40799216880342]
We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative over discriminative models for downstream tasks.
arXiv Detail & Related papers (2023-03-28T17:59:56Z)
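The mechanism reduces to comparing denoising errors under different class conditionings; a minimal sketch, where eps_theta, trials, and alpha_bar are placeholder assumptions and real implementations average over many timesteps.

```python
import torch

# Toy conditional noise predictor eps_theta(x_t, t, c); a real system would
# condition a text-to-image model on a class prompt instead.
def eps_theta(x_t, t, c):
    return torch.full_like(x_t, 0.1 * c)

def diffusion_classify(x, num_classes, trials=8, alpha_bar=0.5):
    """Pick the class whose conditioning yields the lowest epsilon-prediction
    error: a Monte Carlo proxy for the class-conditional density."""
    errors = torch.zeros(num_classes)
    for _ in range(trials):
        eps = torch.randn_like(x)
        x_t = alpha_bar ** 0.5 * x + (1 - alpha_bar) ** 0.5 * eps
        for c in range(num_classes):
            errors[c] += torch.mean((eps_theta(x_t, 0, c) - eps) ** 2)
    return int(torch.argmin(errors))

x = torch.rand(1, 3, 32, 32)
print(diffusion_classify(x, num_classes=10))
```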
- DIRE for Diffusion-Generated Image Detection [128.95822613047298]
We propose a novel representation called DIffusion Reconstruction Error (DIRE).
DIRE measures the error between an input image and its reconstruction by a pre-trained diffusion model.
This suggests that DIRE can serve as a bridge for distinguishing generated images from real ones.
arXiv Detail & Related papers (2023-03-16T13:15:03Z)
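The statistic itself is simple; a minimal sketch, where reconstruct stands in for the pretrained diffusion round trip (e.g. DDIM inversion followed by reconstruction) and the threshold is an assumption.

```python
import torch

# Toy stand-in for the diffusion round trip; identity used as a placeholder.
def reconstruct(x):
    return x.clone()

def dire(x):
    """DIRE statistic (sketch): diffusion-generated images survive the
    round trip almost unchanged, so a small error flags a generated image."""
    return torch.mean(torch.abs(x - reconstruct(x)))  # per-image L1 error

x = torch.rand(1, 3, 256, 256)
looks_generated = dire(x) < 0.05  # threshold is an assumption, tuned in practice
```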
- Guided Diffusion Model for Adversarial Purification [103.4596751105955]
Adversarial attacks disturb deep neural networks (DNNs) across various algorithms and frameworks.
We propose a novel purification approach, referred to as the guided diffusion model for purification (GDMP).
In comprehensive experiments across various datasets, GDMP is shown to reduce the perturbations introduced by adversarial attacks to a shallow range.
arXiv Detail & Related papers (2022-05-30T10:11:15Z)
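Unlike the x0-space guidance sketched for MimicDiffusion above, GDMP guides each reverse step with the distance to a diffused copy of the input; a minimal sketch, where eps_theta, the schedule, the scale s, and the exact guidance term are assumptions.

```python
import torch

def eps_theta(x_t, t):  # placeholder pretrained noise predictor
    return torch.zeros_like(x_t)

def gdmp_step(x, x_adv, t, alphas, alpha_bars, betas, s=0.5):
    """One reverse step of GDMP-style purification (sketch): a standard DDPM
    update plus a nudge toward a freshly diffused copy of the input, so image
    semantics are preserved while the perturbation is washed out."""
    # diffuse the adversarial image to the current noise level
    x_adv_t = (torch.sqrt(alpha_bars[t]) * x_adv
               + torch.sqrt(1 - alpha_bars[t]) * torch.randn_like(x_adv))
    eps = eps_theta(x, t)
    mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
    mean = mean - s * betas[t] * (x - x_adv_t)  # gradient of an L2 guidance term
    noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
    return mean + torch.sqrt(betas[t]) * noise

betas = torch.linspace(1e-4, 0.02, 1000)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)
x_adv = torch.rand(1, 3, 32, 32)
x = gdmp_step(torch.randn_like(x_adv), x_adv, t=500,
              alphas=alphas, alpha_bars=alpha_bars, betas=betas)
```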
- Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure, which uses diffusion models for adversarial purification.
Our method achieves state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z)
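DiffPure's diffuse-then-denoise recipe needs no guidance at all; a minimal sketch, where eps_theta, the schedule, and t_star are placeholder assumptions.

```python
import torch

# Placeholder pretrained noise predictor.
def eps_theta(x_t, t):
    return torch.zeros_like(x_t)

def diffpure(x_adv, betas, t_star):
    """Diffuse the adversarial image forward to a modest noise level t*,
    drowning the perturbation, then run the reverse process back to an image."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    ab = alpha_bars[t_star]
    x = torch.sqrt(ab) * x_adv + torch.sqrt(1 - ab) * torch.randn_like(x_adv)
    for t in range(t_star, -1, -1):
        eps = eps_theta(x, t)
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x

x_adv = torch.rand(1, 3, 32, 32)
x_hat = diffpure(x_adv, betas=torch.linspace(1e-4, 0.02, 1000), t_star=100)
```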
- Adaptive Clustering of Robust Semantic Representations for Adversarial Image Purification [0.9203366434753543]
We propose a robust defense against adversarial attacks that is model-agnostic and generalizable to unseen adversaries.
In this paper, we extract the latent representations for each class and adaptively cluster the latent representations that share a semantic similarity.
We adversarially train a new model, constraining the latent-space representation to minimize the distance between the adversarial latent representation and the true cluster distribution.
arXiv Detail & Related papers (2021-04-05T21:07:04Z)
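A minimal sketch of the latent-clustering constraint: plain k-means and the toy dimensions below stand in for the paper's adaptive clustering, so everything here is an assumption about the general shape of the method.

```python
import torch

def cluster_latents(latents, k=3, iters=20):
    """Toy k-means over latent vectors, standing in for the paper's adaptive
    clustering of semantically similar representations."""
    centroids = latents[torch.randperm(len(latents))[:k]]
    for _ in range(iters):
        assign = torch.cdist(latents, centroids).argmin(dim=1)
        for j in range(k):
            if (assign == j).any():
                centroids[j] = latents[assign == j].mean(dim=0)
    return centroids

def latent_constraint_loss(z_adv, centroids):
    """Distance from an adversarial latent to its nearest cluster centroid;
    added to the training loss to pull adversarial representations back
    toward the true cluster distribution."""
    return torch.cdist(z_adv, centroids).min(dim=1).values.mean()

z = torch.randn(100, 16)              # toy latents for one class
centroids = cluster_latents(z)
z_adv = z + 0.1 * torch.randn_like(z)  # toy adversarial latents
loss = latent_constraint_loss(z_adv, centroids)
```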
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.