Instant Adversarial Purification with Adversarial Consistency Distillation
- URL: http://arxiv.org/abs/2408.17064v2
- Date: Mon, 2 Sep 2024 06:25:09 GMT
- Title: Instant Adversarial Purification with Adversarial Consistency Distillation
- Authors: Chun Tong Lei, Hon Ming Yam, Zhongliang Guo, Chun Pong Lau
- Abstract summary: We propose One Step Control Purification (OSCP), a diffusion-based purification model that can purify an adversarial image in one Neural Function Evaluation (NFE) in diffusion models.
We achieve a defense success rate of 74.19% on ImageNet, requiring only 0.1s for each purification.
- Score: 1.224954637705144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks, despite their remarkable performance in widespread applications, including image classification, are also known to be vulnerable to subtle adversarial noise. Although some diffusion-based purification methods have been proposed, for example, DiffPure, those methods are time-consuming. In this paper, we propose One Step Control Purification (OSCP), a diffusion-based purification model that can purify the adversarial image in one Neural Function Evaluation (NFE) in diffusion models. We use Latent Consistency Model (LCM) and ControlNet for our one-step purification. OSCP is computationally friendly and time-efficient compared to other diffusion-based purification methods; we achieve a defense success rate of 74.19% on ImageNet, requiring only 0.1s for each purification. Moreover, there is a fundamental incongruence between consistency distillation and adversarial perturbation. To address this ontological dissonance, we propose Gaussian Adversarial Noise Distillation (GAND), a novel consistency distillation framework that facilitates a more nuanced reconciliation of the latent space dynamics, effectively bridging the natural and adversarial manifolds. Our experiments show that GAND does not need a Full Fine-Tune (FFT); PEFT, e.g., LoRA, is sufficient.
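The one-NFE purification pipeline summarized above can be sketched end to end. Note that `encode`, `consistency_step`, and `decode` below are hypothetical stand-in stubs (a real implementation would use an LCM with ControlNet conditioning), so this is a minimal sketch of the control flow under those assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x):
    # Stand-in for a VAE encoder mapping an image batch to latents.
    return x.reshape(x.shape[0], -1)[:, :64]

def consistency_step(z_t, t):
    # Stand-in for a one-step consistency model: maps a noisy latent
    # at timestep t directly to a clean latent estimate.
    return z_t / (1.0 + t)

def decode(z):
    # Stand-in for a VAE decoder.
    return z

def purify_one_step(x_adv, t=0.5):
    """One-NFE diffusion purification: encode, forward-diffuse to
    timestep t, then denoise with a single consistency-model call."""
    z = encode(x_adv)
    z_t = z + t * rng.standard_normal(z.shape)  # forward-diffuse the latent
    z_clean = consistency_step(z_t, t)          # single network evaluation (1 NFE)
    return decode(z_clean)

x_adv = rng.standard_normal((2, 3, 32, 32))  # dummy adversarial batch
purified = purify_one_step(x_adv)
print(purified.shape)
```

The key point is that `consistency_step` is called exactly once, which is what replaces the iterative denoising loop of methods like DiffPure with a single network evaluation.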
Related papers
- LoRID: Low-Rank Iterative Diffusion for Adversarial Purification [3.735798190358]
This work presents an information-theoretic examination of diffusion-based purification methods.
We introduce LoRID, a novel Low-Rank Iterative Diffusion purification method designed to remove adversarial perturbation with low intrinsic purification errors.
LoRID achieves superior robustness performance in CIFAR-10/100, CelebA-HQ, and ImageNet datasets under both white-box and black-box settings.
arXiv Detail & Related papers (2024-09-12T17:51:25Z) - Classifier Guidance Enhances Diffusion-based Adversarial Purification by Preserving Predictive Information [75.36597470578724]
Adversarial purification is one of the promising approaches to defend neural networks against adversarial attacks.
We propose the gUided Purification (COUP) algorithm, which purifies while keeping away from the classifier decision boundary.
Experimental results show that COUP can achieve better adversarial robustness under strong attack methods.
arXiv Detail & Related papers (2024-08-12T02:48:00Z) - Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness [28.09748997491938]
We introduce Consistency Purification, a purifier superior in both efficiency and effectiveness to previous work.
The consistency model is a one-step generative model distilled from PF-ODE, thus can generate on-manifold purified images with a single network evaluation.
Our comprehensive experiments demonstrate that our Consistency Purification framework achieves state-of-the-art certified robustness and efficiency compared to baseline methods.
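The self-consistency property underlying the one-step generation described above (a single evaluation maps any point on the PF-ODE trajectory back to the trajectory's origin) can be checked numerically on a toy linear trajectory; `trajectory` and `f` below are illustrative constructions for this toy case, not the paper's model:

```python
import numpy as np

# Toy PF-ODE trajectory, linear for illustration: x_t = x_0 + t * direction.
x0 = np.array([1.0, -2.0])
direction = np.array([0.5, 0.25])

def trajectory(t):
    # A point on the toy trajectory at time t.
    return x0 + t * direction

def f(x_t, t):
    # An ideal consistency function for this toy trajectory:
    # maps any (x_t, t) on the trajectory back to x_0 in one step.
    return x_t - t * direction

# Self-consistency: every point on the trajectory maps to the same origin.
for t in (0.1, 0.5, 1.0):
    assert np.allclose(f(trajectory(t), t), x0)
print("self-consistent")
```

A consistency model is trained so that `f` approximately satisfies this property along real PF-ODE trajectories, which is what allows on-manifold purified images from a single network evaluation.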
arXiv Detail & Related papers (2024-06-30T08:34:35Z) - Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
For an efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space.
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z) - MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model [8.695439655048634]
Diffusion-based adversarial purification focuses on using the diffusion model to generate a clean image against adversarial attacks.
We propose MimicDiffusion, a new diffusion-based adversarial purification technique that directly approximates the generative process of the diffusion model with the clean image as input.
Experiments on three image datasets demonstrate that MimicDiffusion significantly performs better than the state-of-the-art baselines.
arXiv Detail & Related papers (2023-12-08T02:32:47Z) - Purify++: Improving Diffusion-Purification with Advanced Diffusion Models and Control of Randomness [22.87882885963586]
Defense against adversarial attacks is important for AI safety.
Adversarial purification is a family of approaches that defend adversarial attacks with suitable pre-processing.
We propose Purify++, a new diffusion purification algorithm that achieves state-of-the-art purification performance against several adversarial attacks.
arXiv Detail & Related papers (2023-10-28T17:18:38Z) - Noise-Free Score Distillation [78.79226724549456]
The Noise-Free Score Distillation (NFSD) process requires minimal modifications to the original SDS framework.
We achieve more effective distillation of pre-trained text-to-image diffusion models while using a nominal CFG scale.
arXiv Detail & Related papers (2023-10-26T17:12:26Z) - Guided Diffusion Model for Adversarial Purification [103.4596751105955]
Adversarial attacks disturb deep neural networks (DNNs) in various algorithms and frameworks.
We propose a novel purification approach, referred to as guided diffusion model for purification (GDMP).
In comprehensive experiments across various datasets, the proposed GDMP is shown to reduce the perturbations introduced by adversarial attacks to a negligible level.
arXiv Detail & Related papers (2022-05-30T10:11:15Z) - Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z) - Adversarial purification with Score-based generative models [56.88185136509654]
We propose a novel adversarial purification method based on an EBM trained with Denoising Score-Matching (DSM).
We introduce a simple yet effective randomized purification scheme that injects random noises into images before purification.
We show that our purification method is robust against various attacks and demonstrate its state-of-the-art performances.
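The randomized scheme described above (injecting random noise into images before purification) is simple to sketch; `purify` below is a hypothetical stand-in for the score-based purifier, so this shows only the randomization wrapper, not the EBM itself:

```python
import numpy as np

rng = np.random.default_rng(42)

def purify(x):
    # Hypothetical stand-in for the score-based purification model;
    # here it simply clips values back to the valid image range.
    return np.clip(x, 0.0, 1.0)

def randomized_purify(x_adv, sigma=0.1):
    """Inject Gaussian noise before purification so the attacker cannot
    anticipate the exact input the purifier will see."""
    x_noisy = x_adv + sigma * rng.standard_normal(x_adv.shape)
    return purify(x_noisy)

x_adv = rng.uniform(0.0, 1.0, size=(1, 3, 8, 8))  # dummy adversarial image
out = randomized_purify(x_adv)
print(out.shape)
```

Because fresh noise is drawn on every call, repeated queries with the same adversarial input yield different purifier inputs, which is what makes gradient-based adaptive attacks harder.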
arXiv Detail & Related papers (2021-06-11T04:35:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.