Purify++: Improving Diffusion-Purification with Advanced Diffusion
Models and Control of Randomness
- URL: http://arxiv.org/abs/2310.18762v1
- Date: Sat, 28 Oct 2023 17:18:38 GMT
- Title: Purify++: Improving Diffusion-Purification with Advanced Diffusion
Models and Control of Randomness
- Authors: Boya Zhang, Weijian Luo, Zhihua Zhang
- Abstract summary: Defense against adversarial attacks is important for AI safety.
Adversarial purification is a family of approaches that defend against adversarial attacks with suitable pre-processing.
We propose Purify++, a new diffusion purification algorithm that is now the state-of-the-art purification method against several adversarial attacks.
- Score: 22.87882885963586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial attacks can mislead neural network classifiers. The defense
against adversarial attacks is important for AI safety. Adversarial
purification is a family of approaches that defend against adversarial attacks
with suitable pre-processing. Diffusion models have been shown to be effective for
adversarial purification. Despite their success, many aspects of diffusion
purification still remain unexplored. In this paper, we investigate and improve
upon three limiting designs of diffusion purification: the use of an improved
diffusion model, advanced numerical simulation techniques, and optimal control
of randomness. Based on our findings, we propose Purify++, a new diffusion
purification algorithm that is now the state-of-the-art purification method
against several adversarial attacks. Our work presents a systematic exploration
of the limits of diffusion purification methods.
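The purification recipe the abstract builds on can be sketched in a few lines: diffuse the (possibly adversarial) input forward for a short time so small crafted perturbations are drowned in Gaussian noise, then run a reverse-time denoising integration back to a clean sample. The sketch below is a toy illustration under assumed simplifications (a constant-beta VP schedule, Euler steps on the probability-flow ODE, and a caller-supplied `score_fn`); it is not the Purify++ algorithm itself, which additionally uses improved diffusion models, better numerical solvers, and tuned randomness.

```python
import numpy as np

def purify(x, score_fn, t_star=0.3, n_steps=30, rng=None):
    """Toy diffusion purification (VP-SDE with constant beta = 1).

    Forward: diffuse the (possibly adversarial) input up to time t_star,
    drowning small crafted perturbations in Gaussian noise.
    Reverse: integrate the probability-flow ODE back to t = 0 using a
    score estimate; a real defense would plug in a pretrained diffusion
    model's score network here.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Forward noising in closed form: x_t = sqrt(a_bar)*x + sqrt(1-a_bar)*eps
    a_bar = np.exp(-t_star)  # alpha_bar(t) = exp(-t) when beta(t) = 1
    x_t = np.sqrt(a_bar) * x + np.sqrt(1.0 - a_bar) * rng.standard_normal(x.shape)
    # Reverse-time Euler steps on the ODE dx/dt = -0.5 * (x + score(x, t))
    dt = t_star / n_steps
    t = t_star
    for _ in range(n_steps):
        drift = -0.5 * (x_t + score_fn(x_t, t))
        x_t = x_t - drift * dt  # step backward from t to t - dt
        t -= dt
    return x_t
```

Note that randomness enters only through the single forward-noising draw (and would also enter each reverse step if an SDE sampler were used instead of the deterministic ODE); how much of that noise to inject, and where, is the "control of randomness" design point the abstract highlights.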
Related papers
- Diffusion-based Adversarial Purification for Intrusion Detection [0.6990493129893112]
Crafted perturbations mislead ML models, enabling attackers to evade detection or trigger false alerts.
Adversarial purification has emerged as a compelling solution, particularly with diffusion models showing promising results.
This paper demonstrates the effectiveness of diffusion models in purifying adversarial examples in network intrusion detection.
arXiv Detail & Related papers (2024-06-25T14:48:28Z)
- Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models [65.30406788716104]
This work investigates the vulnerabilities of security-enhancing diffusion models.
We demonstrate that these models are highly susceptible to DIFF2, a simple yet effective backdoor attack.
Case studies show that DIFF2 can significantly reduce both post-purification and certified accuracy across benchmark datasets and models.
arXiv Detail & Related papers (2024-06-14T02:39:43Z)
- DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification [63.65630243675792]
Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations of adversarial examples.
Recent studies show that even advanced attacks cannot break such defenses effectively.
We propose a unified framework DiffAttack to perform effective and efficient attacks against diffusion-based purification defenses.
arXiv Detail & Related papers (2023-10-27T15:17:50Z)
- SafeDiffuser: Safe Planning with Diffusion Probabilistic Models [97.80042457099718]
Diffusion model-based approaches have shown promise in data-driven planning, but there are no safety guarantees.
We propose a new method, called SafeDiffuser, to ensure diffusion probabilistic models satisfy specifications.
We test our method on a series of safe planning tasks, including maze path generation, legged robot locomotion, and 3D space manipulation.
arXiv Detail & Related papers (2023-05-31T19:38:12Z)
- Diffusion Theory as a Scalpel: Detecting and Purifying Poisonous Dimensions in Pre-trained Language Models Caused by Backdoor or Bias [64.81358555107788]
Pre-trained Language Models (PLMs) may be poisoned with backdoors or bias injected by an attacker during the fine-tuning process.
We propose the Fine-purifying approach, which utilizes the diffusion theory to study the dynamic process of fine-tuning for finding potentially poisonous dimensions.
To the best of our knowledge, we are the first to study the dynamics guided by the diffusion theory for safety or defense purposes.
arXiv Detail & Related papers (2023-05-08T08:40:30Z)
- Robust Evaluation of Diffusion-Based Adversarial Purification [3.634387981995277]
Diffusion-based purification methods aim to remove adversarial effects from an input data point at test time.
White-box attacks are often employed to measure the robustness of the purification.
We propose a new purification strategy improving robustness compared to the current diffusion-based purification methods.
arXiv Detail & Related papers (2023-03-16T02:47:59Z)
- How to Backdoor Diffusion Models? [74.43215520371506]
This paper presents the first study on the robustness of diffusion models against backdoor attacks.
We propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation.
Our results call attention to potential risks and possible misuse of diffusion models.
arXiv Detail & Related papers (2022-12-11T03:44:38Z)
- Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves the state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.