Purify++: Improving Diffusion-Purification with Advanced Diffusion
Models and Control of Randomness
- URL: http://arxiv.org/abs/2310.18762v1
- Date: Sat, 28 Oct 2023 17:18:38 GMT
- Title: Purify++: Improving Diffusion-Purification with Advanced Diffusion
Models and Control of Randomness
- Authors: Boya Zhang, Weijian Luo, Zhihua Zhang
- Abstract summary: Defense against adversarial attacks is important for AI safety.
Adversarial purification is a family of approaches that defend against adversarial attacks with suitable pre-processing.
We propose Purify++, a new diffusion purification algorithm that is now the state-of-the-art purification method against several adversarial attacks.
- Score: 22.87882885963586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial attacks can mislead neural network classifiers. The defense
against adversarial attacks is important for AI safety. Adversarial
purification is a family of approaches that defend against adversarial attacks
with suitable pre-processing. Diffusion models have been shown to be effective for
adversarial purification. Despite their success, many aspects of diffusion
purification still remain unexplored. In this paper, we investigate and improve
upon three limiting designs of diffusion purification: the use of an improved
diffusion model, advanced numerical simulation techniques, and optimal control
of randomness. Based on our findings, we propose Purify++, a new diffusion
purification algorithm that is now the state-of-the-art purification method
against several adversarial attacks. Our work presents a systematic exploration
of the limits of diffusion purification methods.
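The purification recipe the abstract builds on can be sketched in a few lines: diffuse the (possibly adversarial) input forward for a short time so small crafted perturbations are drowned in Gaussian noise, then run a reverse-time denoising integration back to a clean sample. The sketch below is a toy illustration under assumed simplifications (a constant-beta VP schedule, Euler steps on the probability-flow ODE, and a caller-supplied `score_fn`); it is not the Purify++ algorithm itself, which additionally uses improved diffusion models, better numerical solvers, and tuned randomness.

```python
import numpy as np

def purify(x, score_fn, t_star=0.3, n_steps=30, rng=None):
    """Toy diffusion purification (VP-SDE with constant beta = 1).

    Forward: diffuse the (possibly adversarial) input up to time t_star,
    drowning small crafted perturbations in Gaussian noise.
    Reverse: integrate the probability-flow ODE back to t = 0 using a
    score estimate; a real defense would plug in a pretrained diffusion
    model's score network here.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Forward noising in closed form: x_t = sqrt(a_bar)*x + sqrt(1-a_bar)*eps
    a_bar = np.exp(-t_star)  # alpha_bar(t) = exp(-t) when beta(t) = 1
    x_t = np.sqrt(a_bar) * x + np.sqrt(1.0 - a_bar) * rng.standard_normal(x.shape)
    # Reverse-time Euler steps on the ODE dx/dt = -0.5 * (x + score(x, t))
    dt = t_star / n_steps
    t = t_star
    for _ in range(n_steps):
        drift = -0.5 * (x_t + score_fn(x_t, t))
        x_t = x_t - drift * dt  # step backward from t to t - dt
        t -= dt
    return x_t
```

Note that randomness enters only through the single forward-noising draw (and would also enter each reverse step if an SDE sampler were used instead of the deterministic ODE); how much of that noise to inject, and where, is the "control of randomness" design point the abstract highlights.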
Related papers
- Diffusion-based Adversarial Purification for Intrusion Detection [0.6990493129893112]
Crafted perturbations mislead ML models, enabling attackers to evade detection or trigger false alerts.
Adversarial purification has emerged as a compelling solution, particularly with diffusion models showing promising results.
This paper demonstrates the effectiveness of diffusion models in purifying adversarial examples in network intrusion detection.
arXiv Detail & Related papers (2024-06-25T14:48:28Z)
- Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models [65.30406788716104]
This work investigates the vulnerabilities of security-enhancing diffusion models.
We demonstrate that these models are highly susceptible to DIFF2, a simple yet effective backdoor attack.
Case studies show that DIFF2 can significantly reduce both post-purification and certified accuracy across benchmark datasets and models.
arXiv Detail & Related papers (2024-06-14T02:39:43Z)
- DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification [63.65630243675792]
Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations of adversarial examples.
Recent studies show that even advanced attacks cannot break such defenses effectively.
We propose a unified framework DiffAttack to perform effective and efficient attacks against diffusion-based purification defenses.
arXiv Detail & Related papers (2023-10-27T15:17:50Z)
- SafeDiffuser: Safe Planning with Diffusion Probabilistic Models [97.80042457099718]
Diffusion model-based approaches have shown promise in data-driven planning, but there are no safety guarantees.
We propose a new method, called SafeDiffuser, to ensure diffusion probabilistic models satisfy specifications.
We test our method on a series of safe planning tasks, including maze path generation, legged robot locomotion, and 3D space manipulation.
arXiv Detail & Related papers (2023-05-31T19:38:12Z)
- Diffusion Theory as a Scalpel: Detecting and Purifying Poisonous Dimensions in Pre-trained Language Models Caused by Backdoor or Bias [64.81358555107788]
Pre-trained Language Models (PLMs) may be poisoned with backdoors or bias injected by an attacker during the fine-tuning process.
We propose the Fine-purifying approach, which utilizes the diffusion theory to study the dynamic process of fine-tuning for finding potentially poisonous dimensions.
To the best of our knowledge, we are the first to study the dynamics guided by the diffusion theory for safety or defense purposes.
arXiv Detail & Related papers (2023-05-08T08:40:30Z)
- Robust Evaluation of Diffusion-Based Adversarial Purification [3.634387981995277]
Diffusion-based purification methods aim to remove adversarial effects from an input data point at test time.
White-box attacks are often employed to measure the robustness of the purification.
We propose a new purification strategy improving robustness compared to the current diffusion-based purification methods.
arXiv Detail & Related papers (2023-03-16T02:47:59Z)
- How to Backdoor Diffusion Models? [74.43215520371506]
This paper presents the first study on the robustness of diffusion models against backdoor attacks.
We propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation.
Our results call attention to potential risks and possible misuse of diffusion models.
arXiv Detail & Related papers (2022-12-11T03:44:38Z)
- Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves the state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.