Towards Better Adversarial Purification via Adversarial Denoising Diffusion Training
- URL: http://arxiv.org/abs/2404.14309v1
- Date: Mon, 22 Apr 2024 16:10:38 GMT
- Title: Towards Better Adversarial Purification via Adversarial Denoising Diffusion Training
- Authors: Yiming Liu, Kezhao Liu, Yao Xiao, Ziyi Dong, Xiaogang Xu, Pengxu Wei, Liang Lin
- Abstract summary: Diffusion-based purification (DBP) has emerged as a promising approach for defending against adversarial attacks.
Previous studies have used questionable methods to evaluate the robustness of DBP models.
We propose Adversarial Denoising Diffusion Training (ADDT) to improve the robustness of DBP models.
- Score: 65.10019978876863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, diffusion-based purification (DBP) has emerged as a promising approach for defending against adversarial attacks. However, previous studies have used questionable methods to evaluate the robustness of DBP models, and their explanations of DBP robustness lack experimental support. We re-examine DBP robustness using precise gradients and discuss the impact of stochasticity on DBP robustness. To better explain DBP robustness, we assess it under a novel attack setting, Deterministic White-box, and pinpoint stochasticity as the main factor in DBP robustness. Our results suggest that DBP models rely on stochasticity to evade the most effective attack direction, rather than directly countering adversarial perturbations. To improve the robustness of DBP models, we propose Adversarial Denoising Diffusion Training (ADDT). This technique uses Classifier-Guided Perturbation Optimization (CGPO) to generate adversarial perturbations under the guidance of a pre-trained classifier, and Rank-Based Gaussian Mapping (RBGM) to map adversarial perturbations onto a standard Gaussian distribution. Empirical results show that ADDT improves the robustness of DBP models. Further experiments confirm that ADDT equips DBP models with the ability to directly counter adversarial perturbations.
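A hedged reading of the two ADDT components in code: the sketch below assumes a standard PyTorch setup with an epsilon-prediction DDPM; the names `rbgm`, `cgpo`, `addt_loss`, and `eps_model` are illustrative, and the exact formulations are inferred from the abstract rather than taken from the authors' implementation.

```python
import torch
import torch.nn.functional as F

def rbgm(perturbation: torch.Tensor) -> torch.Tensor:
    # Rank-Based Gaussian Mapping (one plausible reading): replace each
    # perturbation value with a standard-Gaussian sample of the same rank,
    # so the result follows N(0, I) while preserving the perturbation's
    # element-wise ordering. A real implementation would likely map
    # per image rather than across the whole batch.
    flat = perturbation.flatten()
    ranks = flat.argsort().argsort()               # rank of each element
    gaussian = torch.randn_like(flat).sort().values
    return gaussian[ranks].view_as(perturbation)

def cgpo(x, y, classifier, steps=5, step_size=2 / 255, eps=8 / 255):
    # Classifier-Guided Perturbation Optimization, read here as a
    # PGD-style ascent on the pre-trained classifier's loss.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(classifier(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + step_size * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return delta.detach()

def addt_loss(eps_model, x0, y, classifier, alphas_cumprod):
    # One plausible ADDT training objective under the standard DDPM
    # epsilon-prediction parameterization: diffuse the clean image with
    # Gaussian-mapped adversarial noise and train the model to predict it.
    t = torch.randint(0, alphas_cumprod.numel(), (x0.size(0),), device=x0.device)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    eps_adv = rbgm(cgpo(x0, y, classifier))        # Gaussian-ized adversarial noise
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps_adv
    return F.mse_loss(eps_model(x_t, t), eps_adv)
```

The design rationale implied by the abstract: standard diffusion training expects Gaussian noise, so RBGM reshapes the adversarial perturbation into a standard Gaussian before it is injected, keeping the denoising objective well-posed while still exposing the model to adversarial structure.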
Related papers
- DBLP: Noise Bridge Consistency Distillation For Efficient And Reliable Adversarial Purification [0.0]
Diffusion Bridge Distillation for Purification (DBLP) is a novel and efficient diffusion-based framework for adversarial purification.
DBLP achieves robust accuracy, superior image quality, and around 0.2s inference time, marking a significant step toward real-time adversarial purification.
arXiv Detail & Related papers (2025-08-01T11:47:36Z) - Navigating Sparse Molecular Data with Stein Diffusion Guidance [48.21071466968102]
Stochastic optimal control (SOC) has emerged as a principled framework for fine-tuning diffusion models.
A class of training-free approaches has been developed that guides diffusion models using off-the-shelf classifiers on predicted clean samples.
We propose a novel training-free guidance framework based on a surrogate optimal control objective.
arXiv Detail & Related papers (2025-07-07T21:14:27Z) - How Do Diffusion Models Improve Adversarial Robustness? [3.729242965449096]
We investigate how and how well diffusion models improve adversarial robustness.
We find that the purified images are heavily influenced by the internal randomness of diffusion models.
Our findings provide novel insights into the mechanisms underlying diffusion-based purification.
arXiv Detail & Related papers (2025-05-28T20:19:21Z) - Towards more transferable adversarial attack in black-box manner [1.1417805445492082]
Black-box attacks based on transferability have received significant attention due to their practical applicability in real-world scenarios.
The recent state-of-the-art approach DiffPGD has demonstrated enhanced transferability by employing diffusion-based adversarial purification models for adaptive attacks.
We propose a novel loss function coupled with a unique surrogate model to validate our hypothesis.
arXiv Detail & Related papers (2025-05-23T16:49:20Z) - A Generative Framework for Causal Estimation via Importance-Weighted Diffusion Distillation [55.53426007439564]
Estimating individualized treatment effects from observational data is a central challenge in causal inference.
Inverse probability weighting (IPW) is a well-established solution to this problem, but its integration into modern deep learning frameworks remains limited.
We propose Importance-Weighted Diffusion Distillation (IWDD), a novel generative framework that combines the pretraining of diffusion models with importance-weighted score distillation.
arXiv Detail & Related papers (2025-05-16T17:00:52Z) - Unlocking The Potential of Adaptive Attacks on Diffusion-Based Purification [20.15955997832192]
Diffusion-based purification (DBP) is a defense against adversarial examples (AEs).
We revisit this claim, focusing on gradient-based strategies that back-propagate the loss gradients through the defense; a minimal sketch of this gradient-through-purification pattern appears after this list.
We show that such an optimization method invalidates DBP's core foundations and instead restricts the purified outputs to a distribution over malicious samples.
arXiv Detail & Related papers (2024-11-25T17:30:32Z) - Instant Adversarial Purification with Adversarial Consistency Distillation [1.3165428727965363]
One Step Control Purification (OSCP) is a novel defense framework that achieves robust adversarial purification in a single Neural Function Evaluation.
Our experimental results on ImageNet showcase OSCP's superior performance, achieving a 74.19% defense success rate with merely 0.1s per purification.
arXiv Detail & Related papers (2024-08-30T07:49:35Z) - Classifier Guidance Enhances Diffusion-based Adversarial Purification by Preserving Predictive Information [75.36597470578724]
Adversarial purification is one of the promising approaches to defend neural networks against adversarial attacks.
We propose the gUided Purification (COUP) algorithm, which purifies inputs while keeping them away from the classifier's decision boundary.
Experimental results show that COUP can achieve better adversarial robustness under strong attack methods.
arXiv Detail & Related papers (2024-08-12T02:48:00Z) - ADBM: Adversarial diffusion bridge model for reliable adversarial purification [21.2538921336578]
Recently Diffusion-based Purification (DiffPure) has been recognized as an effective defense method against adversarial examples.
We find DiffPure, which directly employs the original pre-trained diffusion models for adversarial purification, to be suboptimal.
We propose a novel Adversarial Diffusion Bridge Model, termed ADBM, which constructs a reverse bridge from diffused adversarial data back to the corresponding clean examples.
arXiv Detail & Related papers (2024-08-01T06:26:05Z) - Diffusion-based Adversarial Purification for Intrusion Detection [0.6990493129893112]
Crafted perturbations mislead ML models, enabling attackers to evade detection or trigger false alerts.
Adversarial purification has emerged as a compelling solution, particularly with diffusion models showing promising results.
This paper demonstrates the effectiveness of diffusion models in purifying adversarial examples in network intrusion detection.
arXiv Detail & Related papers (2024-06-25T14:48:28Z) - Improving Adversarial Transferability by Stable Diffusion [36.97548018603747]
Deep neural networks (DNNs) are susceptible to adversarial examples, which introduce imperceptible perturbations to benign samples, deceiving predictions.
We introduce a novel attack method called Stable Diffusion Attack Method (SDAM), which incorporates samples generated by Stable Diffusion to augment input images.
arXiv Detail & Related papers (2023-11-18T09:10:07Z) - Enhancing Adversarial Robustness via Score-Based Optimization [22.87882885963586]
Adversarial attacks have the potential to mislead deep neural network classifiers by introducing slight perturbations.
We introduce a novel adversarial defense scheme named ScoreOpt, which optimizes adversarial samples at test time.
Our experimental results demonstrate that our approach outperforms existing adversarial defenses in terms of both robustness and inference speed.
arXiv Detail & Related papers (2023-07-10T03:59:42Z) - Reconstructing Graph Diffusion History from a Single Snapshot [87.20550495678907]
We propose a novel barycenter formulation for reconstructing Diffusion history from A single SnapsHot (DASH).
We prove that estimation error in the diffusion parameters is unavoidable due to the NP-hardness of diffusion parameter estimation.
We also develop an effective solver named DIffusion hiTting Times with Optimal proposal (DITTO).
arXiv Detail & Related papers (2023-06-01T09:39:32Z) - Guided Diffusion Model for Adversarial Purification [103.4596751105955]
Adversarial attacks disturb deep neural networks (DNNs) across various algorithms and frameworks.
We propose a novel purification approach, referred to as the guided diffusion model for purification (GDMP).
In comprehensive experiments across various datasets, the proposed GDMP is shown to reduce the perturbations raised by adversarial attacks to a shallow range.
arXiv Detail & Related papers (2022-05-30T10:11:15Z) - Balancing detectability and performance of attacks on the control channel of Markov Decision Processes [77.66954176188426]
We investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes (MDPs).
This research is motivated by the research community's recent interest in adversarial and poisoning attacks applied to MDPs and reinforcement learning (RL) methods.
arXiv Detail & Related papers (2021-09-15T09:13:10Z) - Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z) - Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training [106.34722726264522]
A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise.
Pre-processing methods may suffer from the robustness degradation effect.
A potential cause of this negative effect is that the adversarial training examples are static and independent of the pre-processing model.
We propose a method called the Joint Adversarial Training based Pre-processing (JATP) defense.
arXiv Detail & Related papers (2021-06-10T01:45:32Z)
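Several entries above revisit gradient-based adaptive attacks that back-propagate the classification loss through the purification defense (see the Unlocking The Potential of Adaptive Attacks entry). Below is a minimal sketch of that general pattern, assuming a differentiable purifier: `purifier` is a hypothetical callable wrapping the diffusion forward and reverse passes, and the PGD hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def adaptive_attack(x, y, purifier, classifier,
                    steps=10, step_size=2 / 255, eps=8 / 255):
    # PGD-style adaptive attack on a purification defense: the loss is
    # taken on the classifier's output for the *purified* input, so the
    # gradient flows back through the purifier to the perturbation.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(classifier(purifier(x + delta)), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + step_size * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).detach()
```

In practice the reverse diffusion is many-step and stochastic, so exact back-propagation is memory-intensive and is commonly approximated (for example with surrogate models or gradient checkpointing); how faithfully that approximation is done is precisely what the adaptive-attack papers above scrutinize.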