DiffBreak: Breaking Diffusion-Based Purification with Adaptive Attacks
- URL: http://arxiv.org/abs/2411.16598v2
- Date: Tue, 04 Feb 2025 20:04:20 GMT
- Title: DiffBreak: Breaking Diffusion-Based Purification with Adaptive Attacks
- Authors: Andre Kassis, Urs Hengartner, Yaoliang Yu
- Abstract summary: Diffusion-based purification (DBP) has emerged as a cornerstone defense against adversarial examples (AEs).
We prove that adaptive gradient-based attacks nullify this foundational claim.
We propose a novel adaptation of a recent optimization method against deepfake watermarking.
- Score: 20.15955997832192
- License:
- Abstract: Diffusion-based purification (DBP) has emerged as a cornerstone defense against adversarial examples (AEs), widely regarded as robust due to its use of diffusion models (DMs) that project AEs onto the natural data distribution. However, contrary to prior assumptions, we theoretically prove that adaptive gradient-based attacks nullify this foundational claim, effectively targeting the DM rather than the classifier and causing purified outputs to align with adversarial distributions. This surprising discovery prompts a reassessment of DBP's robustness, revealing it stems from critical flaws in backpropagation techniques used so far for attacking DBP. To address these gaps, we introduce DiffBreak, a novel and reliable gradient library for DBP, which exposes how adaptive attacks drastically degrade its robustness. In stricter majority-vote settings, where classifier decisions aggregate predictions over multiple purified inputs, DBP retains partial robustness to traditional norm-bounded AEs due to its stochasticity disrupting adversarial alignment. However, we propose a novel adaptation of a recent optimization method against deepfake watermarking, crafting systemic adversarial perturbations that defeat DBP even under these conditions, ultimately challenging its viability as a defense without improvements.
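To make the attack setting concrete, below is a minimal, self-contained sketch of an adaptive gradient-based attack that backpropagates through a stochastic purification step before the classifier, together with a majority-vote evaluation over several purified copies. This is not the DiffBreak library: the purifier is a toy denoiser standing in for a diffusion model, and all module names, hyperparameters, and the gradient-averaging scheme are illustrative assumptions.

```python
# Illustrative sketch only (not DiffBreak). A toy "purifier" stands in for a
# diffusion model; the attack differentiates through purification end-to-end.
import torch
import torch.nn as nn

class ToyPurifier(nn.Module):
    """Stand-in for DBP: add noise (forward diffusion), then a learned denoising map."""
    def __init__(self, channels=3):
        super().__init__()
        self.denoise = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x, sigma=0.25):
        noisy = x + sigma * torch.randn_like(x)   # stochastic forward step
        return self.denoise(noisy)                # crude stand-in for the reverse process

def adaptive_pgd(purifier, classifier, x, y, eps=8/255, alpha=2/255, steps=40, eot=4):
    """PGD that backpropagates through purification, averaging gradients over
    several stochastic purification draws (expectation over transformation)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = 0.0
        for _ in range(eot):                       # average over purifier randomness
            logits = classifier(purifier(x_adv))
            loss = loss + nn.functional.cross_entropy(logits, y)
        grad = torch.autograd.grad(loss / eot, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the L-inf ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def majority_vote(purifier, classifier, x, votes=8):
    """Stricter setting: aggregate predictions over multiple purified copies."""
    preds = torch.stack([classifier(purifier(x)).argmax(dim=1) for _ in range(votes)])
    return preds.mode(dim=0).values

if __name__ == "__main__":
    purifier = ToyPurifier()
    classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    x = torch.rand(4, 3, 32, 32)
    y = torch.randint(0, 10, (4,))
    x_adv = adaptive_pgd(purifier, classifier, x, y)
    print("majority-vote predictions:", majority_vote(purifier, classifier, x_adv))
```

Averaging gradients over several purification draws is only one common way to cope with the defense's stochasticity; the paper's contribution is a more reliable gradient computation for DBP, which the sketch does not reproduce.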
Related papers
- Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds [48.37843602248313]
Deep neural networks (DNNs) are vulnerable to adversarial samples crafted by adding imperceptible perturbations to clean data, potentially leading to incorrect and dangerous predictions.
We propose Consistency Model-based Adversarial Purification (CMAP), which optimizes vectors within the latent space of a pre-trained consistency model to generate samples for restoring clean data.
CMAP significantly enhances robustness against strong adversarial attacks while preserving high natural accuracy.
arXiv Detail & Related papers (2024-12-11T14:14:02Z) - ADBM: Adversarial diffusion bridge model for reliable adversarial purification [21.2538921336578]
Recently Diffusion-based Purification (DiffPure) has been recognized as an effective defense method against adversarial examples.
We find DiffPure, which directly employs the original pre-trained diffusion models for adversarial purification, to be suboptimal.
We propose a novel Adversarial Diffusion Bridge Model, termed ADBM, which constructs a reverse bridge from diffused adversarial data back to the original clean examples.
arXiv Detail & Related papers (2024-08-01T06:26:05Z) - Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective [65.10019978876863]
Diffusion-Based Purification (DBP) has emerged as an effective defense mechanism against adversarial attacks.
In this paper, we argue that the inherent stochasticity in the DBP process is the primary driver of its robustness.
arXiv Detail & Related papers (2024-04-22T16:10:38Z) - Improving Adversarial Transferability by Stable Diffusion [36.97548018603747]
Deep neural networks (DNNs) are susceptible to adversarial examples, which introduce imperceptible perturbations to benign samples and deceive model predictions.
We introduce a novel attack method called Stable Diffusion Attack Method (SDAM), which incorporates samples generated by Stable Diffusion to augment input images.
arXiv Detail & Related papers (2023-11-18T09:10:07Z) - LEAT: Towards Robust Deepfake Disruption in Real-World Scenarios via Latent Ensemble Attack [11.764601181046496]
Deepfakes, malicious visual contents created by generative models, pose an increasingly harmful threat to society.
To proactively mitigate deepfake damages, recent studies have employed adversarial perturbation to disrupt deepfake model outputs.
We propose a simple yet effective disruption method called Latent Ensemble ATtack (LEAT), which attacks the independent latent encoding process.
arXiv Detail & Related papers (2023-07-04T07:00:37Z) - Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability [62.105715985563656]
We propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples.
Our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks.
arXiv Detail & Related papers (2023-05-25T21:51:23Z) - Ada3Diff: Defending against 3D Adversarial Point Clouds via Adaptive Diffusion [70.60038549155485]
Deep 3D point cloud models are sensitive to adversarial attacks, which pose threats to safety-critical applications such as autonomous driving.
This paper introduces a novel distortion-aware defense framework that can rebuild the pristine data distribution with a tailored intensity estimator and a diffusion model.
arXiv Detail & Related papers (2022-11-29T14:32:43Z) - Improved Certified Defenses against Data Poisoning with (Deterministic) Finite Aggregation [122.83280749890078]
We propose an improved certified defense against general poisoning attacks, namely Finite Aggregation.
In contrast to DPA, which directly splits the training set into disjoint subsets, our method first splits the training set into smaller disjoint subsets.
We offer an alternative view of our method, bridging the designs of deterministic and aggregation-based certified defenses.
arXiv Detail & Related papers (2022-02-05T20:08:58Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)