Pixel is a Barrier: Diffusion Models Are More Adversarially Robust Than We Think
- URL: http://arxiv.org/abs/2404.13320v2
- Date: Thu, 2 May 2024 02:25:39 GMT
- Title: Pixel is a Barrier: Diffusion Models Are More Adversarially Robust Than We Think
- Authors: Haotian Xue, Yongxin Chen
- Abstract summary: Adversarial examples for diffusion models are widely used as solutions for safety concerns.
This may mislead us into thinking that diffusion models are as vulnerable to adversarial attacks as most deep models.
In this paper, we present a novel finding: even though gradient-based white-box attacks can be used to attack LDMs, they fail to attack PDMs.
- Score: 14.583181596370386
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Adversarial examples for diffusion models are widely used as solutions for safety concerns. By adding adversarial perturbations to personal images, attackers cannot easily edit or imitate them. However, it is essential to note that all these protections target latent diffusion models (LDMs); adversarial examples for pixel-space diffusion models (PDMs) are largely overlooked. This may mislead us into thinking that diffusion models are as vulnerable to adversarial attacks as most deep models. In this paper, we present a novel finding: even though gradient-based white-box attacks can be used to attack LDMs, they fail to attack PDMs. This finding is supported by extensive experiments covering a wide range of attack methods on various PDMs and LDMs with different model structures, which means diffusion models are indeed much more robust against adversarial attacks. We also find that PDMs can be used as an off-the-shelf purifier to effectively remove the adversarial patterns generated on LDMs to protect images, which means that most protection methods available today cannot, to some extent, protect our images from malicious attacks. We hope that our insights will inspire the community to rethink adversarial examples for diffusion models as protection methods and move toward more effective protection. Code is available at https://github.com/xavihart/PDM-Pure.
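The purification finding can be pictured with a short SDEdit-style loop: forward-diffuse the protected image until the injected Gaussian noise drowns out the small adversarial perturbation, then let a pixel-space diffusion model denoise it back. The sketch below is only an illustration of this idea using the Hugging Face diffusers API, not the authors' PDM-Pure implementation; the checkpoint name and the noise level t_star are assumptions (a practical purifier would use a higher-resolution PDM).

```python
import torch
from diffusers import DDPMPipeline

# Illustrative pixel-space diffusion model (PDM); PDM-Pure itself relies on a
# stronger, higher-resolution checkpoint.
pipe = DDPMPipeline.from_pretrained("google/ddpm-cifar10-32")
unet, scheduler = pipe.unet, pipe.scheduler
scheduler.set_timesteps(scheduler.config.num_train_timesteps)  # step size of 1

@torch.no_grad()
def purify(x, t_star=200):
    """x: (possibly protected) image batch in [-1, 1], shape (B, 3, 32, 32)."""
    # Forward-diffuse to an intermediate step: the added Gaussian noise
    # overwhelms the small adversarial pattern.
    t = torch.full((x.shape[0],), t_star, dtype=torch.long)
    noisy = scheduler.add_noise(x, torch.randn_like(x), t)

    # Reverse-diffuse back to step 0 with the pixel-space UNet, pulling the
    # image back onto the natural-image manifold.
    sample = noisy
    for step in range(t_star, -1, -1):
        eps = unet(sample, step).sample
        sample = scheduler.step(eps, step, sample).prev_sample
    return sample.clamp(-1, 1)
```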
Related papers
- Pixel Is Not A Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models [9.905296922309157]
Diffusion Models have emerged as powerful generative models for high-quality image synthesis, with many subsequent image editing techniques based on them.
Previous works have attempted to safeguard images from diffusion-based editing by adding imperceptible perturbations.
Our work proposes a novel attacking framework with a feature representation attack loss that exploits vulnerabilities in denoising UNets and a latent optimization strategy to enhance the naturalness of protected images.
arXiv Detail & Related papers (2024-08-21T17:56:34Z) - Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models [65.30406788716104]
This work investigates the vulnerabilities of security-enhancing diffusion models.
We demonstrate that these models are highly susceptible to DIFF2, a simple yet effective backdoor attack.
Case studies show that DIFF2 can significantly reduce both post-purification and certified accuracy across benchmark datasets and models.
arXiv Detail & Related papers (2024-06-14T02:39:43Z) - Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift [86.92048184556936]
We propose the first backdoor detection and removal framework for DMs.
We evaluate our framework, Elijah, on hundreds of DMs of three types: DDPM, NCSN, and LDM.
Our approach can have close to 100% detection accuracy and reduce the backdoor effects to close to zero without significantly sacrificing the model utility.
arXiv Detail & Related papers (2023-11-27T23:58:56Z) - Targeted Attack Improves Protection against Unauthorized Diffusion Customization [3.1678356835951273]
Diffusion models set a new milestone for image generation yet raise public concerns.
They can be fine-tuned on unauthorized images for customization.
Current protection, which leverages untargeted attacks, does not appear to be effective enough.
We propose a simple yet effective improvement to the protection against unauthorized diffusion customization by introducing targeted attacks (an illustrative sketch of this style of attack appears after this list).
arXiv Detail & Related papers (2023-10-07T05:24:42Z) - Toward effective protection against diffusion based mimicry through score distillation [15.95715097030366]
Efforts have been made to add perturbations to protect images from diffusion-based mimicry pipelines.
Most existing methods are ineffective, and some are even impractical for individual users to apply.
We present novel findings on attacking latent diffusion models and propose new plug-and-play strategies for more effective protection.
arXiv Detail & Related papers (2023-10-02T18:56:12Z) - DiffProtect: Generate Adversarial Examples with Diffusion Models for Facial Privacy Protection [64.77548539959501]
DiffProtect produces more natural-looking encrypted images than state-of-the-art methods.
It achieves significantly higher attack success rates, e.g., 24.5% and 25.1% absolute improvements on the CelebA-HQ and FFHQ datasets.
arXiv Detail & Related papers (2023-05-23T02:45:49Z) - Diffusion Models for Imperceptible and Transferable Adversarial Attack [23.991194050494396]
We propose a novel imperceptible and transferable attack by leveraging both the generative and discriminative power of diffusion models.
Our proposed method, DiffAttack, is the first to introduce diffusion models into the adversarial attack field.
arXiv Detail & Related papers (2023-05-14T16:02:36Z) - TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets [74.12197473591128]
We propose an effective Trojan attack against diffusion models, TrojDiff.
In particular, we design novel transitions during the Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution.
We show that TrojDiff always achieves high attack performance under different adversarial targets using different types of triggers.
arXiv Detail & Related papers (2023-03-10T08:01:23Z) - "What's in the box?!": Deflecting Adversarial Attacks by Randomly Deploying Adversarially-Disjoint Models [71.91835408379602]
Adversarial examples have long been considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z) - Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks [154.31827097264264]
Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms.
We propose Dual Manifold Adversarial Training (DMAT) where adversarial perturbations in both latent and image spaces are used in robustifying the model.
Our DMAT improves performance on normal images and achieves robustness comparable to standard adversarial training against Lp attacks.
arXiv Detail & Related papers (2020-09-05T06:00:28Z)
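As a concrete picture of the protection schemes referenced in this list (and of the LDM-space attacks that the main paper shows a PDM can purify away), here is a minimal PGD-style targeted attack on an LDM's VAE encoder: the perturbation pushes the image's latent toward a chosen target latent so that diffusion-based editing or customization of the protected image degrades. This is a hedged illustration, not the exact objective of any of the listed papers; the checkpoint name, perturbation budget, and step size are assumptions.

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL

# Illustrative Stable Diffusion VAE; any LDM encoder would play the same role.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

def protect(x, x_target, eps=16 / 255, alpha=2 / 255, steps=100):
    """x, x_target: images in [-1, 1], shape (B, 3, H, W); eps is the L_inf budget."""
    with torch.no_grad():
        z_target = vae.encode(x_target).latent_dist.mean  # latent the attack aims for
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        z = vae.encode(x_adv).latent_dist.mean
        loss = F.mse_loss(z, z_target)                     # targeted objective
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()            # move toward the target latent
            x_adv = x + (x_adv - x).clamp(-eps, eps)       # project into the L_inf ball
            x_adv = x_adv.clamp(-1, 1)
    return x_adv.detach()
```

Because such a perturbation only needs to fool the LDM's encoder, it stays small in pixel space, which is exactly why a pixel-space purifier like the one sketched after the abstract can remove it.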