Diffusion-Based Adversarial Sample Generation for Improved Stealthiness
and Controllability
- URL: http://arxiv.org/abs/2305.16494v3
- Date: Wed, 17 Jan 2024 15:38:27 GMT
- Title: Diffusion-Based Adversarial Sample Generation for Improved Stealthiness
and Controllability
- Authors: Haotian Xue, Alexandre Araujo, Bin Hu, Yongxin Chen
- Abstract summary: We propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples.
Our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks.
- Score: 62.105715985563656
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Neural networks are known to be susceptible to adversarial samples: small
variations of natural examples crafted to deliberately mislead the models.
While they can be easily generated using gradient-based techniques in digital
and physical scenarios, they often differ greatly from the actual data
distribution of natural images, resulting in a trade-off between strength and
stealthiness. In this paper, we propose a novel framework dubbed
Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic
adversarial samples. By exploiting a gradient guided by a diffusion model,
Diff-PGD ensures that adversarial samples remain close to the original data
distribution while maintaining their effectiveness. Moreover, our framework can
be easily customized for specific tasks such as digital attacks, physical-world
attacks, and style-based attacks. Compared with existing methods for generating
natural-style adversarial samples, our framework enables the separation of
optimizing adversarial loss from other surrogate losses (e.g.,
content/smoothness/style loss), making it more stable and controllable.
Finally, we demonstrate that the samples generated using Diff-PGD have better
transferability and anti-purification power than traditional gradient-based
methods. Code will be released in https://github.com/xavihart/Diff-PGD
Related papers
- Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models! [52.0855711767075]
EvoSeed is an evolutionary strategy-based algorithmic framework for generating photo-realistic natural adversarial samples.
We employ CMA-ES to optimize the search for an initial seed vector, which, when processed by the Conditional Diffusion Model, results in the natural adversarial sample misclassified by the Model.
Experiments show that generated adversarial images are of high image quality, raising concerns about generating harmful content bypassing safety classifiers.
arXiv Detail & Related papers (2024-02-07T09:39:29Z) - GE-AdvGAN: Improving the transferability of adversarial samples by
gradient editing-based adversarial generative model [69.71629949747884]
Adversarial generative models, such as Generative Adversarial Networks (GANs), are widely applied for generating various types of data.
In this work, we propose a novel algorithm named GE-AdvGAN to enhance the transferability of adversarial samples.
arXiv Detail & Related papers (2024-01-11T16:43:16Z) - Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent
Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space.
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z) - Improving Adversarial Transferability by Stable Diffusion [36.97548018603747]
adversarial examples introduce imperceptible perturbations to benign samples, deceiving predictions.
Deep neural networks (DNNs) are susceptible to adversarial examples, which introduce imperceptible perturbations to benign samples, deceiving predictions.
We introduce a novel attack method called Stable Diffusion Attack Method (SDAM), which incorporates samples generated by Stable Diffusion to augment input images.
arXiv Detail & Related papers (2023-11-18T09:10:07Z) - AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models [7.406040859734522]
Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques.
Previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models.
We propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models.
arXiv Detail & Related papers (2023-07-24T03:10:02Z) - Enhancing Targeted Attack Transferability via Diversified Weight Pruning [0.3222802562733786]
Malicious attackers can generate targeted adversarial examples by imposing human-imperceptible noise on images.
With cross-model transferable adversarial examples, the vulnerability of neural networks remains even if the model information is kept secret from the attacker.
Recent studies have shown the effectiveness of ensemble-based methods in generating transferable adversarial examples.
arXiv Detail & Related papers (2022-08-18T07:25:48Z) - Deblurring via Stochastic Refinement [85.42730934561101]
We present an alternative framework for blind deblurring based on conditional diffusion models.
Our method is competitive in terms of distortion metrics such as PSNR.
arXiv Detail & Related papers (2021-12-05T04:36:09Z) - Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Agenerative-based adversarial attacks can get rid of this limitation.
ASymmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make thewidely-used models collapse, but also achieves good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z) - Generating Out of Distribution Adversarial Attack using Latent Space
Poisoning [5.1314136039587925]
We propose a novel mechanism of generating adversarial examples where the actual image is not corrupted.
latent space representation is utilized to tamper with the inherent structure of the image.
As opposed to gradient-based attacks, the latent space poisoning exploits the inclination of classifiers to model the independent and identical distribution of the training dataset.
arXiv Detail & Related papers (2020-12-09T13:05:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.