Related papers: Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability

Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability

URL: http://arxiv.org/abs/2305.16494v3
Date: Wed, 17 Jan 2024 15:38:27 GMT
Title: Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability
Authors: Haotian Xue, Alexandre Araujo, Bin Hu, Yongxin Chen
Abstract summary: We propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples. Our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks.
Score: 62.105715985563656
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Neural networks are known to be susceptible to adversarial samples: small variations of natural examples crafted to deliberately mislead the models. While they can be easily generated using gradient-based techniques in digital and physical scenarios, they often differ greatly from the actual data distribution of natural images, resulting in a trade-off between strength and stealthiness. In this paper, we propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples. By exploiting a gradient guided by a diffusion model, Diff-PGD ensures that adversarial samples remain close to the original data distribution while maintaining their effectiveness. Moreover, our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks. Compared with existing methods for generating natural-style adversarial samples, our framework enables the separation of optimizing adversarial loss from other surrogate losses (e.g., content/smoothness/style loss), making it more stable and controllable. Finally, we demonstrate that the samples generated using Diff-PGD have better transferability and anti-purification power than traditional gradient-based methods. Code will be released in https://github.com/xavihart/Diff-PGD

Related papers

Deep Leakage with Generative Flow Matching Denoiser [54.05993847488204]
We introduce a new deep leakage (DL) attack that integrates a generative Flow Matching (FM) prior into the reconstruction process.<n>Our approach consistently outperforms state-of-the-art attacks across pixel-level, perceptual, and feature-based similarity metrics.
arXiv Detail & Related papers (2026-01-21T14:51:01Z)
ScoreAdv: Score-based Targeted Generation of Natural Adversarial Examples via Diffusion Models [7.250878248686215]
In this paper, we introduce a novel approach for generating adversarial examples based on diffusion models, named ScoreAdv.<n>Our method is capable of generating an unlimited number of natural adversarial examples and can attack not only classification models but also retrieval models.<n>Our results demonstrate that ScoreAdv achieves state-of-the-art attack success rates and image quality.
arXiv Detail & Related papers (2025-07-08T15:17:24Z)
TRAIL: Transferable Robust Adversarial Images via Latent diffusion [35.54430200195499]
Adversarial attacks present severe security risks to deep learning systems.<n> transferability across models remains limited due to distribution mismatches between generated adversarial features and real-world data.<n>We propose Transferable Robust Adrial Images via Latent Diffusion (TRAIL), a test-time adaptation framework.
arXiv Detail & Related papers (2025-05-22T03:11:35Z)
Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models! [52.0855711767075]
EvoSeed is an evolutionary strategy-based algorithmic framework for generating photo-realistic natural adversarial samples. We employ CMA-ES to optimize the search for an initial seed vector, which, when processed by the Conditional Diffusion Model, results in the natural adversarial sample misclassified by the Model. Experiments show that generated adversarial images are of high image quality, raising concerns about generating harmful content bypassing safety classifiers.
arXiv Detail & Related papers (2024-02-07T09:39:29Z)
GE-AdvGAN: Improving the transferability of adversarial samples by gradient editing-based adversarial generative model [69.71629949747884]
Adversarial generative models, such as Generative Adversarial Networks (GANs), are widely applied for generating various types of data. In this work, we propose a novel algorithm named GE-AdvGAN to enhance the transferability of adversarial samples.
arXiv Detail & Related papers (2024-01-11T16:43:16Z)
Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space. Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings. The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z)
Improving Adversarial Transferability by Stable Diffusion [36.97548018603747]
adversarial examples introduce imperceptible perturbations to benign samples, deceiving predictions. Deep neural networks (DNNs) are susceptible to adversarial examples, which introduce imperceptible perturbations to benign samples, deceiving predictions. We introduce a novel attack method called Stable Diffusion Attack Method (SDAM), which incorporates samples generated by Stable Diffusion to augment input images.
arXiv Detail & Related papers (2023-11-18T09:10:07Z)
AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models [7.406040859734522]
Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques. Previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models. We propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models.
arXiv Detail & Related papers (2023-07-24T03:10:02Z)
Enhancing Targeted Attack Transferability via Diversified Weight Pruning [0.3222802562733786]
Malicious attackers can generate targeted adversarial examples by imposing human-imperceptible noise on images. With cross-model transferable adversarial examples, the vulnerability of neural networks remains even if the model information is kept secret from the attacker. Recent studies have shown the effectiveness of ensemble-based methods in generating transferable adversarial examples.
arXiv Detail & Related papers (2022-08-18T07:25:48Z)
Deblurring via Stochastic Refinement [85.42730934561101]
We present an alternative framework for blind deblurring based on conditional diffusion models. Our method is competitive in terms of distortion metrics such as PSNR.
arXiv Detail & Related papers (2021-12-05T04:36:09Z)
Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Agenerative-based adversarial attacks can get rid of this limitation. ASymmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations. The adversarial examples generated by SSAE not only make thewidely-used models collapse, but also achieves good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z)
Generating Out of Distribution Adversarial Attack using Latent Space Poisoning [5.1314136039587925]
We propose a novel mechanism of generating adversarial examples where the actual image is not corrupted. latent space representation is utilized to tamper with the inherent structure of the image. As opposed to gradient-based attacks, the latent space poisoning exploits the inclination of classifiers to model the independent and identical distribution of the training dataset.
arXiv Detail & Related papers (2020-12-09T13:05:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.