AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models
- URL: http://arxiv.org/abs/2307.12499v4
- Date: Sun, 14 Jul 2024 04:48:53 GMT
- Title: AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models
- Authors: Xuelong Dai, Kaisheng Liang, Bin Xiao,
- Abstract summary: Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques.
Previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models.
We propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models.
- Score: 7.406040859734522
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques. They pose severe security problems for deep learning applications because they can effectively bypass defense mechanisms. However, previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models, which are not theoretically provable and thus generate unrealistic examples by incorporating adversarial objectives, especially for GAN-based methods on large-scale datasets like ImageNet. In this paper, we propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models. We design two novel adversarial guidance techniques to conduct adversarial sampling in the reverse generation process of diffusion models. These two techniques are effective and stable in generating high-quality, realistic adversarial examples by integrating gradients of the target classifier interpretably. Experimental results on MNIST and ImageNet datasets demonstrate that AdvDiff is effective in generating unrestricted adversarial examples, which outperforms state-of-the-art unrestricted adversarial attack methods in terms of attack performance and generation quality.
Related papers
- Efficient Generation of Targeted and Transferable Adversarial Examples for Vision-Language Models Via Diffusion Models [17.958154849014576]
Adversarial attacks can be used to assess the robustness of large visual-language models (VLMs)
Previous transfer-based adversarial attacks incur high costs due to high iteration counts and complex method structure.
We propose AdvDiffVLM, which uses diffusion models to generate natural, unrestricted and targeted adversarial examples.
arXiv Detail & Related papers (2024-04-16T07:19:52Z) - Diffusion-Based Adversarial Sample Generation for Improved Stealthiness
and Controllability [62.105715985563656]
We propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples.
Our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks.
arXiv Detail & Related papers (2023-05-25T21:51:23Z) - Resisting Adversarial Attacks in Deep Neural Networks using Diverse
Decision Boundaries [12.312877365123267]
Deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye, but can lead the model to misclassify.
We develop a new ensemble-based solution that constructs defender models with diverse decision boundaries with respect to the original model.
We present extensive experimentations using standard image classification datasets, namely MNIST, CIFAR-10 and CIFAR-100 against state-of-the-art adversarial attacks.
arXiv Detail & Related papers (2022-08-18T08:19:26Z) - Latent Boundary-guided Adversarial Training [61.43040235982727]
Adrial training is proved to be the most effective strategy that injects adversarial examples into model training.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z) - Diffusion Models for Adversarial Purification [69.1882221038846]
Adrial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves the state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z) - Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial
Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the in adversarial attacks parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the ability of the learned when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z) - Improving Gradient-based Adversarial Training for Text Classification by
Contrastive Learning and Auto-Encoder [18.375585982984845]
We focus on enhancing the model's ability to defend gradient-based adversarial attack during the model's training process.
We propose two novel adversarial training approaches: CARL and RAR.
Experiments show that the proposed two approaches outperform strong baselines on various text classification datasets.
arXiv Detail & Related papers (2021-09-14T09:08:58Z) - Adversarial example generation with AdaBelief Optimizer and Crop
Invariance [8.404340557720436]
Adversarial attacks can be an important method to evaluate and select robust models in safety-critical applications.
We propose AdaBelief Iterative Fast Gradient Method (ABI-FGM) and Crop-Invariant attack Method (CIM) to improve the transferability of adversarial examples.
Our method has higher success rates than state-of-the-art gradient-based attack methods.
arXiv Detail & Related papers (2021-02-07T06:00:36Z) - A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack
and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches equilibrium distribution of adversarial examples.
Both quantitative and qualitative analysis on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z) - Detection Defense Against Adversarial Attacks with Saliency Map [7.736844355705379]
It is well established that neural networks are vulnerable to adversarial examples, which are almost imperceptible on human vision.
Existing defenses are trend to harden the robustness of models against adversarial attacks.
We propose a novel method combined with additional noises and utilize the inconsistency strategy to detect adversarial examples.
arXiv Detail & Related papers (2020-09-06T13:57:17Z) - Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to the unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.