Efficient Generation of Targeted and Transferable Adversarial Examples for Vision-Language Models Via Diffusion Models
- URL: http://arxiv.org/abs/2404.10335v3
- Date: Tue, 23 Jul 2024 08:33:28 GMT
- Title: Efficient Generation of Targeted and Transferable Adversarial Examples for Vision-Language Models Via Diffusion Models
- Authors: Qi Guo, Shanmin Pang, Xiaojun Jia, Yang Liu, Qing Guo
- Abstract summary: Adversarial attacks can be used to assess the robustness of large visual-language models (VLMs).
Previous transfer-based adversarial attacks incur high costs due to high iteration counts and complex method structures.
We propose AdvDiffVLM, which uses diffusion models to generate natural, unrestricted and targeted adversarial examples.
- Score: 17.958154849014576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial attacks, particularly \textbf{targeted} transfer-based attacks, can be used to assess the adversarial robustness of large visual-language models (VLMs), allowing for a more thorough examination of potential security flaws before deployment. However, previous transfer-based adversarial attacks incur high costs due to high iteration counts and complex method structure. Furthermore, due to the unnaturalness of adversarial semantics, the generated adversarial examples have low transferability. These issues limit the utility of existing methods for assessing robustness. To address these issues, we propose AdvDiffVLM, which uses diffusion models to generate natural, unrestricted and targeted adversarial examples via score matching. Specifically, AdvDiffVLM uses Adaptive Ensemble Gradient Estimation to modify the score during the diffusion model's reverse generation process, ensuring that the produced adversarial examples have natural adversarial targeted semantics, which improves their transferability. Simultaneously, to improve the quality of adversarial examples, we use the GradCAM-guided Mask method to disperse adversarial semantics throughout the image rather than concentrating them in a single area. Finally, AdvDiffVLM embeds more target semantics into adversarial examples after multiple iterations. Experimental results show that our method generates adversarial examples 5x to 10x faster than state-of-the-art transfer-based adversarial attacks while maintaining higher quality adversarial examples. Furthermore, compared to previous transfer-based adversarial attacks, the adversarial examples generated by our method have better transferability. Notably, AdvDiffVLM can successfully attack a variety of commercial VLMs in a black-box environment, including GPT-4V.
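The generation procedure described in the abstract can be sketched as follows. This is a minimal, self-contained illustration and not the authors' implementation: `score_model`, `target_similarity_grad`, and `gradcam_mask` are hypothetical stand-ins for the pretrained diffusion score network, the ensemble of surrogate VLM gradients, and the GradCAM-guided mask, respectively.

```python
import numpy as np

rng = np.random.default_rng(0)

def score_model(x, t):
    # Hypothetical stand-in for a pretrained diffusion score network.
    return -x / (1.0 + t)

def target_similarity_grad(x, target, seed):
    # Hypothetical stand-in for the gradient of an image-text similarity
    # loss from one surrogate VLM; a real attack would backpropagate
    # through CLIP-like encoders.
    noise = np.random.default_rng(seed).normal(scale=0.01, size=x.shape)
    return (target - x) + noise

def gradcam_mask(x):
    # Hypothetical stand-in for a GradCAM-guided mask: down-weight the
    # most salient position so adversarial semantics are dispersed
    # rather than concentrated in a single area.
    m = np.ones_like(x)
    m.flat[np.argmax(np.abs(x))] = 0.2
    return m

def advdiffvlm_sketch(target, steps=50, guidance=0.3, n_models=3):
    """Reverse diffusion with the score shifted toward target semantics."""
    x = rng.normal(size=target.shape)  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps, 0, -1):
        # Adaptive ensemble gradient estimation: average surrogate gradients.
        g = np.mean([target_similarity_grad(x, target, s)
                     for s in range(n_models)], axis=0)
        # Modify the score with the masked adversarial gradient.
        s = score_model(x, k * dt) + guidance * gradcam_mask(x) * g
        x = x + dt * s + 0.1 * np.sqrt(2 * dt) * rng.normal(size=x.shape)
    return x
```

The key point the sketch illustrates is that the adversarial guidance is folded into the score at every reverse step, rather than added as a post-hoc perturbation, which is what lets the generated example remain natural-looking.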
Related papers
- Imperceptible Face Forgery Attack via Adversarial Semantic Mask [59.23247545399068]
We propose an Adversarial Semantic Mask Attack framework (ASMA) which can generate adversarial examples with good transferability and invisibility.
Specifically, we propose a novel adversarial semantic mask generative model, which can constrain generated perturbations in local semantic regions for good stealthiness.
arXiv Detail & Related papers (2024-06-16T10:38:11Z) - SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z) - AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models [7.406040859734522]
Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques.
Previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models.
We propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models.
arXiv Detail & Related papers (2023-07-24T03:10:02Z) - Mist: Towards Improved Adversarial Examples for Diffusion Models [0.8883733362171035]
Diffusion Models (DMs) have empowered great success in artificial-intelligence-generated content, especially in artwork creation.
Infringers can profit by imitating non-authorized human-created paintings with DMs.
Recent research suggests that various adversarial examples for diffusion models can be effective tools against these copyright infringements.
arXiv Detail & Related papers (2023-05-22T03:43:34Z) - Generating Adversarial Examples with Better Transferability via Masking Unimportant Parameters of Surrogate Model [6.737574282249396]
We propose to improve the transferability of adversarial examples in transfer-based attacks via masking unimportant parameters (MUP).
The key idea in MUP is to refine the pretrained surrogate models to boost the transfer-based attack.
arXiv Detail & Related papers (2023-04-14T03:06:43Z) - Improving Adversarial Transferability with Scheduled Step Size and Dual Example [33.00528131208799]
We show that the transferability of adversarial examples generated by the iterative fast gradient sign method decreases as the number of iterations increases.
We propose a novel strategy, which uses the Scheduled step size and the Dual example (SD) to fully utilize the adversarial information near the benign sample.
Our proposed strategy can be easily integrated with existing adversarial attack methods for better adversarial transferability.
arXiv Detail & Related papers (2023-01-30T15:13:46Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning [80.21709045433096]
A standard approach in adversarial robustness assumes a framework for defending against samples crafted by minimally perturbing a benign sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z) - Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training, which injects adversarial examples into model training, has proven to be the most effective defense strategy.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z) - Boosting Transferability of Targeted Adversarial Examples via Hierarchical Generative Networks [56.96241557830253]
Transfer-based adversarial attacks can effectively evaluate model robustness in the black-box setting.
We propose a conditional generative attacking model, which can generate the adversarial examples targeted at different classes.
Our method improves the success rates of targeted black-box attacks by a significant margin over the existing methods.
arXiv Detail & Related papers (2021-07-05T06:17:47Z) - Towards Defending against Adversarial Examples via Attack-Invariant
Features [147.85346057241605]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
Adversarial robustness can be improved by exploiting adversarial examples, but models trained on seen types of adversarial examples generally cannot generalize well to unseen types.
arXiv Detail & Related papers (2021-06-09T12:49:54Z) - Direction-Aggregated Attack for Transferable Adversarial Examples [10.208465711975242]
A deep neural network is vulnerable to adversarial examples crafted by imposing imperceptible changes to the inputs.
Adversarial examples are most successful in white-box settings where the model and its parameters are available.
We propose the Direction-Aggregated adversarial attacks that deliver transferable adversarial examples.
arXiv Detail & Related papers (2021-04-19T09:54:56Z)
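As a rough illustration of the direction-aggregation idea in the last entry above, the sketch below averages gradient sign directions over several input transformations and steps along the consensus direction. It is a hedged toy, not the paper's actual procedure: `grad_fn` and `transforms` are hypothetical placeholders for a surrogate model's loss gradient and a set of input augmentations.

```python
import numpy as np

def direction_aggregated_step(x, grad_fn, transforms, alpha=0.01):
    # Collect gradient sign directions from several transformed copies
    # of the input, then move along the aggregated (consensus) direction.
    # grad_fn and transforms are hypothetical placeholders.
    directions = [np.sign(grad_fn(t(x))) for t in transforms]
    return x + alpha * np.sign(np.mean(directions, axis=0))
```

Averaging directions over augmented inputs tends to smooth out gradients that overfit the surrogate model, which is the intuition behind improved transferability to unseen black-box models.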
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.