Related papers: Adversarial-Guided Diffusion for Multimodal LLM Attacks

Adversarial-Guided Diffusion for Multimodal LLM Attacks

URL: http://arxiv.org/abs/2507.23202v1
Date: Thu, 31 Jul 2025 02:57:20 GMT
Title: Adversarial-Guided Diffusion for Multimodal LLM Attacks
Authors: Chengwei Xia, Fan Ma, Ruijie Quan, Kun Zhan, Yi Yang,
Abstract summary: We propose an adversarial-guided diffusion (AGD) approach for adversarial attack MLLMs.<n>AGD injects target semantics into the noise component of the reverse diffusion.<n>AGD outperforms state-of-the-art methods in attack performance as well as in model robustness to some defenses.
Score: 22.666853714543993
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper addresses the challenge of generating adversarial image using a diffusion model to deceive multimodal large language models (MLLMs) into generating the targeted responses, while avoiding significant distortion of the clean image. To address the above challenges, we propose an adversarial-guided diffusion (AGD) approach for adversarial attack MLLMs. We introduce adversarial-guided noise to ensure attack efficacy. A key observation in our design is that, unlike most traditional adversarial attacks which embed high-frequency perturbations directly into the clean image, AGD injects target semantics into the noise component of the reverse diffusion. Since the added noise in a diffusion model spans the entire frequency spectrum, the adversarial signal embedded within it also inherits this full-spectrum property. Importantly, during reverse diffusion, the adversarial image is formed as a linear combination of the clean image and the noise. Thus, when applying defenses such as a simple low-pass filtering, which act independently on each component, the adversarial image within the noise component is less likely to be suppressed, as it is not confined to the high-frequency band. This makes AGD inherently robust to variety defenses. Extensive experiments demonstrate that our AGD outperforms state-of-the-art methods in attack performance as well as in model robustness to some defenses.

Related papers

CoDefend: Cross-Modal Collaborative Defense via Diffusion Purification and Prompt Optimization [4.6467356929461925]
Multimodal Large Language Models (MLLMs) have achieved remarkable success in tasks such as image captioning, visual question answering, and cross-modal reasoning.<n>Their multimodal nature exposes them to adversarial threats, where attackers can perturb either modality or both jointly to induce harmful, misleading, or policy violating outputs.<n>Existing defense strategies, such as adversarial training and input purification, face notable limitations.<n>We propose a supervised diffusion based denoising framework that leverages paired adversarial clean image datasets to fine-tune diffusion models.
arXiv Detail & Related papers (2025-10-13T07:44:54Z)
Active Adversarial Noise Suppression for Image Forgery Localization [56.98050814363447]
We introduce an Adversarial Noise Suppression Module (ANSM) that generate a defensive perturbation to suppress the attack effect of adversarial noise.<n>To our best knowledge, this is the first report of adversarial defense in image forgery localization tasks.
arXiv Detail & Related papers (2025-06-15T14:53:27Z)
DiffCAP: Diffusion-based Cumulative Adversarial Purification for Vision Language Models [45.126261544696185]
Vision Language Models (VLMs) have shown remarkable capabilities in multimodal understanding, yet their susceptibility to perturbations poses a significant threat to their reliability in real-world applications.<n>This paper introduces DiffCAP, a novel diffusion-based purification strategy that can effectively neutralize adversarial corruptions in VLMs.
arXiv Detail & Related papers (2025-06-04T13:26:33Z)
Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification [75.09791002021947]
Existing purification methods aim to disrupt adversarial perturbations by introducing a certain amount of noise through a forward diffusion process, followed by a reverse process to recover clean examples.<n>This approach is fundamentally flawed as the uniform operation of the forward process compromises normal pixels while attempting to combat adversarial perturbations.<n>We propose a heterogeneous purification strategy grounded in the interpretability of neural networks.<n>Our method decisively applies higher-intensity noise to specific pixels that the target model focuses on while the remaining pixels are subjected to only low-intensity noise.
arXiv Detail & Related papers (2025-03-03T11:00:25Z)
DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing [93.45507533317405]
DiffusionGuard is a robust and effective defense method against unauthorized edits by diffusion-based image editing models. We introduce a novel objective that generates adversarial noise targeting the early stage of the diffusion process. We also introduce a mask-augmentation technique to enhance robustness against various masks during test time.
arXiv Detail & Related papers (2024-10-08T05:19:19Z)
Pixel Is Not a Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models [9.905296922309157]
Diffusion Models have emerged as powerful generative models for high-quality image synthesis, with many subsequent image editing techniques based on them.<n>Previous works have attempted to safeguard images from diffusion-based editing by adding imperceptible perturbations.<n>Our work proposes a novel attack framework, AtkPDM, which exploits vulnerabilities in denoising UNets and a latent optimization strategy to enhance the naturalness of adversarial images.
arXiv Detail & Related papers (2024-08-21T17:56:34Z)
StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial examples. It is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries.
arXiv Detail & Related papers (2024-08-11T01:22:29Z)
Efficient Generation of Targeted and Transferable Adversarial Examples for Vision-Language Models Via Diffusion Models [17.958154849014576]
Adversarial attacks can be used to assess the robustness of large visual-language models (VLMs)<n>Previous transfer-based adversarial attacks incur high costs due to high iteration counts and complex method structure.<n>We propose AdvDiffVLM, which uses diffusion models to generate natural, unrestricted and targeted adversarial examples.
arXiv Detail & Related papers (2024-04-16T07:19:52Z)
LFAA: Crafting Transferable Targeted Adversarial Examples with Low-Frequency Perturbations [25.929492841042666]
We present a novel approach to generate transferable targeted adversarial examples. We exploit the vulnerability of deep neural networks to perturbations on high-frequency components of images. Our proposed approach significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-10-31T04:54:55Z)
Robust Real-World Image Super-Resolution against Adversarial Attacks [115.04009271192211]
adversarial image samples with quasi-imperceptible noises could threaten deep learning SR models. We propose a robust deep learning framework for real-world SR that randomly erases potential adversarial noises. Our proposed method is more insensitive to adversarial attacks and presents more stable SR results than existing models and defenses.
arXiv Detail & Related papers (2022-07-31T13:26:33Z)
Guided Diffusion Model for Adversarial Purification [103.4596751105955]
Adversarial attacks disturb deep neural networks (DNNs) in various algorithms and frameworks. We propose a novel purification approach, referred to as guided diffusion model for purification (GDMP) On our comprehensive experiments across various datasets, the proposed GDMP is shown to reduce the perturbations raised by adversarial attacks to a shallow range.
arXiv Detail & Related papers (2022-05-30T10:11:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.