Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
- URL: http://arxiv.org/abs/2507.02844v2
- Date: Tue, 16 Sep 2025 06:48:12 GMT
- Title: Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
- Authors: Ziqi Miao, Yi Ding, Lijun Li, Jing Shao
- Abstract summary: Multimodal large language models (MLLMs) have demonstrated tremendous potential for real-world applications. Security vulnerabilities exhibited by the visual modality pose significant challenges to deploying such models in open-world environments. We propose the VisCo (Visual Contextual) Attack, in which visual information serves as a necessary component in constructing a vision-centric jailbreak context.
- Score: 31.1604742796343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the emergence of strong vision language capabilities, multimodal large language models (MLLMs) have demonstrated tremendous potential for real-world applications. However, the security vulnerabilities exhibited by the visual modality pose significant challenges to deploying such models in open-world environments. Recent studies have successfully induced harmful responses from target MLLMs by encoding harmful textual semantics directly into visual inputs. However, in these approaches, the visual modality primarily serves as a trigger for unsafe behavior, often exhibiting semantic ambiguity and lacking grounding in realistic scenarios. In this work, we define a novel setting: vision-centric jailbreak, where visual information serves as a necessary component in constructing a complete and realistic jailbreak context. Building on this setting, we propose the VisCo (Visual Contextual) Attack. VisCo fabricates contextual dialogue using four distinct vision-focused strategies, dynamically generating auxiliary images when necessary to construct a vision-centric jailbreak scenario. To maximize attack effectiveness, it incorporates automatic toxicity obfuscation and semantic refinement to produce a final attack prompt that reliably triggers harmful responses from the target black-box MLLMs. Specifically, VisCo achieves a toxicity score of 4.78 and an Attack Success Rate (ASR) of 85% on MM-SafetyBench against GPT-4o, significantly outperforming the baseline, which achieves a toxicity score of 2.48 and an ASR of 22.2%. Code: https://github.com/Dtc7w3PQ/Visco-Attack.
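The reported figures combine two standard jailbreak metrics: the Attack Success Rate (the fraction of queries for which a judge deems the model's response harmful) and the mean judge-assigned toxicity score. As a point of reference, the sketch below shows how such per-query judge results are typically aggregated; the record layout, the `is_jailbroken` field, and the 1-5 toxicity scale are assumptions for illustration, not the paper's released evaluation code.
```python
# Illustrative aggregation of per-query judge results into ASR and mean toxicity.
# The record format, the "is_jailbroken" field, and the 1-5 toxicity scale are
# assumptions; this is not the released VisCo evaluation code.
from statistics import mean

def summarize(records):
    """records: list of dicts, e.g. {"is_jailbroken": bool, "toxicity": int in 1..5}."""
    asr = sum(r["is_jailbroken"] for r in records) / len(records)  # fraction judged successful
    avg_toxicity = mean(r["toxicity"] for r in records)            # mean judge-assigned toxicity
    return asr, avg_toxicity

# Hypothetical judge outputs: 17 of 20 attacks succeed, matching the 85% ASR
# reported against GPT-4o (the toxicity values here are made up for the demo).
demo = [{"is_jailbroken": True, "toxicity": 5}] * 17 + \
       [{"is_jailbroken": False, "toxicity": 1}] * 3
asr, tox = summarize(demo)
print(f"ASR = {asr:.0%}, mean toxicity = {tox:.2f}")  # -> ASR = 85%, mean toxicity = 4.40
```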
Related papers
- VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack [40.68344330540352]
Multimodal Large Language Models (MLLMs) are widely used in various fields due to their powerful cross-modal comprehension and generation capabilities. Previous jailbreak attacks have explored reasoning-related safety risks in the text modality, while similar threats have been largely overlooked in the visual modality. We propose the Visual Reasoning Sequential Attack (VRSA), which induces MLLMs to gradually externalize and aggregate the complete harmful intent.
arXiv Detail & Related papers (2025-12-05T16:29:52Z)
- Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities [34.64588827428617]
We propose a new image-centric attack method, Contextual Image Attack (CIA). CIA embeds harmful queries into seemingly benign visual contexts using four distinct visualization strategies. Our method significantly outperforms prior work, demonstrating that the visual modality itself is a potent vector for jailbreaking advanced MLLMs.
arXiv Detail & Related papers (2025-12-02T17:51:02Z)
- Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models [54.61181161508336]
We introduce Multi-Faceted Attack (MFA), a framework that exposes general safety vulnerabilities in leading defense-equipped Vision-Language Models (VLMs). The core component of MFA is the Attention-Transfer Attack (ATA), which hides harmful instructions inside a meta task with competing objectives. MFA achieves a 58.5% success rate and consistently outperforms existing methods.
arXiv Detail & Related papers (2025-11-20T07:12:54Z)
- JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering [73.962469626788]
Jailbreak attacks against multimodal large language models (MLLMs) are a significant research focus. We propose JPS: Jailbreak MLLMs with collaborative visual Perturbation and textual Steering.
arXiv Detail & Related papers (2025-08-07T07:14:01Z)
- Security Tensors as a Cross-Modal Bridge: Extending Text-Aligned Safety to Vision in LVLM [40.83149588857177]
Large visual-language models (LVLMs) integrate aligned large language models (LLMs) with visual modules to process multimodal inputs. We introduce security tensors - trainable input vectors applied during inference through either the textual or visual modality.
arXiv Detail & Related papers (2025-07-28T16:59:53Z)
- MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks [85.3303135160762]
MIRAGE is a novel framework that exploits narrative-driven context and role immersion to circumvent safety mechanisms in Multimodal Large Language Models. It achieves state-of-the-art performance, improving attack success rates by up to 17.5% over the best baselines. We demonstrate that role immersion and structured semantic reconstruction can activate inherent model biases, facilitating the model's spontaneous violation of ethical safeguards.
arXiv Detail & Related papers (2025-03-24T20:38:42Z)
- Distraction is All You Need for Multimodal Large Language Model Jailbreaking [14.787247403225294]
We propose Contrasting Subimage Distraction Jailbreaking (CS-DJ) to disrupt MLLM alignment through multi-level distraction strategies. CS-DJ achieves an average attack success rate of 52.40% and an average ensemble attack success rate of 74.10%. These results reveal the potential of distraction-based approaches to exploit and bypass MLLMs' defenses.
arXiv Detail & Related papers (2025-02-15T13:25:12Z)
- Retention Score: Quantifying Jailbreak Risks for Vision Language Models [60.48306899271866]
Vision-Language Models (VLMs) are integrated with Large Language Models (LLMs) to enhance multi-modal machine learning capabilities. This paper aims to assess the resilience of VLMs against jailbreak attacks that can compromise model safety compliance and result in harmful outputs. To evaluate a VLM's ability to maintain its robustness against adversarial input perturbations, we propose a novel metric called the Retention Score.
arXiv Detail & Related papers (2024-12-23T13:05:51Z)
- Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z)
- Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt [60.54666043358946]
This paper introduces the Bi-Modal Adversarial Prompt Attack (BAP), which executes jailbreaks by optimizing textual and visual prompts cohesively.
In particular, we utilize a large language model to analyze jailbreak failures and employ chain-of-thought reasoning to refine textual prompts.
arXiv Detail & Related papers (2024-06-06T13:00:42Z)
- White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerabilities within Large Vision-Language Models.
Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input.
An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
arXiv Detail & Related papers (2024-05-28T07:13:30Z)
- Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors [31.383591942592467]
Vision-language models (VLMs) offer innovative ways to combine visual and textual data for enhanced understanding and interaction.
Patch-based adversarial attacks are considered the most realistic threat model in physical vision applications.
We introduce SmoothVLM, a defense mechanism rooted in smoothing techniques, to protect VLMs from the threat of patched visual prompt injectors.
arXiv Detail & Related papers (2024-05-17T04:19:19Z)