Security Tensors as a Cross-Modal Bridge: Extending Text-Aligned Safety to Vision in LVLM
- URL: http://arxiv.org/abs/2507.20994v1
- Date: Mon, 28 Jul 2025 16:59:53 GMT
- Title: Security Tensors as a Cross-Modal Bridge: Extending Text-Aligned Safety to Vision in LVLM
- Authors: Shen Li, Liuyi Yao, Wujia Niu, Lan Zhang, Yaliang Li
- Abstract summary: Large visual-language models (LVLMs) integrate aligned large language models (LLMs) with visual modules to process multimodal inputs.
We introduce security tensors - trainable input vectors applied during inference through either the textual or visual modality.
- Score: 40.83149588857177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large visual-language models (LVLMs) integrate aligned large language models (LLMs) with visual modules to process multimodal inputs. However, the safety mechanisms developed for text-based LLMs do not naturally extend to visual modalities, leaving LVLMs vulnerable to harmful image inputs. To address this cross-modal safety gap, we introduce security tensors - trainable input vectors applied during inference through either the textual or visual modality. These tensors transfer textual safety alignment to visual processing without modifying the model's parameters. They are optimized using a curated dataset containing (i) malicious image-text pairs requiring rejection, (ii) contrastive benign pairs whose text is structurally similar to the malicious queries and which serve as contrastive examples guiding the model to rely on visual content, and (iii) general benign samples preserving model functionality. Experimental results demonstrate that both textual and visual security tensors significantly enhance LVLMs' ability to reject diverse harmful visual inputs while maintaining near-identical performance on benign tasks. Further analysis of hidden-layer representations reveals that security tensors successfully activate the language module's textual "safety layers" for visual inputs, thereby effectively extending text-based safety to the visual modality.
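The training recipe lends itself to a compact sketch: a small trainable tensor is prepended to the frozen LVLM's input embeddings, and only that tensor is optimized on the curated rejection/benign data. The `FrozenLVLM` stub, tensor shapes, and loss wiring below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a textual security tensor: a small trainable block of
# soft-prompt vectors prepended to the frozen model's input embeddings and
# optimized on the curated rejection/benign data. FrozenLVLM is an
# illustrative stub, not the paper's model or code.
import torch
import torch.nn as nn

HIDDEN, N_SEC_TOKENS, VOCAB = 512, 8, 1000

class FrozenLVLM(nn.Module):
    """Stand-in for the frozen language module operating on fused input embeddings."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=8, batch_first=True)
        self.lm_head = nn.Linear(HIDDEN, VOCAB)
    def forward(self, inputs_embeds):
        return self.lm_head(self.backbone(inputs_embeds))        # (B, T, VOCAB) logits

model = FrozenLVLM()
for p in model.parameters():                                     # model weights are never updated
    p.requires_grad_(False)

security_tensor = nn.Parameter(torch.randn(N_SEC_TOKENS, HIDDEN) * 0.02)   # the only trainable object
optimizer = torch.optim.AdamW([security_tensor], lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def training_step(input_embeds, target_ids):
    """input_embeds: (B, T, HIDDEN) fused image+text embeddings.
    target_ids: (B, T') tokens of a rejection for malicious pairs, or of the
    ordinary answer for contrastive / general benign samples."""
    sec = security_tensor.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
    logits = model(torch.cat([sec, input_embeds], dim=1))        # prepend security tokens
    logits = logits[:, -target_ids.size(1):, :]                  # align with the target span
    loss = loss_fn(logits.reshape(-1, VOCAB), target_ids.reshape(-1))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```

A visual security tensor would follow the same pattern, with the trainable vectors concatenated to the visual-token embeddings rather than the text side.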
Related papers
- Text2VLM: Adapting Text-Only Datasets to Evaluate Alignment Training in Visual Language Models [0.0]
Existing evaluation datasets lean towards text-only prompts, leaving visual vulnerabilities under-evaluated.
We propose Text2VLM, a novel multi-stage pipeline that adapts text-only datasets into multimodal formats.
Text2VLM provides a scalable tool for comprehensive safety assessment, contributing to the development of more robust safety mechanisms for Visual Language Models.
arXiv Detail & Related papers (2025-07-28T10:57:44Z)
- Rethinking Visual Token Reduction in LVLMs under Cross-modal Misalignment [38.04426918886084]
We introduce VisionDrop, a training-free, visual-only pruning framework that selects informative visual tokens based on intra-modal (visual-to-visual) attention.
Our method performs dominant token selection and lightweight contextual merging at multiple stages, enabling fine-grained visual information to be retained even under aggressive token budgets.
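A minimal sketch of that selection-and-merge idea, assuming a single pruning stage, mean attention received as the token score, and nearest-neighbour merging of dropped tokens (all simplifications, not the VisionDrop reference code):

```python
# Score visual tokens by the attention they receive, keep the top-k, and fold
# each dropped token into its most similar kept token.
import torch
import torch.nn.functional as F

def prune_visual_tokens(tokens: torch.Tensor, attn: torch.Tensor, keep: int):
    """tokens: (N, D) visual token features; attn: (N, N) visual-to-visual
    attention (row i attends to column j); keep: token budget after pruning."""
    scores = attn.mean(dim=0)                        # attention each token receives
    keep_idx = scores.topk(keep).indices
    mask = torch.ones(tokens.size(0), dtype=torch.bool)
    mask[keep_idx] = False
    kept, dropped = tokens[keep_idx], tokens[mask]
    if dropped.numel() == 0:
        return kept
    # lightweight contextual merging: add each dropped token to its nearest kept token
    sim = F.normalize(dropped, dim=-1) @ F.normalize(kept, dim=-1).T
    nearest = sim.argmax(dim=-1)                     # (N - keep,)
    merged = kept.clone()
    merged.index_add_(0, nearest, dropped)
    counts = torch.ones(keep).index_add_(0, nearest, torch.ones(dropped.size(0)))
    return merged / counts.unsqueeze(-1)             # (keep, D) reduced token set
```

For example, 576 ViT tokens with `keep=64` come back as 64 merged tokens that stand in for the full set.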
arXiv Detail & Related papers (2025-06-27T14:55:40Z)
- Robustifying Vision-Language Models via Dynamic Token Reweighting [28.675118345987887]
Large vision-language models (VLMs) are highly vulnerable to jailbreak attacks.
We present a novel inference-time defense that mitigates multimodal jailbreak attacks.
We introduce a new formulation of the safety-relevant distributional shift induced by the visual modality.
arXiv Detail & Related papers (2025-05-22T03:00:39Z)
- Transferable Adversarial Attacks on Black-Box Vision-Language Models [63.22532779621001]
Adversarial attacks can transfer from open-source to proprietary black-box models in text-only and vision-only contexts.
We show that attackers can craft perturbations to induce specific attacker-chosen interpretations of visual information.
We discover that universal perturbations -- modifications applicable to a wide set of images -- can consistently induce these misinterpretations.
arXiv Detail & Related papers (2025-05-02T06:51:11Z)
- Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models? [83.53005932513155]
Multi-modal large language models (MLLMs) have made significant progress, yet their safety alignment remains limited.
We propose finetuning MLLMs on a small set of benign instruct-following data with responses replaced by simple, clear rejection sentences.
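The data recipe is simple enough to sketch directly; the field names and the rejection template below are illustrative assumptions, not the paper's exact setup:

```python
# Build a small fine-tuning set from *benign* instruction-following samples by
# swapping each response for a short, clear rejection sentence
# (hypothetical field names; the rejection wording is a placeholder).
REJECTION = "I'm sorry, but I can't help with that."

def build_rejection_set(benign_samples, k=200):
    """benign_samples: iterable of dicts with 'image', 'instruction', 'response' keys."""
    return [
        {"image": s["image"], "instruction": s["instruction"], "response": REJECTION}
        for s in list(benign_samples)[:k]
    ]
```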
arXiv Detail & Related papers (2025-04-14T09:03:51Z)
- Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models [0.0]
Multi-Modal Language Models (MLLMs) have transformed artificial intelligence by combining visual and text data.
Attackers can manipulate either the visual or text inputs, or both, to make the model produce unintended or even harmful responses.
This paper reviews how visual inputs in MLLMs can be exploited by various attack strategies.
arXiv Detail & Related papers (2024-11-07T16:21:18Z)
- Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models [72.75669790569629]
Vision-language alignment in Large Vision-Language Models (LVLMs) successfully enables LLMs to understand visual input.
We find that existing vision-language alignment methods fail to transfer the existing safety mechanism for text in LLMs to vision.
We propose a novel Text-Guided vision-language alignment method (TGA) for LVLMs.
arXiv Detail & Related papers (2024-10-16T15:20:08Z)
- Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models [15.029014337718849]
Large vision-language models (LVLMs) integrate visual information into large language models, showcasing remarkable multi-modal conversational capabilities.
In general, LVLMs rely on vision encoders to transform images into visual tokens, which are crucial for the language models to perceive image contents effectively.
We propose a non-targeted attack method referred to as VT-Attack, which constructs adversarial examples from multiple perspectives.
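As a rough illustration of attacking encoded visual tokens, a single-objective PGD loop that pushes the encoder's output tokens away from their clean values might look like the following; the actual method attacks the tokens from multiple perspectives, and `vision_encoder`, the budget, and the loss here are assumptions.

```python
# Non-targeted perturbation of an image so its visual tokens drift from the clean ones.
import torch

def attack_visual_tokens(vision_encoder, image, eps=8/255, alpha=2/255, steps=50):
    """vision_encoder: module mapping a (1, 3, H, W) image to (1, N, D) visual tokens.
    Returns an adversarial image whose visual tokens deviate from the clean tokens."""
    clean_tokens = vision_encoder(image).detach()
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        adv_tokens = vision_encoder((image + delta).clamp(0, 1))
        loss = (adv_tokens - clean_tokens).pow(2).mean()    # non-targeted: push tokens away
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()              # gradient ascent on the distance
            delta.clamp_(-eps, eps)                         # stay inside the L_inf ball
        delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()
```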
arXiv Detail & Related papers (2024-10-09T09:06:56Z)
- TrojVLM: Backdoor Attack Against Vision Language Models [50.87239635292717]
This study introduces TrojVLM, the first exploration of backdoor attacks aimed at Vision Language Models (VLMs).
TrojVLM inserts predetermined target text into output text when encountering poisoned images.
A novel semantic preserving loss is proposed to ensure the semantic integrity of the original image content.
arXiv Detail & Related papers (2024-09-28T04:37:09Z)
- Adversarial Prompt Tuning for Vision-Language Models [86.5543597406173]
Adversarial Prompt Tuning (AdvPT) is a technique to enhance the adversarial robustness of image encoders in Vision-Language Models (VLMs).
We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques.
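A hedged sketch of the prompt-tuning side of such a defense, assuming a CLIP-style model, pre-computed adversarial image embeddings, and a deliberately simplified placeholder text head (none of this is the AdvPT reference implementation):

```python
# Only the learnable prompt context is trained; image and text encoders stay frozen.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB, N_CLASSES = 512, 10

class PromptedTextHead(nn.Module):
    """Placeholder: a learnable context vector added to fixed per-class embeddings."""
    def __init__(self):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(EMB) * 0.02)                 # trainable prompt
        self.class_emb = nn.Parameter(torch.randn(N_CLASSES, EMB), requires_grad=False)
    def forward(self):
        return F.normalize(self.class_emb + self.ctx, dim=-1)            # (N_CLASSES, EMB)

text_head = PromptedTextHead()
optimizer = torch.optim.SGD([text_head.ctx], lr=0.02)

def tuning_step(adv_image_emb, labels, temperature=0.07):
    """adv_image_emb: (B, EMB) adversarial image embeddings from the frozen image encoder."""
    logits = F.normalize(adv_image_emb, dim=-1) @ text_head().T / temperature
    loss = F.cross_entropy(logits, labels)                               # re-align the text prompt
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```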
arXiv Detail & Related papers (2023-11-19T07:47:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.