Security Tensors as a Cross-Modal Bridge: Extending Text-Aligned Safety to Vision in LVLM
        - URL: http://arxiv.org/abs/2507.20994v1
 - Date: Mon, 28 Jul 2025 16:59:53 GMT
 - Title: Security Tensors as a Cross-Modal Bridge: Extending Text-Aligned Safety to Vision in LVLM
 - Authors: Shen Li, Liuyi Yao, Wujia Niu, Lan Zhang, Yaliang Li
 - Abstract summary: Large visual-language models (LVLMs) integrate aligned large language models (LLMs) with visual modules to process multimodal inputs. We introduce security tensors - trainable input vectors applied during inference through either the textual or visual modality.
 - Score: 40.83149588857177
 - License: http://creativecommons.org/licenses/by/4.0/
 - Abstract: Large visual-language models (LVLMs) integrate aligned large language models (LLMs) with visual modules to process multimodal inputs. However, the safety mechanisms developed for text-based LLMs do not naturally extend to visual modalities, leaving LVLMs vulnerable to harmful image inputs. To address this cross-modal safety gap, we introduce security tensors - trainable input vectors applied during inference through either the textual or visual modality. These tensors transfer textual safety alignment to visual processing without modifying the model's parameters. They are optimized on a curated dataset containing (i) malicious image-text pairs that must be rejected, (ii) contrastive benign pairs whose text is structurally similar to the malicious queries, serving as contrastive examples that steer the model to rely on visual evidence, and (iii) general benign samples that preserve model functionality. Experimental results demonstrate that both textual and visual security tensors significantly enhance LVLMs' ability to reject diverse harmful visual inputs while maintaining near-identical performance on benign tasks. Further analysis of hidden-layer representations reveals that security tensors activate the language module's textual "safety layers" for visual inputs, thereby effectively extending text-based safety to the visual modality.
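The abstract describes a soft-prompt-style mechanism: a small set of trainable vectors is injected into the frozen LVLM's input sequence at inference time and optimized on the three-part dataset (malicious pairs with refusal targets, contrastive benign pairs, and general benign samples). The sketch below illustrates that idea; the LVLM wrapper interface (embed_inputs, forward_from_embeds), initialization scale, and optimizer settings are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of "security tensors" as trainable soft-prompt vectors, assuming a
# hypothetical LVLM wrapper. Only the security tensor is optimized; the model is frozen.
import torch
import torch.nn as nn


class SecurityTensor(nn.Module):
    def __init__(self, num_vectors: int, hidden_dim: int):
        super().__init__()
        # Trainable input vectors applied at inference; model weights stay untouched.
        self.vectors = nn.Parameter(0.02 * torch.randn(num_vectors, hidden_dim))

    def inject(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the security tensor to the embedded (visual + textual) input sequence.
        batch = input_embeds.size(0)
        prefix = self.vectors.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prefix, input_embeds], dim=1)


def train_security_tensor(lvlm, sec_tensor, loader, steps=1000, lr=1e-3):
    """Optimize the security tensor on the curated mixture: (i) malicious image-text
    pairs labeled with refusal responses, (ii) contrastive benign pairs, and
    (iii) general benign samples labeled with ordinary helpful responses."""
    for p in lvlm.parameters():
        p.requires_grad_(False)  # reuse the existing textual safety alignment; no retraining
    optimizer = torch.optim.AdamW(sec_tensor.parameters(), lr=lr)

    for _, (images, prompts, target_ids) in zip(range(steps), loader):
        embeds = lvlm.embed_inputs(images, prompts)        # hypothetical helper
        embeds = sec_tensor.inject(embeds)
        # Assumed to compute a standard next-token loss on the target responses,
        # ignoring the prepended prefix positions.
        loss = lvlm.forward_from_embeds(embeds, labels=target_ids).loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return sec_tensor
```

Because gradients flow only into the security tensor, the LVLM's parameters and its behavior on inputs served without the tensor are left unchanged; a visual-modality variant would instead inject the trained vectors alongside the visual tokens rather than ahead of the text embeddings.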
 
       
      
        Related papers
- Text2VLM: Adapting Text-Only Datasets to Evaluate Alignment Training in Visual Language Models [0.0]
Existing evaluation datasets lean towards text-only prompts, leaving visual vulnerabilities under-evaluated. We propose Text2VLM, a novel multi-stage pipeline that adapts text-only datasets into multimodal formats. Text2VLM provides a scalable tool for comprehensive safety assessment, contributing to the development of more robust safety mechanisms for Visual Language Models.
arXiv Detail & Related papers (2025-07-28T10:57:44Z)
- Rethinking Visual Token Reduction in LVLMs under Cross-modal Misalignment [38.04426918886084]
We introduce VisionDrop, a training-free, visual-only pruning framework that selects informative visual tokens based on intra-modal (visual-to-visual) attention. Our method performs dominant token selection and lightweight contextual merging at multiple stages, enabling fine-grained visual information to be retained even under aggressive token budgets.
arXiv Detail & Related papers (2025-06-27T14:55:40Z)
- Robustifying Vision-Language Models via Dynamic Token Reweighting [28.675118345987887]
Large vision-language models (VLMs) are highly vulnerable to jailbreak attacks. We present a novel inference-time defense that mitigates multimodal jailbreak attacks. We introduce a new formulation of the safety-relevant distributional shift induced by the visual modality.
arXiv Detail & Related papers (2025-05-22T03:00:39Z)
- Transferable Adversarial Attacks on Black-Box Vision-Language Models [63.22532779621001]
Adversarial attacks can transfer from open-source to proprietary black-box models in text-only and vision-only contexts. We show that attackers can craft perturbations to induce specific attacker-chosen interpretations of visual information. We discover that universal perturbations -- modifications applicable to a wide set of images -- can consistently induce these misinterpretations.
arXiv Detail & Related papers (2025-05-02T06:51:11Z)
- Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models? [83.53005932513155]
Multi-modal large language models (MLLMs) have made significant progress, yet their safety alignment remains limited. We propose finetuning MLLMs on a small set of benign instruct-following data with responses replaced by simple, clear rejection sentences.
arXiv Detail & Related papers (2025-04-14T09:03:51Z)
- Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models [0.0]
Multi-Modal Language Models (MLLMs) have transformed artificial intelligence by combining visual and text data.
Attackers can manipulate either the visual or text inputs, or both, to make the model produce unintended or even harmful responses.
This paper reviews how visual inputs in MLLMs can be exploited by various attack strategies.
arXiv Detail & Related papers (2024-11-07T16:21:18Z)
- Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models [72.75669790569629]
Vision-language alignment in Large Vision-Language Models (LVLMs) successfully enables LLMs to understand visual input. We find that existing vision-language alignment methods fail to transfer the existing safety mechanism for text in LLMs to vision. We propose a novel Text-Guided vision-language alignment method (TGA) for LVLMs.
arXiv Detail & Related papers (2024-10-16T15:20:08Z)
- Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models [15.029014337718849]
Large vision-language models (LVLMs) integrate visual information into large language models, showcasing remarkable multi-modal conversational capabilities.
In general, LVLMs rely on vision encoders to transform images into visual tokens, which are crucial for the language models to perceive image contents effectively.
We propose a non-targeted attack method referred to as VT-Attack, which constructs adversarial examples from multiple perspectives.
arXiv Detail & Related papers (2024-10-09T09:06:56Z)
- TrojVLM: Backdoor Attack Against Vision Language Models [50.87239635292717]
This study introduces TrojVLM, the first exploration of backdoor attacks aimed at Vision Language Models (VLMs).
TrojVLM inserts predetermined target text into output text when encountering poisoned images.
A novel semantic preserving loss is proposed to ensure the semantic integrity of the original image content.
arXiv Detail & Related papers (2024-09-28T04:37:09Z)
- Adversarial Prompt Tuning for Vision-Language Models [86.5543597406173]
Adversarial Prompt Tuning (AdvPT) is a technique to enhance the adversarial robustness of image encoders in Vision-Language Models (VLMs).
We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques.
arXiv Detail & Related papers (2023-11-19T07:47:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     