VLSBench: Unveiling Visual Leakage in Multimodal Safety
- URL: http://arxiv.org/abs/2411.19939v2
- Date: Fri, 17 Jan 2025 09:50:55 GMT
- Title: VLSBench: Unveiling Visual Leakage in Multimodal Safety
- Authors: Xuhao Hu, Dongrui Liu, Hao Li, Xuanjing Huang, Jing Shao,
- Abstract summary: Safety concerns of Multimodal large language models (MLLMs) have gradually become an important problem in various applications.
Previous works indicate a counter-intuitive phenomenon that using textual unlearning to align MLLMs achieves comparable safety performances with MLLMs trained with image-text pairs.
This study demonstrates that textual alignment is enough for multimodal safety scenarios with VSIL, while multimodal alignment is a more promising solution for multimodal safety scenarios without VSIL.
- Score: 39.344623032631475
- License:
- Abstract: Safety concerns of Multimodal large language models (MLLMs) have gradually become an important problem in various applications. Surprisingly, previous works indicate a counter-intuitive phenomenon that using textual unlearning to align MLLMs achieves comparable safety performances with MLLMs trained with image-text pairs. To explain such a counter-intuitive phenomenon, we discover a visual safety information leakage (VSIL) problem in existing multimodal safety benchmarks, i.e., the potentially risky and sensitive content in the image has been revealed in the textual query. In this way, MLLMs can easily refuse these sensitive text-image queries according to textual queries. However, image-text pairs without VSIL are common in real-world scenarios and are overlooked by existing multimodal safety benchmarks. To this end, we construct multimodal visual leakless safety benchmark (VLSBench) preventing visual safety leakage from image to textual query with 2.4k image-text pairs. Experimental results indicate that VLSBench poses a significant challenge to both open-source and close-source MLLMs, including LLaVA, Qwen2-VL, Llama3.2-Vision, and GPT-4o. This study demonstrates that textual alignment is enough for multimodal safety scenarios with VSIL, while multimodal alignment is a more promising solution for multimodal safety scenarios without VSIL. Please see our code and data at: https://hxhcreate.github.io/vlsbench.github.io/
Related papers
- SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings [32.661752596399204]
Multimodal Large Language Models (MLLMs) have serious security vulnerabilities.
Existing low-resource security alignment methods, including textual alignment, have been found to struggle with the security risks posed by additional modalities.
We propose Synthetic Embedding augmented safety alignment (SEA), which optimize embeddings of additional modality through gradient updates.
arXiv Detail & Related papers (2025-02-18T05:57:35Z) - Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs [56.440345471966666]
Multimodal Large Language Models (MLLMs) have expanded the capabilities of traditional language models by enabling interaction through both text and images.
This paper introduces MMSafeAware, the first comprehensive multimodal safety awareness benchmark designed to evaluate MLLMs across 29 safety scenarios.
MMSafeAware includes both unsafe and over-safety subsets to assess models abilities to correctly identify unsafe content and avoid over-sensitivity that can hinder helpfulness.
arXiv Detail & Related papers (2025-02-16T16:12:40Z) - MSTS: A Multimodal Safety Test Suite for Vision-Language Models [35.86658426268927]
Vision-language models (VLMs) are increasingly integrated into chat assistants and other consumer AI applications.
Despite clear hazards, little work has evaluated VLM safety and the novel risks created by multimodal inputs.
M MSTS comprises 400 test prompts across 40 fine-grained hazard categories.
We translate MSTS into ten languages, showing non-English prompts to increase the rate of unsafe model responses.
arXiv Detail & Related papers (2025-01-17T09:22:35Z) - Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety.
For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context.
We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z) - TrojVLM: Backdoor Attack Against Vision Language Models [50.87239635292717]
This study introduces TrojVLM, the first exploration of backdoor attacks aimed at Vision Language Models (VLMs)
TrojVLM inserts predetermined target text into output text when encountering poisoned images.
A novel semantic preserving loss is proposed to ensure the semantic integrity of the original image content.
arXiv Detail & Related papers (2024-09-28T04:37:09Z) - CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration [90.36429361299807]
multimodal large language models (MLLMs) have demonstrated remarkable success in engaging in conversations involving visual inputs.
The integration of visual modality has introduced a unique vulnerability: the MLLM becomes susceptible to malicious visual inputs.
We introduce a technique termed CoCA, which amplifies the safety-awareness of the MLLM by calibrating its output distribution.
arXiv Detail & Related papers (2024-09-17T17:14:41Z) - AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning [15.770849688170477]
In-context learning (ICL) facilitates Large Language Models exhibiting emergent ability on downstream tasks without updating billions of parameters.
Most primary MLLMs are only trained on single-image datasets, making them unable to read multi-modal demonstrations.
We propose a general and light-weighted framework textbfAIM to tackle the mentioned problems through textbfAggregating textbfImage information of textbfMultimodal demonstrations to the dense latent space of the corresponding linguistic part.
arXiv Detail & Related papers (2024-06-11T08:12:43Z) - Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models [107.88745040504887]
We study the harmlessness alignment problem of multimodal large language models (MLLMs)
Inspired by this, we propose a novel jailbreak method named HADES, which hides and amplifies the harmfulness of the malicious intent within the text input.
Experimental results show that HADES can effectively jailbreak existing MLLMs, which achieves an average Attack Success Rate (ASR) of 90.26% for LLaVA-1.5 and 71.60% for Gemini Pro Vision.
arXiv Detail & Related papers (2024-03-14T18:24:55Z) - MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models [41.708401515627784]
We observe that Multimodal Large Language Models (MLLMs) can be easily compromised by query-relevant images.
We introduce MM-SafetyBench, a framework designed for conducting safety-critical evaluations of MLLMs against such image-based manipulations.
Our work underscores the need for a concerted effort to strengthen and enhance the safety measures of open-source MLLMs against potential malicious exploits.
arXiv Detail & Related papers (2023-11-29T12:49:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.