SIA: Enhancing Safety via Intent Awareness for Vision-Language Models
- URL: http://arxiv.org/abs/2507.16856v2
- Date: Mon, 06 Oct 2025 10:16:31 GMT
- Title: SIA: Enhancing Safety via Intent Awareness for Vision-Language Models
- Authors: Youngjin Na, Sangheon Jeong, Youngwan Lee, Jian Lee, Dawoon Jeong, Youngman Kim
- Abstract summary: Seemingly innocuous multimodal inputs can combine to reveal harmful intent, leading to unsafe model outputs. We propose SIA (Safety via Intent Awareness), a training-free, intent-aware safety framework. SIA proactively detects harmful intent in multimodal inputs and uses it to guide the generation of safe responses.
- Score: 9.208512612467029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing deployment of Vision-Language Models (VLMs) in real-world applications, previously overlooked safety risks are becoming increasingly evident. In particular, seemingly innocuous multimodal inputs can combine to reveal harmful intent, leading to unsafe model outputs. While multimodal safety has received increasing attention, existing approaches often fail to address such latent risks, especially when harmfulness arises only from the interaction between modalities. We propose SIA (Safety via Intent Awareness), a training-free, intent-aware safety framework that proactively detects harmful intent in multimodal inputs and uses it to guide the generation of safe responses. SIA follows a three-stage process: (1) visual abstraction via captioning; (2) intent inference through few-shot chain-of-thought (CoT) prompting; and (3) intent-conditioned response generation. By dynamically adapting to the implicit intent inferred from an image-text pair, SIA mitigates harmful outputs without extensive retraining. Extensive experiments on safety benchmarks, including SIUO, MM-SafetyBench, and HoliSafe, show that SIA consistently improves safety and outperforms prior training-free methods.
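Since the abstract spells out the three-stage process, a compact sketch may help make it concrete. The Python below is one plausible, hypothetical realization: the `vlm` callable, the helper name `sia_respond`, the few-shot exemplar, and all prompt wording are assumptions for illustration, not the authors' released prompts or code.

```python
# Minimal sketch of SIA's training-free, three-stage pipeline.
# Assumes a generic chat-style interface `vlm(prompt, image=None) -> str`;
# every prompt string and exemplar below is illustrative only.

FEW_SHOT_COT = """Caption: A person holds a kitchen knife over vegetables.
Text: "How do I use this on someone?"
Reasoning: The text redirects a benign tool toward harming a person.
Intent: harmful
"""  # abbreviated few-shot chain-of-thought exemplars

def sia_respond(vlm, image, user_text):
    # Stage 1: visual abstraction -- reduce the image to a short caption.
    caption = vlm("Describe this image in one sentence.", image=image)

    # Stage 2: intent inference -- few-shot CoT over caption + user text,
    # so harm arising only from the *combination* of modalities surfaces.
    intent = vlm(
        f"{FEW_SHOT_COT}\nCaption: {caption}\nText: \"{user_text}\"\n"
        "Reason step by step, then answer 'harmful' or 'benign'."
    )

    # Stage 3: intent-conditioned generation -- the inferred intent is fed
    # back into the prompt to steer the final response toward safety.
    guard = ("The combined request appears harmful; refuse and explain why."
             if "harmful" in intent.lower()
             else "The request appears benign; answer helpfully.")
    return vlm(f"{guard}\nUser request: {user_text}", image=image)
```

Because every stage is just a prompt, the mechanism stays training-free, which matches the paper's framing of SIA as a drop-in safety layer.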
Related papers
- Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility [26.564913442069866]
Vision-language models (VLMs) extend the reasoning capabilities of large language models (LLMs) to cross-modal settings. Existing defenses rely on safety fine-tuning or aggressive token manipulations, incurring substantial training costs or significantly degrading utility. We propose Risk Awareness Injection (RAI), a lightweight and training-free framework for safety calibration.
arXiv Detail & Related papers (2026-02-03T11:26:05Z)
- RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic [56.38397499463889]
Embodied agents powered by vision-language models (VLMs) are increasingly capable of executing complex real-world tasks. However, they remain vulnerable to hazardous instructions that may trigger unsafe behaviors. We propose RoboSafe, a runtime safeguard that protects embodied agents through executable, predicate-based safety logic; a toy illustration of this idea follows.
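One plausible reading of "executable predicate-based safety logic" is a runtime monitor that evaluates hazard predicates over each planned action before execution. The sketch below is that reading only; the action schema and both predicates are invented for illustration and are not from the paper.

```python
# Hypothetical runtime guard in the spirit of predicate-based safety logic:
# each predicate is an executable check over a planned action, and an action
# runs only if no hazard predicate fires.

from dataclasses import dataclass

@dataclass
class Action:
    verb: str       # e.g. "heat", "pour", "hand_over"
    obj: str        # object being acted on
    location: str   # where or how the action takes place

def heating_near_flammables(a: Action) -> bool:
    return a.verb == "heat" and a.location in {"couch", "curtains"}

def blade_toward_human(a: Action) -> bool:
    return a.verb == "hand_over" and a.obj == "knife" and a.location == "blade_first"

HAZARD_PREDICATES = [heating_near_flammables, blade_toward_human]

def safe_to_execute(action: Action) -> bool:
    # Veto the action if any executable hazard predicate holds.
    return not any(pred(action) for pred in HAZARD_PREDICATES)
```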
arXiv Detail & Related papers (2025-12-24T15:01:26Z)
- Safe Semantics, Unsafe Interpretations: Tackling Implicit Reasoning Safety in Large Vision-Language Models [46.51961683413124]
This paper introduces Implicit Reasoning Safety, a vulnerability in LVLMs where inputs with individually safe semantics can combine into unsafe interpretations. Our demonstrations show that even simple in-context learning with SSUI significantly mitigates these implicit multimodal threats.
arXiv Detail & Related papers (2025-08-12T13:26:06Z)
- Self-Aware Safety Augmentation: Leveraging Internal Semantic Understanding to Enhance Safety in Vision-Language Models [21.961325147038867]
Large vision-language models (LVLMs) are more vulnerable to harmful input than their language-only backbones. The relevant internal capabilities are characterized as safety perception, semantic understanding, and alignment for linguistic expression. Motivated by these findings, we propose Self-Aware Safety Augmentation (SASA), a technique that projects informative semantic representations onto earlier safety-oriented layers; a toy sketch of the projection idea appears below.
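The one-line description, projecting semantics-rich late-layer representations onto earlier safety-oriented layers, can be pictured as reading a late hidden state and adding a learned projection of it into an earlier layer's residual stream. The toy below exists only to visualize that picture; the two-pass scheme, the layer indices, and the tiny model are all invented, not SASA's actual mechanism.

```python
# Toy visualization: read a late, semantics-rich representation (pass 1),
# then add a learned projection of it into an earlier layer (pass 2).

import torch
import torch.nn as nn

class ToyLM(nn.Module):
    def __init__(self, d=64, n_layers=8):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(d, d) for _ in range(n_layers))

    def forward(self, x, inject_at=None, injection=None):
        for i, layer in enumerate(self.layers):
            x = torch.relu(layer(x)) + x          # residual block
            if i == inject_at and injection is not None:
                x = x + injection                 # augment the early layer
        return x

model, proj = ToyLM(), nn.Linear(64, 64)          # proj: learned projection
x = torch.randn(1, 64)

# Pass 1: collect the hidden state after a late layer (here, layer 6).
h = x
for layer in model.layers[:7]:
    h = torch.relu(layer(h)) + h

# Pass 2: project it onto an earlier "safety-oriented" layer (here, layer 2).
out = model(x, inject_at=2, injection=proj(h))
```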
arXiv Detail & Related papers (2025-07-29T09:48:57Z)
- Automating Steering for Safe Multimodal Large Language Models [58.36932318051907]
We introduce AutoSteer, a modular and adaptive inference-time intervention technique that requires no fine-tuning of the underlying model. AutoSteer incorporates three core components: (1) a novel Safety Awareness Score (SAS) that automatically identifies the most safety-relevant distinctions among the model's internal layers; (2) an adaptive safety prober trained to estimate the likelihood of toxic outputs from intermediate representations; and (3) a lightweight Refusal Head that selectively intervenes to modulate generation when safety risks are detected. A toy sketch of this flow follows.
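Taking the three components at face value, the flow can be caricatured as: score layers for safety relevance offline, probe the chosen layer's hidden state at inference, and route to a refusal when the probe fires. Everything below (the mean-distance score, the abstract `prober` callable, the assumed `hidden_states` accessor, the fixed refusal string) is a stand-in, not AutoSteer's implementation.

```python
# Caricature of the three components named in the abstract.

import numpy as np

def safety_awareness_score(safe_feats, unsafe_feats):
    # (1) Score each layer by how well it separates safe vs. unsafe inputs;
    # distance between class means is a deliberately crude proxy for SAS.
    return [float(np.linalg.norm(s.mean(axis=0) - u.mean(axis=0)))
            for s, u in zip(safe_feats, unsafe_feats)]

def steer(model, prompt, prober, probe_layer, threshold=0.5):
    # (2) Probe the chosen layer's pooled hidden state for toxicity risk.
    hidden = model.hidden_states(prompt)[probe_layer]   # assumed accessor
    p_toxic = prober(hidden.mean(axis=0))               # trained safety prober
    # (3) A refusal path stands in for the lightweight Refusal Head.
    if p_toxic > threshold:
        return "I can't help with that."
    return model.generate(prompt)                       # normal decoding
```

Choosing `probe_layer = int(np.argmax(safety_awareness_score(...)))` would mirror the idea of automatically selecting the most safety-relevant layer.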
arXiv Detail & Related papers (2025-07-17T16:04:55Z)
- AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions [64.85086226439954]
We present AGENTSAFE, a benchmark for assessing the safety of embodied VLM agents on hazardous instructions. It comprises three components: SAFE-THOR, SAFE-VERSE, and SAFE-DIAGNOSE. We uncover systematic failures in translating hazard recognition into safe planning and execution.
arXiv Detail & Related papers (2025-06-17T16:37:35Z)
- The Safety Reminder: A Soft Prompt to Reactivate Delayed Safety Awareness in Vision-Language Models [4.27794555931853]
Vision-Language Models (VLMs) face unique vulnerabilities due to their multimodal nature, which allows adversaries to bypass safety guardrails and trigger the generation of harmful content. We propose "The Safety Reminder", a soft prompt tuning approach that optimizes learnable prompt tokens, which are periodically injected during text generation to enhance safety awareness; a toy decoding loop illustrating the injection pattern follows.
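The key mechanism, tuned soft-prompt tokens injected periodically during decoding, can be sketched as a greedy loop that splices reminder embeddings into the running context every few steps. The Hugging Face-style `inputs_embeds` call is real, but the injection period, shapes, and overall procedure are assumptions, not the paper's exact method.

```python
# Toy greedy decoding loop with periodic soft-prompt injection.

import torch

def decode_with_reminder(model, input_embeds, reminder_embeds,
                         max_new_tokens=64, period=16):
    # input_embeds: (1, T, d) prompt embeddings
    # reminder_embeds: (1, R, d) tuned safety-reminder tokens (assumed shape)
    embeds, out_ids = input_embeds, []
    embed_layer = model.get_input_embeddings()
    for step in range(max_new_tokens):
        if step % period == 0:
            # Re-inject the learned reminder tokens into the context.
            embeds = torch.cat([embeds, reminder_embeds], dim=1)
        logits = model(inputs_embeds=embeds).logits      # HF-style causal LM
        next_id = logits[:, -1, :].argmax(dim=-1)        # greedy pick, (1,)
        out_ids.append(int(next_id))
        next_embed = embed_layer(next_id).unsqueeze(1)   # (1, 1, d)
        embeds = torch.cat([embeds, next_embed], dim=1)
    return out_ids
```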
arXiv Detail & Related papers (2025-06-15T12:48:38Z)
- HoliSafe: Holistic Safety Benchmarking and Modeling with Safety Meta Token for Vision-Language Model [52.72318433518926]
Existing safety-tuning datasets and benchmarks only partially consider how image-text interactions can yield harmful content. We introduce HoliSafe, a holistic safety dataset and benchmark that spans all five safe/unsafe image-text combinations. We also propose SafeLLaVA, a novel VLM augmented with a learnable safety meta token and a dedicated safety head.
arXiv Detail & Related papers (2025-06-05T07:26:34Z)
- Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack [7.988475248750045]
Large Vision-Language Models (LVLMs) have shown remarkable capabilities across a wide range of multimodal tasks. We conduct a systematic representational analysis to uncover why conventional adversarial attacks can circumvent the safety mechanisms embedded in LVLMs, and we propose a novel two-stage evaluation framework for adversarial attacks on LVLMs.
arXiv Detail & Related papers (2025-05-28T04:43:39Z)
- SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning [76.56522719330911]
Large Reasoning Models (LRMs) introduce a new generation paradigm of explicitly reasoning before answering. However, LRMs remain exposed to serious safety risks from harmful queries and adversarial attacks. We propose SafeKey to better activate the safety "aha moment" in the key sentence of the model's reasoning.
arXiv Detail & Related papers (2025-05-22T03:46:03Z)
- Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs [56.440345471966666]
Multimodal Large Language Models (MLLMs) have expanded the capabilities of traditional language models by enabling interaction through both text and images. This paper introduces MMSafeAware, the first comprehensive multimodal safety-awareness benchmark, designed to evaluate MLLMs across 29 safety scenarios. MMSafeAware includes both unsafe and over-safety subsets to assess models' ability to correctly identify unsafe content while avoiding the over-sensitivity that can hinder helpfulness.
arXiv Detail & Related papers (2025-02-16T16:12:40Z)
- Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models [25.606641582511106]
We propose MIS, a novel dataset that integrates multi-image inputs with safety Chain-of-Thought (CoT) labels as fine-grained reasoning logic to improve model performance. Our experiments demonstrate that InternVL2.5-8B fine-tuned on MIS significantly outperforms both powerful open-source models and API-based models on challenging multi-image tasks.
arXiv Detail & Related papers (2025-01-30T17:59:45Z)
- Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety. For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context. We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z)
- Safe Inputs but Unsafe Output: Benchmarking Cross-modality Safety Alignment of Large Vision-Language Model [73.8765529028288]
We introduce a novel safety alignment challenge, Safe Inputs but Unsafe Output (SIUO), to evaluate cross-modality safety alignment. To investigate this problem empirically, we developed SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, underscoring the inadequacy of current models at reliably interpreting and responding to complex, real-world scenarios.
arXiv Detail & Related papers (2024-06-21T16:14:15Z)
- Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models? [52.238883592674696]
Ring-A-Bell is a model-agnostic red-teaming tool for T2I diffusion models.
It identifies problematic prompts for diffusion models together with the corresponding inappropriate content they generate.
Our results show that by manipulating safe prompting benchmarks, Ring-A-Bell can transform prompts originally regarded as safe into variants that evade existing safety mechanisms.
arXiv Detail & Related papers (2023-10-16T02:11:20Z)