VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
- URL: http://arxiv.org/abs/2508.08521v1
- Date: Mon, 11 Aug 2025 23:25:16 GMT
- Title: VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
- Authors: Mansi Phute, Ravikumar Balakrishnan
- Abstract summary: VISOR (Visual Input-based Steering for Output Redirection) is a novel method that achieves sophisticated behavioral control through optimized visual inputs alone. We validate VISOR on LLaVA-1.5-7B across three critical alignment tasks: refusal, sycophancy, and survival instinct. VISOR provides robust bidirectional control while maintaining 99.9% performance on 14,000 unrelated MMLU tasks.
- Score: 1.4262180230002854
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision Language Models (VLMs) are increasingly being used in a broad range of applications, bringing their security and behavioral control to the forefront. Existing approaches to behavioral control or output redirection, such as system prompting, are easily detectable and often ineffective, while activation-based steering vectors require invasive runtime access to model internals, which is incompatible with API-based services and closed-source deployments. We introduce VISOR (Visual Input-based Steering for Output Redirection), a novel method that achieves sophisticated behavioral control through optimized visual inputs alone. By crafting universal steering images that induce target activation patterns, VISOR enables practical deployment across all VLM serving modalities while remaining far less conspicuous than explicit textual instructions. We validate VISOR on LLaVA-1.5-7B across three critical alignment tasks: refusal, sycophancy, and survival instinct. A single 150 KB steering image matches steering-vector performance within 1-2% for positive behavioral shifts and dramatically exceeds it for negative steering, achieving shifts of up to 25% from baseline where steering vectors produce only modest changes. Unlike system prompting (3-4% shifts), VISOR provides robust bidirectional control while maintaining 99.9% performance on 14,000 unrelated MMLU tasks. Beyond eliminating runtime overhead and model-access requirements, VISOR exposes a critical security vulnerability: adversaries can achieve sophisticated behavioral manipulation through visual channels alone, bypassing text-based defenses. Our work fundamentally re-imagines multimodal model control and highlights the urgent need for defenses against visual steering attacks.
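As a rough illustration of the mechanism the abstract describes, the sketch below optimizes the pixels of a single image so that a frozen model's activations at a chosen layer shift along a precomputed steering direction, averaged over a batch of prompts. This is a minimal sketch under stated assumptions, not the paper's implementation: the ToyVLM stand-in, the optimize_steering_image helper, the layer choice, the MSE objective, and hyperparameters such as alpha are all hypothetical placeholders for the actual model (LLaVA-1.5-7B) and training recipe.

```python
# Minimal sketch (not the paper's released code) of optimizing a universal
# "steering image": pixels are tuned so that a frozen model's activations at a
# chosen layer move along a precomputed steering direction for many prompts.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyVLM(nn.Module):
    """Hypothetical stand-in for a frozen VLM such as LLaVA-1.5-7B."""

    def __init__(self, img_dim=3 * 32 * 32, txt_dim=64, hidden=128):
        super().__init__()
        self.vision = nn.Linear(img_dim, hidden)   # toy vision encoder
        self.text = nn.Linear(txt_dim, hidden)     # toy prompt encoder
        self.block = nn.Linear(hidden, hidden)     # "layer L" whose activations are steered

    def hidden_at_layer(self, image, prompt_emb):
        fused = torch.tanh(self.vision(image.flatten(1)) + self.text(prompt_emb))
        return self.block(fused)


def optimize_steering_image(model, prompt_embs, steer_dir, steps=200, lr=0.05, alpha=8.0):
    """Return an image that pushes layer-L activations toward base + alpha * steer_dir."""
    for p in model.parameters():          # the model stays frozen; only pixels are optimized
        p.requires_grad_(False)
    model.eval()

    n = prompt_embs.shape[0]
    image = torch.rand(1, 3, 32, 32, requires_grad=True)
    opt = torch.optim.Adam([image], lr=lr)
    unit_dir = F.normalize(steer_dir, dim=-1)

    with torch.no_grad():                 # target pattern: baseline activations shifted along the direction
        base = model.hidden_at_layer(torch.zeros(n, 3, 32, 32), prompt_embs)
        target = base + alpha * unit_dir

    for _ in range(steps):
        opt.zero_grad()
        acts = model.hidden_at_layer(image.clamp(0, 1).expand(n, -1, -1, -1), prompt_embs)
        loss = F.mse_loss(acts, target)   # match the steered activation pattern across all prompts
        loss.backward()
        opt.step()
        with torch.no_grad():
            image.clamp_(0, 1)            # keep the result a valid image
    return image.detach()


if __name__ == "__main__":
    torch.manual_seed(0)
    vlm = ToyVLM()
    prompts = torch.randn(16, 64)     # stand-in embeddings for a batch of text prompts
    direction = torch.randn(128)      # stand-in steering direction (e.g. a refusal direction)
    steering_image = optimize_steering_image(vlm, prompts, direction)
    print("steering image range:", steering_image.min().item(), steering_image.max().item())
```

In practice the same idea would use the real VLM's hidden states and behavior-specific steering directions (e.g. refusal or sycophancy), with the optimized image reused as a single universal input across prompts.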
Related papers
- CARE: Multi-Task Pretraining for Latent Continuous Action Representation in Robot Control [39.17038025776311]
CARE is a framework designed to train VLA models for robotic task execution. CARE eliminates the need for explicit action labels by leveraging only video-text pairs. Results demonstrate CARE's scalability, interpretability, and effectiveness in robotic control with weak supervision.
arXiv Detail & Related papers (2026-01-30T02:28:32Z) - ReViP: Reducing False Completion in Vision-Language-Action Models with Vision-Proprioception Rebalance [50.05984919728878]
We present ReViP, a novel VLA framework with Vision-Proprioception Rebalance to enhance visual grounding and robustness under perturbations. Specifically, we use an external VLM as a task-stage observer to extract real-time task-centric visual cues from visual observations. To evaluate false completion, we propose the first False-Completion Benchmark Suite built on LIBERO with controlled settings such as Object-Drop.
arXiv Detail & Related papers (2026-01-23T11:31:07Z) - Chameleon: Adaptive Adversarial Agents for Scaling-Based Visual Prompt Injection in Multimodal AI Systems [0.0]
We propose a novel, adaptive adversarial framework designed to expose and exploit scaling vulnerabilities in production Vision-Language Models (VLMs). Our experiments demonstrate that Chameleon achieves an Attack Success Rate (ASR) of 84.5% across varying scaling factors. We show that these attacks effectively compromise agentic pipelines, reducing decision-making accuracy by over 45% in multi-step tasks.
arXiv Detail & Related papers (2025-12-04T15:22:28Z) - V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs [66.81402538540458]
We propose V-Attack, a novel method for precise local semantic attacks. V-Attack improves the attack success rate by an average of 36% over state-of-the-art methods.
arXiv Detail & Related papers (2025-11-25T11:51:17Z) - SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models [4.506695482619111]
This work introduces SteerVLM, a lightweight steering module for Vision-Language Models (VLMs). Our approach learns from the latent embeddings of paired prompts encoding target and converse behaviors to dynamically adjust activations connecting the language modality with image context. Our steering module requires learning parameters equal to 0.14% of the original VLM's size.
arXiv Detail & Related papers (2025-10-30T17:52:39Z) - VISOR++: Universal Visual Inputs based Steering for Large Vision Language Models [2.8676122062166187]
We introduce universal visual input based steering for output redirection (VISOR++) to achieve behavioral control through optimized visual inputs alone. We demonstrate that a single VISOR++ image can be generated for an ensemble of Vision Language Models (VLMs) to emulate each of their steering vectors. We also show the promise of VISOR++ images in achieving directional behavioral shifts for unseen models, including both open-access and closed-access ones.
arXiv Detail & Related papers (2025-09-29T21:43:18Z) - Universal Camouflage Attack on Vision-Language Models for Autonomous Driving [67.34987318443761]
Visual language modeling for automated driving is emerging as a promising research direction. VLM-AD remains vulnerable to serious security threats from adversarial attacks. We propose the first Universal Camouflage Attack framework for VLM-AD.
arXiv Detail & Related papers (2025-09-24T14:52:01Z) - GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs [56.93583799109029]
GrAInS is an inference-time steering approach that operates across both language-only and vision-language models and tasks. During inference, GrAInS adjusts hidden activations at transformer layers guided by token-level attribution signals, and normalizes activations to preserve representational scale. It consistently outperforms both fine-tuning and existing steering baselines.
arXiv Detail & Related papers (2025-07-24T02:34:13Z) - Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments [61.808686396077036]
We present GHOST, the first clean-label backdoor attack specifically designed for mobile agents built upon vision-language models (VLMs). Our method manipulates only the visual inputs of a portion of the training samples without altering their corresponding labels or instructions. We evaluate our method across six real-world Android apps and three VLM architectures adapted for mobile use.
arXiv Detail & Related papers (2025-06-16T08:09:32Z) - VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models [34.60772103760521]
We introduce a novel framework that enhances Embodied Visual Tracking (EVT) with Vision-Language Models (VLMs). This work represents the first integration of VLM-based reasoning to assist EVT agents in proactive failure recovery.
arXiv Detail & Related papers (2025-05-27T04:53:50Z) - Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks [16.508109544083496]
Vision Language Models (VLMs) can produce unintended and harmful content when exposed to adversarial attacks. Existing defenses, such as input preprocessing, adversarial training, and response evaluation-based methods, are often impractical for real-world deployment. We propose ASTRA, an efficient and effective defense that adaptively steers models away from adversarial feature directions to resist VLM attacks.
arXiv Detail & Related papers (2024-11-23T02:17:17Z) - Learning Self-Regularized Adversarial Views for Self-Supervised Vision Transformers [105.89564687747134]
We propose a self-regularized AutoAugment method to learn views for self-supervised vision transformers.
First, we reduce the search cost of AutoView to nearly zero by learning views and network parameters simultaneously.
We also present a curated augmentation policy search space for self-supervised learning.
arXiv Detail & Related papers (2022-10-16T06:20:44Z) - Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives [89.34229413345541]
We propose a conditioning scheme which avoids pitfalls by learning the controller and its conditioning in an end-to-end manner.
Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion.
We report significant improvements in task success over representative MPC and IL baselines.
arXiv Detail & Related papers (2020-03-19T15:04:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.