Fugu-MT paper translation (abstract): VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models

Paper overview: VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models

arxiv url: http://arxiv.org/abs/2508.08521v1
Date: Mon, 11 Aug 2025 23:25:16 GMT
Status: translation complete
Last updated in system: 2025-08-13 21:07:34.253266
Title: VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
Title (reference translation): VISOR: Visual Input-Based Steering for Output Redirection in Vision-Language Models
Authors: Mansi Phute, Ravikumar Balakrishnan
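Abstract summary: VISOR (Visual Input-based Steering for Output Redirection) is a novel method that achieves sophisticated behavioral control through optimized visual inputs alone. The authors validate VISOR on LLaVA-1.5-7B across three critical alignment tasks: refusal, sycophancy, and survival instinct. VISOR provides robust bidirectional control while maintaining 99.9% performance on 14,000 unrelated MMLU tasks.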
Reference score (independently computed attention score): 1.4262180230002854
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision Language Models (VLMs) are increasingly being used in a broad range of applications, bringing their security and behavioral control to the forefront. While existing approaches for behavioral control or output redirection, like system prompting in VLMs, are easily detectable and often ineffective, activation-based steering vectors require invasive runtime access to model internals--incompatible with API-based services and closed-source deployments. We introduce VISOR (Visual Input-based Steering for Output Redirection), a novel method that achieves sophisticated behavioral control through optimized visual inputs alone. By crafting universal steering images that induce target activation patterns, VISOR enables practical deployment across all VLM serving modalities while remaining imperceptible compared to explicit textual instructions. We validate VISOR on LLaVA-1.5-7B across three critical alignment tasks: refusal, sycophancy and survival instinct. A single 150KB steering image matches steering vector performance within 1-2% for positive behavioral shifts while dramatically exceeding it for negative steering--achieving up to 25% shifts from baseline compared to steering vectors' modest changes. Unlike system prompting (3-4% shifts), VISOR provides robust bidirectional control while maintaining 99.9% performance on 14,000 unrelated MMLU tasks. Beyond eliminating runtime overhead and model access requirements, VISOR exposes a critical security vulnerability: adversaries can achieve sophisticated behavioral manipulation through visual channels alone, bypassing text-based defenses. Our work fundamentally re-imagines multimodal model control and highlights the urgent need for defenses against visual steering attacks.
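Abstract (reference translation): Vision language models (VLMs) are increasingly used across a wide range of applications, which brings their security and behavioral control to the forefront. Existing approaches to behavioral control or output redirection, such as system prompting in VLMs, are easily detected and often ineffective, while activation-based steering vectors require invasive runtime access to model internals, which is incompatible with API-based services and closed-source deployments. This paper introduces VISOR (Visual Input-based Steering for Output Redirection), a novel method that achieves sophisticated behavioral control through optimized visual inputs alone. By crafting universal steering images that induce target activation patterns, VISOR can be deployed in practice across all VLM serving modalities while remaining imperceptible compared with explicit textual instructions. The authors validate VISOR on LLaVA-1.5-7B across three critical alignment tasks: refusal, sycophancy, and survival instinct. A single 150KB steering image matches steering-vector performance within 1-2% for positive behavioral shifts and dramatically exceeds it for negative steering, achieving shifts of up to 25% from baseline compared with the modest changes produced by steering vectors. Unlike system prompting (3-4% shifts), VISOR provides robust bidirectional control while maintaining 99.9% performance on 14,000 unrelated MMLU tasks. Beyond eliminating runtime overhead and model access requirements, VISOR exposes a critical security vulnerability: adversaries can achieve sophisticated behavioral manipulation through visual channels alone, bypassing text-based defenses. The work fundamentally re-imagines multimodal model control and highlights the urgent need for defenses against visual steering attacks.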
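
The abstract describes crafting a steering image that induces target activation patterns. As a rough illustration of that idea only, and not the authors' implementation, the sketch below optimizes the pixels of a small image so that a frozen toy model produces activations close to a chosen target. Everything here is an assumption made for illustration: a random linear map stands in for LLaVA-1.5-7B, the target is synthetic, and the names `encoder`, `target`, `activation_loss`, and `optimize_steering_image` are hypothetical.

```python
import numpy as np

def activation_loss(encoder, image, target):
    """Squared distance between the activations the image induces and the target."""
    return float(np.sum((encoder @ image.ravel() - target) ** 2))

def optimize_steering_image(encoder, target, init_image, steps=300):
    """Projected gradient descent over pixels, keeping values in [0, 1]."""
    # Step size 1/L, where L = 2 * sigma_max(encoder)^2 bounds the gradient's
    # Lipschitz constant for this quadratic loss.
    step = 1.0 / (2.0 * np.linalg.norm(encoder, 2) ** 2)
    image = init_image.copy()
    for _ in range(steps):
        residual = encoder @ image.ravel() - target           # activation mismatch
        grad = 2.0 * (encoder.T @ residual).reshape(image.shape)
        image = np.clip(image - step * grad, 0.0, 1.0)        # stay a valid image
    return image

rng = np.random.default_rng(0)
encoder = rng.normal(size=(16, 64))    # hypothetical frozen model: 8x8 image -> 16 activations
reference = rng.uniform(size=(8, 8))   # used only to build a reachable target
target = encoder @ reference.ravel()   # stand-in for a target activation pattern
init_image = rng.uniform(size=(8, 8))  # start from random noise

image = optimize_steering_image(encoder, target, init_image)
before = activation_loss(encoder, init_image, target)
after = activation_loss(encoder, image, target)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

In this toy setting only the image changes while the model weights stay fixed, which mirrors the abstract's point that control comes from the visual input alone. A real VLM is nonlinear, so the gradient would come from automatic differentiation rather than the closed form used here.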

Related papers