Fugu-MT Paper Translation (Summary): Learning to Steer: Input-dependent Steering for Multimodal LLMs

Paper summary: Learning to Steer: Input-dependent Steering for Multimodal LLMs

arxiv url: http://arxiv.org/abs/2508.12815v1
Date: Mon, 18 Aug 2025 10:53:20 GMT
Status: Translation complete
System update date: 2025-08-19 14:49:11.249727
Title: Learning to Steer: Input-dependent Steering for Multimodal LLMs
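Title (reference translation): Learning to Steer: Input-Dependent Steering in Multimodal LLMs
Authors: Jayneel Parekh, Pegah Khayatan, Mustafa Shukor, Arnaud Dapogny, Alasdair Newson, Matthieu Cord
Abstract summary: We investigate fine-grained steering that uses an input-specific linear shift. We train a small auxiliary module to predict the input-specific steering vector. Our approach, called L2S (Learn-to-Steer), reduces hallucinations, improves the safety of MLLMs, and outperforms other static baselines.
Reference score (independently computed attention score): 55.53189631272456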
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Steering has emerged as a practical approach to enable post-hoc guidance of LLMs towards enforcing a specific behavior. However, it remains largely underexplored for multimodal LLMs (MLLMs); furthermore, existing steering techniques, such as mean steering, rely on a single steering vector, applied independently of the input query. This paradigm faces limitations when the desired behavior depends on the example at hand. For example, a safe answer may consist of abstaining from answering when asked about an illegal activity, or may point to external resources or consultation with an expert when asked for medical advice. In this paper, we investigate fine-grained steering that uses an input-specific linear shift. This shift is computed using contrastive input-specific prompting. However, the input-specific prompts required for this approach are not known at test time. Therefore, we propose to train a small auxiliary module to predict the input-specific steering vector. Our approach, dubbed L2S (Learn-to-Steer), reduces hallucinations and enforces safety in MLLMs, outperforming other static baselines.
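Abstract (reference translation): Steering has emerged as a practical approach for enabling post-hoc guidance of LLMs toward enforcing a specific behavior. However, it remains underexplored for multimodal LLMs (MLLMs), and existing steering techniques such as mean steering rely on a single steering vector applied independently of the input query. This paradigm faces limitations when the desired behavior depends on the example at hand. For example, a safe answer may be to abstain from answering when asked about an illegal activity, or to point to external resources or consultation with an expert when asked for medical advice. In this paper, we investigate fine-grained steering that uses an input-specific linear shift. This shift is computed using contrastive input-specific prompting. However, the input-specific prompts required by this approach are not known at test time. We therefore propose training a small auxiliary module to predict the input-specific steering vector. Our approach, called L2S (Learn-to-Steer), reduces hallucinations and improves safety in MLLMs, outperforming other static baselines.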
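The contrast the abstract draws between a single static steering vector and a predicted, input-specific one can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the synthetic hidden states, the hidden map `W_true`, the linear least-squares "auxiliary module", and the `steer` helper are all assumptions introduced here for illustration. The abstract only states that the shift is linear, is derived from contrastive prompting, and is predicted by a small auxiliary module.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 16  # hypothetical: number of inputs, hidden size

# Stand-in for the model's hidden representation of each input query.
h = rng.normal(size=(n, d))

# Stand-in for contrastive input-specific prompting: hidden state under a
# "desired behavior" prompt minus hidden state under an "undesired" prompt.
# The shift is made input-dependent via a hidden linear map (toy assumption).
W_true = rng.normal(size=(d, d)) / np.sqrt(d)
h_pos = h + h @ W_true
h_neg = h
targets = h_pos - h_neg  # one steering vector per input

train, test = slice(0, 150), slice(150, n)

# Static baseline (mean steering): one vector shared by every input.
v_mean = targets[train].mean(axis=0)

# Input-dependent steering: fit a small auxiliary module mapping h to its
# steering vector. A linear least-squares fit is an assumption made here.
W_hat, *_ = np.linalg.lstsq(h[train], targets[train], rcond=None)

def steer(hidden, alpha=1.0):
    """Add the predicted input-specific linear shift to the hidden states."""
    return hidden + alpha * (hidden @ W_hat)

# Mean squared distance to the contrastive targets on held-out inputs.
err_static = np.mean(np.sum((targets[test] - v_mean) ** 2, axis=1))
err_learned = np.mean(np.sum((targets[test] - h[test] @ W_hat) ** 2, axis=1))
print(f"static: {err_static:.3f}  learned: {err_learned:.3e}")
```

In this noise-free toy setting the learned map recovers the input-dependent shift almost exactly, while the mean vector cannot, which mirrors the limitation of static steering that the abstract describes. In a real MLLM the hidden states would come from a chosen layer and the targets from actual contrastive prompts, and neither choice is specified in the abstract.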
