Fugu-MT Paper Translation (Abstract): ORBIT: Training-Free Multi-Attribute Behavioral Steering via Orthogonal Subspace Rotation

Paper overview: ORBIT: Training-Free Multi-Attribute Behavioral Steering via Orthogonal Subspace Rotation

arxiv url: http://arxiv.org/abs/2606.22357v1
Date: Sun, 21 Jun 2026 06:40:32 GMT
Status: Translation complete
Last updated in system: 2026-06-25 18:42:06.595228
Title: ORBIT: Training-Free Multi-Attribute Behavioral Steering via Orthogonal Subspace Rotation
Title (reference translation): ORBIT: Training-Free Multi-Attribute Behavioral Steering via Orthogonal Subspace Rotation
Authors: Narges Ghasemi, Amir Ziashahabi, Salman Avestimehr, Jonathan May
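Abstract summary: We introduce ORBIT, a training-free extension of rotation-based steering to the multi-attribute setting. The method constructs a joint subspace from per-attribute steering planes via singular value decomposition. We also introduce TraitFactory, a new multi-attribute benchmark that focuses on behavioral tendencies rather than surface-level style.
Reference score (independently computed attention score): 38.36944876774961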
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Language models are widely used in assistant settings, where controlling behavioral attributes is often essential. Activation steering modifies hidden-state representations at inference time, providing a lightweight, training-free mechanism that can be toggled at runtime. Existing methods, however, have focused primarily on steering a single attribute at a time. When multiple attributes must be controlled simultaneously, naive summation of per-attribute steering vectors suffers from norm imbalance and directional cancellation, while classifier-based approaches require retraining whenever the attribute set changes. We introduce ORBIT (Orthogonal Rotation-Based Intervention Technique), a training-free extension of rotation-based steering to the multi-attribute setting. Our method constructs a joint subspace from per-attribute steering planes via singular value decomposition and applies a single norm-preserving rotation within that subspace toward a combined target direction. Adaptive per-token gating identifies which attributes need correction at each position, and an optional additive boost strengthens attributes with weak initial projection. We also introduce TraitFactory, a new multi-attribute benchmark that focuses on behavioral tendencies rather than surface-level style. We evaluate ORBIT on TraitFactory and ToneBank across three models (Llama-3.2-3B, Qwen-2.5-7B, Llama-3.1-8B) while steering multiple attributes simultaneously, showing that it achieves stronger and more balanced multi-attribute steering than existing training-free baselines while better preserving output coherence.
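Abstract (reference translation): Language models are widely used in assistant settings, where controlling behavioral attributes is often essential. Activation steering modifies hidden-state representations at inference time, providing a lightweight, training-free mechanism that can be toggled at runtime. However, existing methods have focused mainly on steering one attribute at a time. When multiple attributes must be controlled simultaneously, naive summation of per-attribute steering vectors suffers from norm imbalance and directional cancellation, while classifier-based approaches require retraining whenever the attribute set changes. ORBIT (Orthogonal Rotation-Based Intervention Technique) is a training-free method that extends rotation-based steering to the multi-attribute setting. It constructs a joint subspace from the per-attribute steering planes via singular value decomposition and applies a single norm-preserving rotation within that subspace toward a combined target direction. Adaptive per-token gating identifies which attributes need correction at each position, and an optional additive boost strengthens attributes with weak initial projection. We also introduce TraitFactory, a new multi-attribute benchmark that focuses on behavioral tendencies rather than surface-level style. We evaluate ORBIT on TraitFactory and ToneBank across three models (Llama-3.2-3B, Qwen-2.5-7B, Llama-3.1-8B) while steering multiple attributes simultaneously, and show that it achieves stronger and more balanced multi-attribute steering than existing training-free baselines while better preserving output coherence.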
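The abstract describes two geometric steps: building a joint subspace from per-attribute steering planes via singular value decomposition, and applying one norm-preserving rotation inside that subspace toward a combined target direction. The NumPy sketch below illustrates only that geometry and is not the authors' implementation. The function names (`joint_subspace`, `rotate_in_subspace`), the random planes, the choice of target (a sum of unit directions), and the fixed rotation angle are all assumptions made for illustration; the per-token gating and the additive boost are omitted because the abstract does not specify them.

```python
import numpy as np


def joint_subspace(planes, tol=1e-8):
    """Return an orthonormal basis (rows) for the span of all plane vectors.

    `planes` is a list of (2, d) arrays, one hypothetical steering plane
    per attribute. The SVD of the stacked vectors gives the joint span.
    """
    stacked = np.concatenate(planes, axis=0)            # (2k, d)
    _, s, vt = np.linalg.svd(stacked, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))                  # drop near-zero directions
    return vt[:rank]                                    # (rank, d)


def rotate_in_subspace(h, basis, target, angle):
    """Rotate the in-subspace part of h toward target by at most `angle` radians.

    The component of h outside span(basis) is left untouched and the
    in-subspace component keeps its length, so the norm of h is preserved.
    """
    coords = basis @ h                                  # coordinates of h in the subspace
    h_out = h - basis.T @ coords                        # part of h outside the subspace
    r = np.linalg.norm(coords)
    t = basis @ target
    t_norm = np.linalg.norm(t)
    if r < 1e-12 or t_norm < 1e-12:
        return h.copy()                                 # nothing to rotate
    u = coords / r
    t = t / t_norm
    cos_theta = np.clip(u @ t, -1.0, 1.0)
    w = t - cos_theta * u                               # part of t perpendicular to u
    w_norm = np.linalg.norm(w)
    if w_norm < 1e-12:
        return h.copy()                                 # aligned or exactly opposite: no unique plane
    w = w / w_norm
    phi = min(angle, np.arccos(cos_theta))              # never overshoot the target
    new_coords = r * (np.cos(phi) * u + np.sin(phi) * w)
    return h_out + basis.T @ new_coords


# Toy data (assumed, not from the paper): three attributes in a 16-dim hidden space.
rng = np.random.default_rng(0)
d = 16
planes = [rng.standard_normal((2, d)) for _ in range(3)]
basis = joint_subspace(planes)

# Assumed combined target: sum of unit-normalized first vectors of each plane,
# so that no single attribute dominates through its norm.
dirs = [p[0] / np.linalg.norm(p[0]) for p in planes]
target = np.sum(dirs, axis=0)

h = rng.standard_normal(d)                              # stand-in for one token's hidden state
h_steered = rotate_in_subspace(h, basis, target, angle=np.pi / 6)
```

Because the rotation acts only on the subspace coordinates and keeps their length, the hidden-state norm is unchanged by construction. This is the property that separates a rotation from adding a steering vector, which shifts the norm.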

Related papers