Fugu-MT Paper Translation (Summary): Compositional Steering of Large Language Models with Steering Tokens

Paper overview: Compositional Steering of Large Language Models with Steering Tokens

arxiv url: http://arxiv.org/abs/2601.05062v1
Date: Thu, 08 Jan 2026 16:08:44 GMT
Status: translation complete
Last updated in system: 2026-01-09 17:01:53.269411
Title: Compositional Steering of Large Language Models with Steering Tokens
Title (reference translation): Compositional steering of large language models using steering tokens
Authors: Gorjan Radevski, Kiril Gashteovski, Giwon Hong, Carolin Lawrence, Goran Glavaš,
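Abstract summary: We propose compositional steering tokens for multi-behavior steering. We first embed individual behaviors, expressed as natural language instructions, into dedicated tokens via self-distillation. We show that steering tokens yield better multi-behavior control than competing approaches.
Reference score (site-computed attention score): 18.117668235084537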
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deploying LLMs in real-world applications requires controllable output that satisfies multiple desiderata at the same time. While existing work extensively addresses LLM steering for a single behavior, compositional steering (i.e., steering LLMs simultaneously towards multiple behaviors) remains an underexplored problem. In this work, we propose compositional steering tokens for multi-behavior steering. We first embed individual behaviors, expressed as natural language instructions, into dedicated tokens via self-distillation. Contrary to most prior work, which operates in the activation space, our behavior steers live in the space of input tokens, enabling more effective zero-shot composition. We then train a dedicated composition token on pairs of behaviors and show that it successfully captures the notion of composition: it generalizes well to unseen compositions, including those with unseen behaviors as well as those with an unseen number of behaviors. Our experiments across different LLM architectures show that steering tokens lead to superior multi-behavior control compared to competing approaches (instructions, activation steering, and LoRA merging). Moreover, we show that steering tokens complement natural language instructions, with their combination resulting in further gains.
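Abstract (reference translation): Deploying LLMs in real-world applications requires controllable output that satisfies multiple desiderata at the same time. Existing work extensively addresses LLM steering for a single behavior, but compositional steering (i.e., steering LLMs toward multiple behaviors at once) remains an underexplored problem. In this work, we propose compositional steering tokens for multi-behavior steering. We first embed individual behaviors, expressed as natural language instructions, into dedicated tokens via self-distillation. In contrast to most prior work, which operates in the activation space, our behavior steers live in the space of input tokens, enabling more effective zero-shot composition. We then train a dedicated composition token on pairs of behaviors and show that it successfully captures the notion of composition. Experiments across different LLM architectures show that steering tokens deliver better multi-behavior control than competing approaches (instructions, activation steering, and LoRA merging). Furthermore, we show that steering tokens complement natural language instructions.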
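To illustrate what "embedding a behavior into a dedicated token via self-distillation" can mean, here is a minimal toy sketch. It is not the authors' implementation, and it makes heavy assumptions. It uses a hypothetical frozen "model" (`frozen_lm`) that only mean-pools its input embeddings and applies a fixed linear map to get next-token logits. The teacher is that same frozen model with the natural-language instruction in context. The student replaces the instruction with a single trainable input embedding `s`, which is updated by gradient descent on KL(teacher || student). All names (`frozen_lm`, `self_distill`, `VOCAB`, `DIM`) are illustrative. The point is only that the learned steer lives in input-token space rather than in the model's activations.

```python
import math
import random

# Hypothetical toy setup (NOT the paper's architecture): a frozen "LM" that
# mean-pools its input token embeddings and maps the pooled vector to
# next-token logits with a fixed matrix W. Only the steering embedding is trained.
VOCAB, DIM = 8, 4
rng = random.Random(0)
W = [[rng.gauss(0.0, 0.5) for _ in range(DIM)] for _ in range(VOCAB)]


def rand_vec():
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def frozen_lm(embeddings):
    """Next-token distribution of the frozen toy model for a sequence of input embeddings."""
    n = len(embeddings)
    h = [sum(e[d] for e in embeddings) / n for d in range(DIM)]
    return softmax([sum(W[v][d] * h[d] for d in range(DIM)) for v in range(VOCAB)])


def kl(p, q):
    """KL(p || q) for two discrete distributions over the vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


instruction = [rand_vec() for _ in range(5)]  # stand-in embeddings of a behavior instruction
prompt = [rand_vec() for _ in range(3)]       # stand-in embeddings of a user prompt

# Teacher: the same frozen model, with the full instruction placed in context.
teacher = frozen_lm(instruction + prompt)


def self_distill(steps=500, lr=0.5):
    """Train one input-space steering embedding s so that frozen_lm([s] + prompt) matches the teacher."""
    s = rand_vec()
    n = 1 + len(prompt)
    history = []
    for _ in range(steps):
        student = frozen_lm([s] + prompt)
        history.append(kl(teacher, student))
        # dKL/dlogits = student - teacher; chain rule through W and the 1/n mean-pool.
        g_logits = [q - p for q, p in zip(student, teacher)]
        grad = [sum(W[v][d] * g_logits[v] for v in range(VOCAB)) / n for d in range(DIM)]
        s = [si - lr * gi for si, gi in zip(s, grad)]
    history.append(kl(teacher, frozen_lm([s] + prompt)))
    return s, history


steer_token, kl_history = self_distill()
print(f"KL before: {kl_history[0]:.4f}  after: {kl_history[-1]:.4f}")
```

In this toy, the model weights never change. Only the prepended embedding moves, and it is fit on a single prompt for brevity, whereas a realistic setup would distill over many prompts. Composing behaviors in input space would amount to prepending several such trained embeddings. The dedicated composition token that the abstract describes, trained on pairs of behaviors, is not modeled here.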

