Fugu-MT Paper Translation (Abstract): Decomposing how prompting steers behavior

Paper overview: Decomposing how prompting steers behavior

arxiv url: http://arxiv.org/abs/2606.03093v1
Date: Tue, 02 Jun 2026 03:27:24 GMT
Status: Translation complete
Last updated in system: 2026-06-03 22:00:04.729948
Title: Decomposing how prompting steers behavior
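Title (reference translation): How to steer
Authors: Fan L. Cheng, Nikolaus Kriegeskorte
Abstract summary: We decompose prompt-induced representational change into interpretable geometric components. Our framework reveals how models route task-relevant structure to produce prompt-driven behavior.
Reference score (independently computed attention score): 0.0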
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Prompting steers large language models (LLMs) and vision-language models (VLMs) without weight updates, but it remains unclear how instruction changes reshape internal representations to produce behavior. We introduce a nested geometric decomposition framework that treats prompting as a transformation of the representational geometry of the content following the prompt. For each prompt pair, we align representations of the same stimuli under two prompts using increasingly expressive stimulus-invariant maps: translation, rigid transformation with uniform scaling, sequential axis scaling, affine transformation, and nonlinear transformation. We then causally test each map by replacing a single layer's prompt-A hidden state for held-out stimuli with its mapped counterpart and measuring recovery of prompt-B representational geometry and behavior. Across three LLMs, three VLMs, and six text or image datasets spanning style, emotion, scene content, and number, prompts consistently reshape representations toward the instructed task structure. Cross-validated variance decomposition shows that much prompt-induced activation change is captured by shape-preserving maps, especially translation and rigid transformation with uniform scaling, while tier profiles reveal model- and task-specific routing strategies across layers. Crucially, although translation and rigid tiers already improve behavioral agreement, affine transformation is the first tier to nearly recover target-prompt task geometry and yields corresponding behavioral gains. This suggests that cross-dimensional linear mixing is a key mechanism by which prompts reorganize representations toward instructed task structure. Our framework decomposes prompt-induced representational change into interpretable geometric components and reveals how models route task-relevant structure to produce prompt-driven behavior.
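Abstract (reference translation): Prompting steers large language models (LLMs) and vision-language models (VLMs) without weight updates, but it remains unclear how changes to an instruction reshape internal representations to produce behavior. This paper proposes a nested geometric decomposition framework that treats prompting as a transformation of the representational geometry of the content that follows the prompt. For each prompt pair, representations of the same stimuli under the two prompts are aligned using increasingly expressive stimulus-invariant maps: translation, rigid transformation with uniform scaling, sequential axis scaling, affine transformation, and nonlinear transformation. Each map is then tested causally by replacing a single layer's prompt-A hidden state for held-out stimuli with its mapped counterpart and measuring how well prompt-B representational geometry and behavior are recovered. Across three LLMs, three VLMs, and six text or image datasets spanning style, emotion, scene content, and number, prompts consistently reshape representations toward the instructed task structure. Cross-validated variance decomposition shows that much of the prompt-induced activation change is captured by shape-preserving maps, especially translation and rigid transformation with uniform scaling, while tier profiles reveal model- and task-specific routing strategies across layers. Importantly, although the translation and rigid tiers already improve behavioral agreement, affine transformation is the first tier to nearly recover the target-prompt task geometry and to yield corresponding behavioral gains. This suggests that cross-dimensional linear mixing is a key mechanism by which prompts reorganize representations toward the instructed task structure. The proposed framework decomposes prompt-induced representational change into interpretable geometric components and reveals how models route task-relevant structure to produce prompt-driven behavior.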
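
The nested alignment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fit_translation`, `fit_similarity`, `fit_affine`, `change_captured`) are hypothetical, random synthetic matrices stand in for hidden states under prompt A and prompt B, and the score (one minus held-out residual over the unaligned A-to-B difference) is an assumed stand-in for the paper's cross-validated variance decomposition. Only three of the five tiers are shown, because the abstract does not define sequential axis scaling or the nonlinear map precisely enough to reproduce. The similarity tier uses orthogonal Procrustes, which also permits reflections.

```python
import numpy as np

def fit_translation(X, Y):
    """Tier 1 (assumed form): shift X by the difference of the means."""
    t = Y.mean(axis=0) - X.mean(axis=0)
    return lambda Z: Z + t

def fit_similarity(X, Y):
    """Tier 2 (assumed form): orthogonal map + uniform scale + shift,
    fitted with the closed-form orthogonal Procrustes solution."""
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    U, S, Vt = np.linalg.svd(Xc.T @ Yc)
    R = U @ Vt                          # best orthogonal map (may reflect)
    s = S.sum() / (Xc ** 2).sum()       # best uniform scale given R
    return lambda Z: s * (Z - mu_x) @ R + mu_y

def fit_affine(X, Y):
    """Tier 3 (assumed form): unconstrained linear map plus intercept."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ W

def change_captured(X, Y, Y_hat):
    """Fraction of the A-to-B change explained by the mapped states."""
    return 1.0 - ((Y_hat - Y) ** 2).sum() / ((X - Y) ** 2).sum()

# Synthetic stand-in: rows are stimuli, columns are hidden dimensions.
rng = np.random.default_rng(0)
n, d = 400, 8
X = rng.normal(size=(n, d))                       # "prompt A" states
A = np.eye(d) + 0.5 * rng.normal(size=(d, d))     # cross-dimensional mixing
Y = X @ A + rng.normal(size=d) + 0.01 * rng.normal(size=(n, d))  # "prompt B"

train, test = slice(0, 300), slice(300, None)     # held-out stimuli
fitters = {"translation": fit_translation,
           "similarity": fit_similarity,
           "affine": fit_affine}

train_scores, test_scores = {}, {}
for name, fit in fitters.items():
    f = fit(X[train], Y[train])                   # stimulus-invariant map
    train_scores[name] = change_captured(X[train], Y[train], f(X[train]))
    test_scores[name] = change_captured(X[test], Y[test], f(X[test]))
    print(f"{name:12s} train={train_scores[name]:.3f} "
          f"held-out={test_scores[name]:.3f}")
```

Because each tier contains the previous one as a special case, the training score cannot decrease from tier to tier; the held-out score is the quantity that indicates whether the extra expressiveness generalizes to unseen stimuli.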

Related papers