Fugu-MT Paper Translation (Abstract): Single-Position Intervention Fails: Distributed Output Templates Drive In-Context Learning

Paper summary: Single-Position Intervention Fails: Distributed Output Templates Drive In-Context Learning

arxiv url: http://arxiv.org/abs/2605.04061v1
Date: Fri, 10 Apr 2026 14:49:07 GMT
Status: Translation complete
Last updated in system: 2026-05-11 06:56:26.55097
Title: Single-Position Intervention Fails: Distributed Output Templates Drive In-Context Learning
Title (reference translation): Single-Position Intervention Fails: Distributed Output Templates Drive In-Context Learning
Authors: Bryan Cheng, Jasper Zhang
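Abstract summary: Understanding how large language models encode task identity from few-shot demonstrations is a central open problem in mechanistic interpretability. Prior work used linear probes to localize task representations and reported high classification accuracy at specific layers. Probing accuracy, however, completely fails to predict causal importance.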
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding how large language models encode task identity from few-shot demonstrations is a central open problem in mechanistic interpretability. Prior work uses linear probing to localize task representations, reporting high classification accuracy at specific layers. We reveal a striking dissociation: probing accuracy completely fails to predict causal importance. Single-position activation intervention achieves 0% task transfer across all 28 layers of Llama-3.2-3B, despite 100% probing accuracy at those same positions. This null result is itself a key finding, demonstrating that task encoding is fundamentally distributed. Multi-position intervention, replacing activations at all demonstration output tokens simultaneously, achieves up to 96% transfer (N=50, 95% CI: [87%, 99%]) at layer 8, pinpointing for the first time the causal locus of ICL task identity. We establish the generality of these findings across four models spanning three architecture families (LLaMA, Qwen, Gemma), discovering a universal intervention window at ~30% network depth. Causal tracing uncovers an asymmetric architecture: the query position is strictly necessary (53-100% disruption) while no individual demonstration position is necessary (0% disruption), resolving a key ambiguity in prior accounts. Crucially, transfer depends on internal representation compatibility, not surface similarity (r=-0.05 vs r=0.31), ruling out trivial explanations. These results establish the distributed template hypothesis: ICL task identity is encoded as output format templates distributed across demonstration tokens, fundamentally reshaping our understanding of how in-context learning operates.
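The difference between single-position and multi-position intervention can be illustrated with a toy sketch. This is not the authors' code: it uses random NumPy arrays in place of real layer activations, and the helper `patch_positions`, the array shapes, and the demonstration output indices are all hypothetical. It shows only the mechanical step of overwriting activations at chosen token positions with those from another prompt.

```python
import numpy as np

def patch_positions(target_acts, source_acts, positions):
    """Return a copy of target_acts whose rows at `positions` come from source_acts.

    Hypothetical helper: rows are token positions, columns are hidden dimensions.
    """
    patched = target_acts.copy()
    idx = list(positions)
    patched[idx] = source_acts[idx]
    return patched

rng = np.random.default_rng(0)
seq_len, d_model = 12, 4  # toy sizes, not taken from the paper

# Stand-ins for one layer's activations on two equal-length prompts.
source_acts = rng.normal(size=(seq_len, d_model))  # prompt demonstrating task A
target_acts = rng.normal(size=(seq_len, d_model))  # prompt demonstrating task B

# Assumed indices of the demonstration output tokens in both prompts.
demo_output_positions = [2, 5, 8]

# Single-position intervention: overwrite one demonstration output token.
single = patch_positions(target_acts, source_acts, demo_output_positions[:1])
# Multi-position intervention: overwrite all demonstration output tokens at once.
multi = patch_positions(target_acts, source_acts, demo_output_positions)

changed_single = int((single != target_acts).any(axis=1).sum())
changed_multi = int((multi != target_acts).any(axis=1).sum())
print(changed_single, changed_multi)
```

In a real experiment the patched activations would be written back into the model's forward pass at a chosen layer, and task transfer would be scored on the model's output. The sketch stops before that step.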
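Abstract (reference translation): Understanding how large language models encode task identity from few-shot demonstrations is a central open problem in mechanistic interpretability. Prior work used linear probes to localize task representations and reported high classification accuracy at specific layers. Probing accuracy completely fails to predict causal importance. Single-position activation intervention achieves 0% task transfer across all 28 layers of Llama-3.2-3B, despite 100% probing accuracy at those same positions. This null result is itself an important finding, showing that task encoding is fundamentally distributed. Multi-position intervention, which replaces the activations at all demonstration output tokens simultaneously, achieves up to 96% transfer (N=50, 95% CI: [87%, 99%]) at layer 8, identifying for the first time the causal locus of ICL task identity. The generality of these findings is established across four models spanning three architecture families (LLaMA, Qwen, Gemma), with a universal intervention window at about 30% of network depth. Causal tracing reveals an asymmetric architecture: the query position is strictly necessary (53-100% disruption), while no individual demonstration position is necessary (0% disruption). Importantly, transfer depends on internal representation compatibility rather than surface similarity (r=-0.05 vs r=0.31), which rules out trivial explanations. These results establish the distributed template hypothesis: ICL task identity is encoded as output format templates distributed across demonstration tokens, fundamentally reshaping our understanding of how in-context learning operates.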
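The reported interval can be sanity-checked with a short calculation. The abstract does not say how the 95% CI was computed, so the Wilson score interval below is an assumption. The count of 48 successes is inferred from 96% of N=50 and is not stated directly.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Assumed count: 96% transfer with N=50 corresponds to 48 successful transfers.
low, high = wilson_interval(48, 50)
print(f"{low:.1%} to {high:.1%}")
```

Rounded to whole percentages, the bounds come out at 87% and 99%, which is consistent with the interval quoted in the abstract under this assumed method.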

