Fugu-MT Paper Translation (Abstract): Selective Induction Heads: How Transformers Select Causal Structures In Context

Paper summary: Selective Induction Heads: How Transformers Select Causal Structures In Context

arxiv url: http://arxiv.org/abs/2509.08184v1
Date: Tue, 09 Sep 2025 23:13:41 GMT
Status: Translation complete
Last updated in system: 2025-09-11 15:16:52.265755
Title: Selective Induction Heads: How Transformers Select Causal Structures In Context
Authors: Francesco D'Angelo, Francesco Croce, Nicolas Flammarion
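Abstract summary: We propose a novel framework that showcases transformers' ability to handle causal structures. The framework varies the causal structure by interleaving Markov chains with different lags while keeping the transition probabilities fixed. This setting reveals the formation of Selective Induction Heads, a new circuit that lets transformers select the correct causal structure in context.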
Reference score (site-computed attention metric): 50.09964990342878
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transformers have exhibited exceptional capabilities in sequence modeling tasks, leveraging self-attention and in-context learning. Critical to this success are induction heads, attention circuits that enable copying tokens based on their previous occurrences. In this work, we introduce a novel framework that showcases transformers' ability to dynamically handle causal structures. Existing works rely on Markov Chains to study the formation of induction heads, revealing how transformers capture causal dependencies and learn transition probabilities in-context. However, they rely on a fixed causal structure that fails to capture the complexity of natural languages, where the relationship between tokens dynamically changes with context. To this end, our framework varies the causal structure through interleaved Markov chains with different lags while keeping the transition probabilities fixed. This setting unveils the formation of Selective Induction Heads, a new circuit that endows transformers with the ability to select the correct causal structure in-context. We empirically demonstrate that transformers learn this mechanism to predict the next token by identifying the correct lag and copying the corresponding token from the past. We provide a detailed construction of a 3-layer transformer to implement the selective induction head, and a theoretical analysis proving that this mechanism asymptotically converges to the maximum likelihood solution. Our findings advance the understanding of how transformers select causal structures, providing new insights into their functioning and interpretability.
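The setting described in the abstract can be illustrated with a small sketch. It is not the authors' 3-layer transformer construction. It is a direct likelihood computation, under the assumption that each sequence is generated with a single lag k, so that token t depends only on token t - k (equivalently, k interleaved first-order chains sharing one transition matrix). The function names, the uniform initial tokens, the 3-token vocabulary, and the candidate lag set are all hypothetical choices made for illustration.

```python
import math
import random


def sample_lagged_chain(P, lag, length, rng):
    """Sample x with x[t] ~ P[x[t - lag]]; assumes length >= lag."""
    vocab = len(P)
    # Assumption: the first `lag` tokens are drawn uniformly at random.
    x = [rng.randrange(vocab) for _ in range(lag)]
    for t in range(lag, length):
        x.append(rng.choices(range(vocab), weights=P[x[t - lag]])[0])
    return x


def lag_log_likelihoods(x, P, lags):
    """Log-likelihood of the observed transitions under each candidate lag."""
    start = max(lags)  # score every lag on the same positions
    return {
        k: sum(math.log(P[x[t - k]][x[t]]) for t in range(start, len(x)))
        for k in lags
    }


def predict_next(x, P, lags):
    """Pick the maximum-likelihood lag, then read the next-token
    distribution from the row of P indexed by the token `lag` steps back."""
    scores = lag_log_likelihoods(x, P, lags)
    best = max(scores, key=scores.get)
    return best, P[x[len(x) - best]]


# Hypothetical fixed transition matrix (rows sum to 1, all entries > 0).
P = [[0.01, 0.98, 0.01],
     [0.01, 0.01, 0.98],
     [0.98, 0.01, 0.01]]

rng = random.Random(0)
x = sample_lagged_chain(P, lag=2, length=200, rng=rng)
best_lag, next_dist = predict_next(x, P, lags=[1, 2, 3])
print(best_lag, next_dist)
```

The sketch assumes the transition matrix is known. Only the lag has to be inferred from the context, which mirrors the abstract's setup of fixed transition probabilities and a varying causal structure.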

Related papers