Fugu-MT paper translation (abstract): Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent

Paper summary: Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent

arxiv url: http://arxiv.org/abs/2508.08222v1
Date: Mon, 11 Aug 2025 17:40:47 GMT
Status: Translation complete
Last updated in system: 2025-08-12 21:23:29.243372
Title: Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent
Title (reference translation): Multi-head transformers provably learn symbolic multi-step reasoning via gradient descent
Authors: Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi
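Abstract summary: This work studies how transformers solve symbolic multi-step reasoning problems through chain-of-thought processes. It analyzes two tasks: a backward reasoning task, in which the model outputs a path from a goal node to the root, and a more complex forward reasoning task. It shows that trained one-layer transformers can provably solve both tasks, with generalization guarantees to unseen trees.
Reference score (independently computed attention metric): 66.78052387054593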
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformers have demonstrated remarkable capabilities in multi-step reasoning tasks. However, understanding of the underlying mechanisms by which they acquire these abilities through training remains limited, particularly from a theoretical standpoint. This work investigates how transformers learn to solve symbolic multi-step reasoning problems through chain-of-thought processes, focusing on path-finding in trees. We analyze two intertwined tasks: a backward reasoning task, where the model outputs a path from a goal node to the root, and a more complex forward reasoning task, where the model implements two-stage reasoning by first identifying the goal-to-root path and then reversing it to produce the root-to-goal path. Our theoretical analysis, grounded in the dynamics of gradient descent, shows that trained one-layer transformers can provably solve both tasks with generalization guarantees to unseen trees. In particular, our multi-phase training dynamics for forward reasoning elucidate how different attention heads learn to specialize and coordinate autonomously to solve the two subtasks in a single autoregressive path. These results provide a mechanistic explanation of how trained transformers can implement sequential algorithmic procedures. Moreover, they offer insights into the emergence of reasoning abilities, suggesting that when tasks are structured to take intermediate chain-of-thought steps, even shallow multi-head transformers can effectively solve problems that would otherwise require deeper architectures.
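Abstract (reference translation): Transformers have shown remarkable capabilities in multi-step reasoning tasks. However, understanding of the mechanisms by which they acquire these capabilities remains limited, especially from a theoretical standpoint. This work examines how transformers learn to solve symbolic multi-step reasoning problems through chain-of-thought processes, focusing on path-finding in trees. Two tasks are analyzed: a backward reasoning task, in which the model outputs the path from a goal node to the root, and a more complex forward reasoning task, in which the model performs two-stage reasoning by first identifying the goal-to-root path and then reversing it to produce the root-to-goal path. The theoretical analysis, based on the dynamics of gradient descent, shows that trained one-layer transformers can provably solve both tasks, with generalization guarantees to unseen trees. In particular, the multi-phase training dynamics for forward reasoning clarify how different attention heads autonomously specialize and coordinate to solve the two subtasks in a single autoregressive path. These results give a mechanistic explanation of how trained transformers can implement sequential algorithmic procedures. They also offer insight into the emergence of reasoning abilities, suggesting that when tasks are structured to take intermediate chain-of-thought steps, even shallow multi-head transformers can effectively solve problems that would otherwise require deeper architectures.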
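
To make the two path-finding tasks in the abstract concrete, here is a minimal Python sketch of the target outputs only. It does not model the transformer, the attention heads, or the training dynamics analyzed in the paper. The parent-map encoding of the tree, the function names `backward_path` and `forward_path`, and the example tree are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def backward_path(parent: Dict[str, Optional[str]], goal: str) -> List[str]:
    """Backward reasoning target: the path from the goal node up to the root.

    `parent` maps each node to its parent; the root maps to None
    (an assumed encoding, not taken from the paper).
    """
    path = [goal]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def forward_path(parent: Dict[str, Optional[str]], goal: str) -> List[str]:
    """Forward reasoning target, in two stages as the abstract describes:
    first find the goal-to-root path, then reverse it to get root-to-goal.
    """
    return backward_path(parent, goal)[::-1]


# Hypothetical example tree: root "a" has children "b" and "c"; "b" has child "d".
example_tree = {"a": None, "b": "a", "c": "a", "d": "b"}

print(backward_path(example_tree, "d"))  # ['d', 'b', 'a']
print(forward_path(example_tree, "d"))   # ['a', 'b', 'd']
```

The forward task is written as a reversal of the backward path to mirror the two-stage structure described in the abstract; a direct root-to-goal search would give the same output but would not reflect that decomposition.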

Related papers