Fugu-MT paper translation (abstract): The Kinetics of Reasoning: How Chain-of-Thought Shapes Learning in Transformers?

Paper overview: The Kinetics of Reasoning: How Chain-of-Thought Shapes Learning in Transformers?

arxiv url: http://arxiv.org/abs/2510.25791v1
Date: Tue, 28 Oct 2025 20:14:26 GMT
Status: translation complete
Last updated in system: 2025-10-31 16:05:09.487375
Title: The Kinetics of Reasoning: How Chain-of-Thought Shapes Learning in Transformers?
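Title (reference translation): The Kinetics of Reasoning: How Does Chain-of-Thought Shape Learning in Transformers?
Authors: Zihan Pengmei, Costas Mavromatis, Zhengyuan Shen, Yunyi Zhang, Vassilis N. Ioannidis, Huzefa Rangwala
Abstract summary: Chain-of-thought (CoT) supervision can substantially improve transformer performance. We examine these learning dynamics through the lens of grokking by pretraining transformers on symbolic reasoning tasks.
Reference score (independently computed attention metric): 25.29458951592086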
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Chain-of-thought (CoT) supervision can substantially improve transformer performance, yet the mechanisms by which models learn to follow and benefit from CoT remain poorly understood. We investigate these learning dynamics through the lens of grokking by pretraining transformers on symbolic reasoning tasks with tunable algorithmic complexity and controllable data composition to study their generalization. Models were trained under two settings: (i) producing only final answers, and (ii) emitting explicit CoT traces before answering. Our results show that while CoT generally improves task performance, its benefits depend on task complexity. To quantify these effects, we model the accuracy of the logarithmic training steps with a three-parameter logistic curve, revealing how the learning speed and shape vary with task complexity, data distribution, and the presence of CoT supervision. We also uncover a transient trace unfaithfulness phase: early in training, models often produce correct answers while skipping or contradicting CoT steps, before later aligning their reasoning traces with answers. Empirically, we (1) demonstrate that CoT accelerates generalization but does not overcome tasks with higher algorithmic complexity, such as finding list intersections; (2) introduce a kinetic modeling framework for understanding transformer learning; (3) characterize trace faithfulness as a dynamic property that emerges over training; and (4) show CoT alters internal transformer computation mechanistically.
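Abstract (reference translation): Chain-of-thought (CoT) supervision substantially improves transformer performance, but the mechanisms by which models follow and benefit from CoT are still not understood. In this work, we study these learning dynamics and their generalization by training on symbolic reasoning tasks with tunable algorithmic complexity and controllable data composition. Models were trained under two settings: (i) producing only final answers, and (ii) emitting explicit CoT traces before answering. The results show that CoT generally improves task performance, but its benefit depends on task complexity. To quantify these effects, we model accuracy against logarithmic training steps with a three-parameter logistic curve, revealing how learning speed and shape vary with task complexity, data distribution, and the presence of CoT supervision. Early in training, models produce correct answers while skipping or contradicting CoT steps, and only later align their reasoning traces with their answers. Empirically, we (1) show that CoT accelerates generalization but does not overcome tasks with higher algorithmic complexity, such as finding list intersections; (2) introduce a kinetic modeling framework for understanding transformer learning; (3) characterize trace faithfulness as a dynamic property that emerges over training; and (4) show that CoT mechanistically alters internal transformer computation.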
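The abstract describes fitting accuracy against logarithmic training steps with a three-parameter logistic curve. The sketch below illustrates that idea under assumptions that are not taken from the paper: the parameterization (plateau `a_max`, rate `k`, midpoint `x0` in log10 steps), the helper names `logistic3` and `fit_logistic3`, the grid-search fitting procedure, and the synthetic data are all hypothetical. The authors may parameterize and fit the curve differently.

```python
import math

def logistic3(x, a_max, k, x0):
    """Three-parameter logistic: plateau a_max, rate k, midpoint x0 (assumed form)."""
    return a_max / (1.0 + math.exp(-k * (x - x0)))

def fit_logistic3(steps, accs, k_grid, x0_grid):
    """Grid-search k and x0; for each pair, solve a_max by least squares."""
    xs = [math.log10(s) for s in steps]  # fit in log10(training steps)
    best = None
    for k in k_grid:
        for x0 in x0_grid:
            # s_i is the unit-height logistic, so the model is a_max * s_i
            # and the least-squares a_max has a closed form.
            s = [logistic3(x, 1.0, k, x0) for x in xs]
            a_max = sum(y * si for y, si in zip(accs, s)) / sum(si * si for si in s)
            a_max = min(max(a_max, 0.0), 1.0)  # an accuracy plateau lies in [0, 1]
            sse = sum((y - a_max * si) ** 2 for y, si in zip(accs, s))
            if best is None or sse < best[0]:
                best = (sse, a_max, k, x0)
    _, a_max, k, x0 = best
    return a_max, k, x0

# Synthetic, noise-free curve for illustration only (not data from the paper).
steps = [10 ** (i / 4) for i in range(4, 21)]  # 10^1 .. 10^5 training steps
true_params = (0.9, 3.0, 3.5)                  # hypothetical a_max, k, x0
accs = [logistic3(math.log10(s), *true_params) for s in steps]

k_grid = [0.5 * i for i in range(1, 21)]         # candidate rates 0.5 .. 10.0
x0_grid = [1.0 + 0.1 * i for i in range(0, 41)]  # candidate midpoints 1.0 .. 5.0
a_max, k, x0 = fit_logistic3(steps, accs, k_grid, x0_grid)
print(f"plateau={a_max:.3f} rate={k:.2f} midpoint(log10 steps)={x0:.2f}")
```

Under this reading, `x0` marks how many (log) steps are needed to reach half of the plateau, `k` captures how abrupt the transition is, and `a_max` is the final accuracy, so curves fitted to runs with and without CoT supervision could be compared parameter by parameter. A grid search is used here only to keep the sketch dependency-free; a nonlinear least-squares routine would be the more usual choice.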

Related papers