Fugu-MT paper translation (abstract): Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization

Paper summary: Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization

arxiv url: http://arxiv.org/abs/2603.21676v1
Date: Mon, 23 Mar 2026 08:06:45 GMT
Status: translation complete
Last updated in system: 2026-03-24 19:11:39.554768
Title: Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization
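Title (reference translation): Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization
Authors: Hung-Hsuan Chen
Abstract summary: This paper proposes a depth-recurrent Transformer that decouples computational depth from parameter count. The architecture incorporates three mechanisms to make deep recurrence (20+ steps) stable. The authors observe a clear computational frontier, where performance transitions from chance to near-perfect as thinking steps scale with task complexity.
Reference score (attention score computed independently by this site): 1.5736899098702974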
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Standard Transformers have a fixed computational depth, fundamentally limiting their ability to generalize to tasks requiring variable-depth reasoning, such as multi-hop graph traversal or nested logic. We propose a depth-recurrent Transformer that decouples computational depth from parameter count by iteratively applying a shared-weight Transformer block in latent space, enabling the model to trade recurrence steps for deeper reasoning at inference time. Our architecture incorporates three mechanisms to make deep recurrence (20+ steps) stable: (1) a silent thinking objective that supervises only the final output, forcing genuine multi-step reasoning rather than intermediate heuristic shortcuts; (2) LayerScale initialization to protect fragile reasoning states from untrained layer noise; and (3) an identity-biased recurrence that creates a gradient highway across many steps. We evaluate on three compositional reasoning domains with decreasing inductive biases: graph reachability (strict adjacency masking), nested boolean logic (relative positioning), and unstructured relational text (where sequence position provides no structural hints). Across all tasks, we observe a clear computational frontier: a boundary where performance transitions from chance to near-perfect as thinking steps scale with task complexity. Moreover, these tasks reveal qualitatively different generalization behaviors: precise but brittle (graph), approximate but robust (logic), and autonomous latent routing without structural hints (text). This progression illuminates how the interplay between a task-invariant recurrent reasoning core and task-specific perceptual interfaces shapes out-of-distribution (OOD) generalization, offering a mechanistic perspective on vertical chain-of-thought that complements the prevailing horizontal token-generation paradigm.
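The recurrence described in the abstract can be illustrated with a toy sketch. The abstract does not give the update rule, so everything here is an assumption made for illustration: the form h_{t+1} = h_t + layer_scale * block(h_t), the scalar `layer_scale` (LayerScale is normally a learnable per-channel vector initialized to a small value), the `tanh` stand-in for a Transformer block, and all function names. It shows only the structural idea: one set of weights is reused at every step, so depth grows without adding parameters.

```python
import math
import random


def shared_block(state, weights):
    """Stand-in for the shared Transformer block: tanh(W @ state). (Assumed form.)"""
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]


def depth_recurrent_forward(state, weights, steps, layer_scale=1e-2):
    """Apply the SAME block `steps` times with an identity-biased residual update.

    Assumed update: h_{t+1} = h_t + layer_scale * block(h_t).
    A small layer_scale keeps each step close to the identity map.
    Only the final state is returned, mirroring the idea of supervising
    the final output rather than intermediate steps.
    """
    state = list(state)
    for _ in range(steps):
        update = shared_block(state, weights)
        state = [h + layer_scale * u for h, u in zip(state, update)]
    return state


rng = random.Random(0)
dim = 4
# One weight matrix, reused at every recurrence step (hypothetical values).
weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
h0 = [1.0, 0.0, -1.0, 0.5]

h_shallow = depth_recurrent_forward(h0, weights, steps=2)
h_deep = depth_recurrent_forward(h0, weights, steps=20)

# Depth changed from 2 to 20 steps, but the parameter count did not.
num_params = sum(len(row) for row in weights)
```

Because |tanh| is at most 1, each step moves every coordinate by at most `layer_scale`, which is one simple way to see why a near-identity update keeps the state from drifting abruptly over many steps.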
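Abstract (reference translation): Standard Transformers have a fixed computational depth, which fundamentally limits their ability to generalize to tasks that require variable-depth reasoning, such as multi-hop graph traversal or nested logic. This paper proposes a depth-recurrent Transformer that decouples computational depth from parameter count by iteratively applying a shared-weight Transformer block in latent space, so that the model can trade recurrence steps for deeper reasoning at inference time. The architecture incorporates three mechanisms: (1) a silent thinking objective that supervises only the final output, (2) LayerScale initialization to protect fragile reasoning states from untrained layer noise, and (3) an identity-biased recurrence that creates a gradient highway across many steps. The model is evaluated on three compositional reasoning domains with decreasing inductive bias: graph reachability (strict adjacency masking), nested boolean logic (relative positioning), and unstructured relational text (where sequence position provides no structural hints). Across all tasks, a clear computational frontier is observed: a boundary where performance transitions from chance to near-perfect as thinking steps scale with task complexity. In addition, these tasks show qualitatively different generalization behaviors: precise but brittle (graph), approximate but robust (logic), and autonomous latent routing without structural hints (text). This progression sheds light on how the interplay between a task-invariant recurrent reasoning core and task-specific perceptual interfaces shapes out-of-distribution (OOD) generalization, and offers a mechanistic perspective on vertical chain-of-thought that complements the prevailing horizontal token-generation paradigm.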
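The computational frontier mentioned for graph reachability has a simple symbolic analogue, sketched here under stated assumptions. This is not the paper's learned model: it is an exact set-propagation routine with hypothetical names (`propagate`, `reachable_within`) and a hypothetical 6-node chain graph. It shows why the number of recurrence steps must scale with task depth: a step that only looks one edge ahead cannot confirm a target that is k hops away in fewer than k steps.

```python
def propagate(reached, adjacency):
    """One shared step: add every node one edge away from an already-reached node."""
    nxt = set(reached)
    for node in reached:
        nxt.update(adjacency.get(node, ()))
    return nxt


def reachable_within(adjacency, source, target, steps):
    """Apply the same one-hop step `steps` times, then test the target."""
    reached = {source}
    for _ in range(steps):
        reached = propagate(reached, adjacency)
    return target in reached


# Hypothetical chain graph 0 -> 1 -> 2 -> 3 -> 4 -> 5 (target is 5 hops away).
chain = {i: [i + 1] for i in range(5)}

# Answer for each step budget from 0 to 7.
results = {steps: reachable_within(chain, 0, 5, steps) for steps in range(8)}
# The answer is False for budgets 0..4 and True from 5 steps onward.
```

In this exact setting the transition is a hard threshold at the hop distance; the abstract reports an analogous boundary for the learned model, where accuracy moves from chance to near-perfect as thinking steps scale with task complexity.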


Related papers