Fugu-MT Paper Translation (Abstract): Solve the Loop: Attractor Models for Language and Reasoning

Paper overview: Solve the Loop: Attractor Models for Language and Reasoning

arxiv url: http://arxiv.org/abs/2605.12466v1
Date: Tue, 12 May 2026 17:51:26 GMT
Status: Translation complete
Last updated in system: 2026-05-13 21:48:57.069196
Title: Solve the Loop: Attractor Models for Language and Reasoning
Title (reference translation): Solving the Loop: Attractor Models for Language and Reasoning
Authors: Jacob Fein-Ashley, Paria Rashidinejad
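Abstract summary: Looped Transformers offer a promising alternative to purely feed-forward computation. This paper introduces Attractor Models, in which a backbone module first proposes output embeddings and an attractor module then refines them by solving for a fixed point. The work shows that Attractor Models outperform existing models across two regimes: large-scale language-model pretraining and reasoning.
Reference score (independently computed attention metric): 4.8720589853137435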
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Looped Transformers offer a promising alternative to purely feed-forward computation by iteratively refining latent representations, improving language modeling and reasoning. Yet recurrent architectures remain unstable to train, costly to optimize and deploy, and constrained to small, fixed recurrence depths. We introduce Attractor Models, in which a backbone module first proposes output embeddings, then an attractor module refines them by solving for the fixed point, with gradients obtained through implicit differentiation. Thus, training memory remains constant in effective depth, and iterations are chosen adaptively by convergence. Empirically, Attractor Models outperform existing models across two regimes, large-scale language-model pretraining and reasoning with tiny models. In language modeling, Attractor Models deliver a Pareto improvement over standard Transformers and stable looped models across sizes, improving perplexity by up to 46.6% and downstream accuracy by up to 19.7% while reducing training cost. Notably, a 770M Attractor Model outperforms a 1.3B Transformer trained on twice as many tokens. On challenging reasoning tasks, we show that our model with only 27M parameters and approximately 1000 examples achieves 91.4% accuracy on Sudoku-Extreme and 93.1% on Maze-Hard, scaling favorably where frontier models like Claude and GPT o3 fail completely, and specialized recursive reasoners collapse at larger sizes. Lastly, we show that Attractor Models exhibit a novel phenomenon, which we call equilibrium internalization: fixed-point training places the model's initial output embedding near equilibrium, allowing the solver to be removed at inference time with little degradation. Together, these results suggest that Attractor Models make iterative refinement scalable by turning recurrence into a computation the model can learn to internalize.
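Abstract (reference translation): Looped Transformers offer a promising alternative to purely feed-forward computation by iteratively refining latent representations, which improves language modeling and reasoning. However, recurrent architectures are unstable to train, costly to optimize and deploy, and constrained to small, fixed recurrence depths. This paper introduces Attractor Models, in which a backbone module first proposes output embeddings and an attractor module then refines them by solving for a fixed point, with gradients obtained through implicit differentiation. As a result, training memory stays constant in effective depth, and the number of iterations is chosen adaptively by convergence. Empirically, Attractor Models outperform existing models in two regimes: large-scale language-model pretraining and reasoning with tiny models. In language modeling, Attractor Models give a Pareto improvement over standard Transformers and stable looped models across sizes, improving perplexity by up to 46.6% and downstream accuracy by up to 19.7% while reducing training cost. A 770M Attractor Model outperforms a 1.3B Transformer trained on twice as many tokens. On challenging reasoning tasks, a model with only 27M parameters and about 1000 examples reaches 91.4% accuracy on Sudoku-Extreme and 93.1% on Maze-Hard, scaling favorably where frontier models such as Claude and GPT o3 fail completely and specialized recursive reasoners collapse at larger sizes. Finally, Attractor Models exhibit a new phenomenon called equilibrium internalization: fixed-point training places the model's initial output embedding near equilibrium, so the solver can be removed at inference time with little degradation. These results suggest that Attractor Models make iterative refinement scalable by turning recurrence into a computation the model can learn to internalize.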
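
The mechanism named in the abstract (solve for a fixed point, then obtain gradients through implicit differentiation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the scalar map `attractor_step`, the parameter `w`, the input `x`, and the stand-in "backbone proposal" `z_init` are hypothetical and are not the paper's architecture or solver. The point is only that the gradient is computed from the converged value `z_star` alone, so no intermediate iterates need to be stored.

```python
import math

def attractor_step(z, x, w):
    """Hypothetical toy attractor map f(z; x, w) = tanh(w*z + x).
    Not the paper's module; a contraction whenever |w| < 1."""
    return math.tanh(w * z + x)

def solve_fixed_point(z0, x, w, tol=1e-13, max_iters=500):
    """Iterate z <- f(z) until the update is smaller than tol.
    Returns (z_star, iterations_used); the iteration count adapts to convergence."""
    z = z0
    for k in range(1, max_iters + 1):
        z_next = attractor_step(z, x, w)
        if abs(z_next - z) < tol:
            return z_next, k
        z = z_next
    return z, max_iters

def implicit_grad_w(z_star, w):
    """dz*/dw via the implicit function theorem applied to z* = f(z*; x, w):
    dz*/dw = (df/dw) / (1 - df/dz), evaluated at the fixed point only."""
    s = 1.0 - z_star ** 2      # tanh'(w*z* + x), using z* = tanh(w*z* + x)
    df_dz = w * s
    df_dw = z_star * s
    return df_dw / (1.0 - df_dz)

x, w = 0.3, 0.5
z_init = math.tanh(x)          # stand-in for a backbone's proposed embedding (assumption)
z_star, iters = solve_fixed_point(z_init, x, w)
grad = implicit_grad_w(z_star, w)

# Sanity check: compare against a central finite difference through the solver.
eps = 1e-6
z_plus, _ = solve_fixed_point(z_init, x, w + eps)
z_minus, _ = solve_fixed_point(z_init, x, w - eps)
finite_diff = (z_plus - z_minus) / (2 * eps)
print(iters, z_star, grad, finite_diff)
```

In this toy setting the implicit gradient and the finite-difference estimate should agree closely, and the gradient cost does not depend on how many solver iterations were run. A real model would replace the scalar division with a linear solve against the Jacobian of the attractor module.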

Related papers