Fugu-MT Paper Translation (Abstract): Entropy-Gated Latent Recursion

Paper summary: Entropy-Gated Latent Recursion

arxiv url: http://arxiv.org/abs/2606.16620v1
Date: Mon, 15 Jun 2026 12:14:01 GMT
Status: Translation complete
Last updated in system: 2026-06-16 16:21:34.508576
Title: Entropy-Gated Latent Recursion
Title (reference translation): Entropy-Gated Latent Recursion
Authors: Soham Bhattacharjee, Dushyant Singh Chauhan, Salem Lahlou, Martin Takac, Nils Lukas
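Abstract summary: Inference-time scaling has become the dominant lever for improving language-model reasoning. Existing methods derive rollout diversity from a single source: token-level sampling. This single-axis sampling space is fundamentally limiting. The paper shows that the $L$-axis is genuinely complementary to temperature.
Reference score (independently computed attention score): 9.65821666936513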
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Inference-time scaling has become the dominant lever for improving language-model reasoning, but existing methods derive rollout diversity from a single source: stochastic token-level sampling. We argue that this single-axis sampling space is fundamentally limiting, and identify a second, fully deterministic and complementary axis: the layer span $L$ at which a frozen model's top decoder layers are recursively re-applied at high-uncertainty tokens. Different choices of $L$ produce distinct rollouts that solve different subsets of problems, with no stochasticity. We instantiate this axis through Entropy-Gated Latent Recursion (EGLR), a training-free decoding procedure that re-applies the top-$L$ layers for at most $K_{\max}$ iterations until the next-token distribution converges. Combined with $T$ temperature samples, EGLR turns a single-axis stochastic rollout pool into an $L\times T$ Cartesian sampling space at almost the same per-rollout cost. We characterize this space across $8$ instruction-tuned models and $6$ math reasoning benchmarks, and show that the $L$-axis is genuinely complementary to temperature: on MATH-500 with Qwen2.5-3B-Instruct, the joint $L\times T$ oracle reaches $91.6\%$, $+8.2$ percentage points beyond the temperature-only oracle ($83.4\%$) and $+10.4$ points beyond the layer-only oracle ($81.2\%$), confirming that the two axes capture genuinely complementary problems. The expanded rollout pool provides richer per-prompt candidates for any downstream procedure that consumes rollouts, including self-consistency, best-of-$N$ with verifiers, and group-relative RL training (GRPO), opening a new direction for inference-time scaling that does not rely on stochastic noise.
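The decoding loop described in the abstract can be illustrated with a minimal, hedged sketch. The abstract only states that the top-$L$ layers are re-applied at high-uncertainty tokens for at most $K_{\max}$ iterations until the next-token distribution converges; everything else below is an assumption made for illustration. In particular, the entropy threshold `tau`, the total-variation convergence test with tolerance `eps`, feeding the final hidden state back into the top layers, and the names `eglr_next_token_probs`, `toy_layers`, and `toy_unembed` are hypothetical and may differ from the authors' procedure.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total-variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def eglr_next_token_probs(
    hidden: Vector,
    top_layers: Sequence[Layer],
    unembed: Callable[[Vector], Vector],
    tau: float,
    k_max: int,
    eps: float,
) -> Tuple[List[float], int]:
    """Return (next-token distribution, number of recursion steps used).

    Assumption: `hidden` is the final hidden state of a frozen model and
    `top_layers` are its top-L decoder layers, modelled here as callables.
    """
    probs = softmax(unembed(hidden))
    # Entropy gate (assumed form): confident tokens skip recursion entirely.
    if entropy(probs) <= tau:
        return probs, 0
    for k in range(1, k_max + 1):
        # Re-apply the top-L layers to the current hidden state.
        for layer in top_layers:
            hidden = layer(hidden)
        new_probs = softmax(unembed(hidden))
        # Convergence test (assumed form): distribution stopped changing.
        converged = total_variation(probs, new_probs) < eps
        probs = new_probs
        if converged:
            return probs, k
    return probs, k_max


# Toy stand-ins for a real model: one contractive "layer" and an identity unembedding.
toy_layers = [lambda h: [0.5 * h[0] + 1.0, 0.5 * h[1]]]
toy_unembed = lambda h: list(h)

probs, steps = eglr_next_token_probs(
    [0.0, 0.0], toy_layers, toy_unembed, tau=0.5, k_max=8, eps=0.01
)
print(f"recursion steps: {steps}, probs: {probs}")
```

Because the procedure contains no sampling, a fixed choice of top layers yields the same output on every run, which is the sense in which the abstract calls the $L$-axis deterministic.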
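Abstract (reference translation): Inference-time scaling has become the dominant lever for improving language-model reasoning, but existing methods derive rollout diversity from a single source: stochastic token-level sampling. This single-axis sampling space is fundamentally limiting, and we identify a second, deterministic and complementary axis: the layer span $L$ at which a frozen model's top decoder layers are recursively re-applied at high-uncertainty tokens. Different choices of $L$ produce distinct rollouts that solve different subsets of problems, with no stochasticity. We instantiate this axis through Entropy-Gated Latent Recursion (EGLR), a training-free decoding procedure that re-applies the top-$L$ layers for at most $K_{\max}$ iterations. Combined with $T$ temperature samples, EGLR turns a single-axis stochastic rollout pool into an $L\times T$ Cartesian sampling space at almost the same per-rollout cost. On MATH-500 with Qwen2.5-3B-Instruct, the joint $L\times T$ oracle reaches $91.6\%$, $+8.2$ percentage points beyond the temperature-only oracle ($83.4\%$) and $+10.4$ points beyond the layer-only oracle ($81.2\%$). The expanded rollout pool provides richer per-prompt candidates for any downstream procedure that consumes rollouts, including self-consistency, best-of-$N$ with verifiers, and group-relative RL training (GRPO).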
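The reported gaps follow directly from the stated oracle accuracies: $91.6 - 83.4 = 8.2$ and $91.6 - 81.2 = 10.4$ percentage points. As a rough sketch of how such oracle numbers could be computed, the snippet below treats an oracle as "a problem counts as solved if any rollout in the pool solves it". This reading is an assumption: the abstract does not define the oracles precisely. The `solved` table is synthetic toy data, not results from the paper, and the encoding of "no recursion" as `L = 0` and the sample labels `"greedy"`, `"s1"`, `"s2"` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Config = Tuple[Hashable, Hashable]  # (layer span L, temperature sample label)


def oracle_accuracy(
    solved: Dict[Config, Set[int]],
    configs: Iterable[Config],
    n_problems: int,
) -> float:
    """Fraction of problems solved by at least one rollout among `configs`."""
    covered: Set[int] = set()
    for cfg in configs:
        covered |= solved[cfg]
    return len(covered) / n_problems


# Synthetic example: 10 problems, 3 layer spans x 3 samples (NOT paper data).
n_problems = 10
layer_spans = [0, 4, 8]            # 0 stands for "no recursion" (assumed encoding)
samples = ["greedy", "s1", "s2"]   # one deterministic pass plus two temperature samples
solved: Dict[Config, Set[int]] = {
    (0, "greedy"): {0, 1, 2, 3},
    (0, "s1"): {0, 1, 2, 4},
    (0, "s2"): {0, 1, 3, 5},
    (4, "greedy"): {0, 1, 2, 6},
    (4, "s1"): {0, 2, 6, 7},
    (4, "s2"): {0, 1, 4, 6},
    (8, "greedy"): {0, 1, 3, 8},
    (8, "s1"): {1, 3, 8},
    (8, "s2"): {0, 5, 8},
}

# Temperature-only pool: vary the sample, keep recursion off.
temperature_only = [(0, t) for t in samples]
# Layer-only pool: vary L, keep decoding deterministic.
layer_only = [(L, "greedy") for L in layer_spans]
# Joint pool: the full L x T Cartesian grid.
joint = [(L, t) for L in layer_spans for t in samples]

acc_temperature = oracle_accuracy(solved, temperature_only, n_problems)
acc_layer = oracle_accuracy(solved, layer_only, n_problems)
acc_joint = oracle_accuracy(solved, joint, n_problems)
print(acc_temperature, acc_layer, acc_joint)
```

In this toy grid the joint pool covers problems that neither single-axis pool reaches on its own, which mirrors the kind of complementarity the abstract claims for the $L$ and temperature axes.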

Related papers