Fugu-MT Paper Translation (Summary): Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models

Paper Summary: Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models

arxiv url: http://arxiv.org/abs/2510.14961v1
Date: Thu, 16 Oct 2025 17:59:07 GMT
Status: Translation complete
Last updated (in system): 2025-10-17 21:15:14.99738
Title: Efficient Parallel Samplers for Recurrent-Depth Models and Their Connection to Diffusion Language Models
Title (reference translation): Efficient Parallel Sampling for Recurrent-Depth Models and Its Relationship to Diffusion Language Models
Authors: Jonas Geiping, Xinyu Yang, Guinan Su,
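Abstract summary: Language models with recurrent depth are defined by their ability to increase computation by repeating layers. Recent pretraining efforts have demonstrated that these architectures can scale to modern language modeling tasks. We develop a new diffusion forcing sampler for these models to accelerate generation.
Reference score (in-house attention score): 42.52335470079319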
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Language models with recurrent depth, also referred to as universal or looped when considering transformers, are defined by the capacity to increase their computation through the repetition of layers. Recent efforts in pretraining have demonstrated that these architectures can scale to modern language modeling tasks while exhibiting advantages in reasoning tasks. In this work, we examine the relationship between recurrent-depth models and diffusion language models. Building on their similarities, we develop a new diffusion forcing sampler for these models to accelerate generation. The sampler advances by decoding new tokens at every forward pass of the model, while the latent states of these tokens can be further refined in parallel through recurrence. Theoretically, generation with our sampler is strictly more expressive than the baseline autoregressive generation using the same time budget on modern hardware. Moreover, this sampler, based on principles from diffusion literature, can be directly applied to existing 3.5B recurrent-depth transformers without any tuning, leading to up to a 5x speedup. Consequently, our findings not only provide an efficient mechanism for parallelizing the extra computation in recurrent-depth models at inference, but also suggest that such models can be naturally viewed as strong continuous, though causal, diffusion language models.
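Abstract (reference translation): Language models with recurrent depth, called universal or looped models in the case of transformers, are defined by their ability to increase the amount of computation by repeating layers. Recent pretraining efforts have demonstrated that these architectures can scale to modern language modeling tasks while showing advantages on reasoning tasks. In this work, we study the relationship between recurrent-depth models and diffusion language models. Building on their similarities, we develop a new diffusion forcing sampler for these models that accelerates generation. The sampler advances by decoding new tokens at every forward pass of the model, while the latent states of these tokens are further refined in parallel through recurrence. Theoretically, generation with our sampler is strictly more expressive than baseline autoregressive generation under the same time budget on modern hardware. Moreover, this sampler, based on principles from the diffusion literature, can be applied directly to existing 3.5B recurrent-depth transformers without any tuning, yielding up to a 5x speedup. Consequently, our findings not only provide an efficient mechanism for parallelizing the extra computation of recurrent-depth models at inference time, but also suggest that such models can naturally be viewed as strong continuous, though causal, diffusion language models.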
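The abstract describes the sampler only at a high level: each forward pass decodes a new token while the latent states of earlier tokens keep being refined through recurrence. The Python sketch below illustrates that kind of pipelined schedule on a toy scalar "model" and counts sequential forward passes against a baseline that finishes all recurrence steps for one token before starting the next. It is not the authors' implementation. The function names `sequential_sample` and `parallel_sample`, the scalar latent state, zero initialization of new positions, the fixed per-token step budget `r`, and the toy `toy_step`/`toy_decode` functions are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical toy stand-ins: a real recurrent-depth model would use hidden-state
# tensors, attention over earlier positions, and a learned decoder.
StepFn = Callable[[float, float], float]   # (latent, context) -> refined latent
DecodeFn = Callable[[float], int]          # latent -> token id
EmbedFn = Callable[[int], float]           # token id -> context value


def sequential_sample(first_ctx: float, num_tokens: int, r: int,
                      core_step: StepFn, decode: DecodeFn,
                      embed: EmbedFn) -> Tuple[List[int], int]:
    """Baseline: run all r recurrence steps for one token before starting the next."""
    tokens: List[int] = []
    ctx, passes = first_ctx, 0
    for _ in range(num_tokens):
        h = 0.0  # assumed zero initialization of a new position's latent state
        for _ in range(r):
            h = core_step(h, ctx)
            passes += 1  # every recurrence step is one sequential forward pass
        tokens.append(decode(h))
        ctx = embed(tokens[-1])
    return tokens, passes


def parallel_sample(first_ctx: float, num_tokens: int, r: int,
                    core_step: StepFn, decode: DecodeFn,
                    embed: EmbedFn) -> Tuple[List[int], int]:
    """Pipelined toy schedule: each pass opens one new position and refines all open ones."""
    tokens: List[int] = []
    active: List[List[float]] = []  # each entry: [latent state, steps received so far]
    passes = 0
    while len(tokens) < num_tokens:
        if len(tokens) + len(active) < num_tokens:
            active.append([0.0, 0])  # assumed zero initialization, as in the baseline
        # Context for each open position: the last committed token for the oldest one,
        # otherwise the provisional draft decoded from its predecessor's current latent.
        ctxs = [embed(tokens[-1]) if tokens else first_ctx]
        ctxs += [embed(decode(prev[0])) for prev in active[:-1]]
        # One forward pass: refine every open position from the same snapshot
        # (a real model would do this as one batched call).
        for pos, ctx in zip(active, ctxs):
            pos[0] = core_step(pos[0], ctx)
            pos[1] += 1
        passes += 1
        # Commit the oldest position once it has received its r refinement steps.
        if active[0][1] >= r:
            tokens.append(decode(active.pop(0)[0]))
    return tokens, passes


def toy_step(h: float, ctx: float) -> float:
    return 0.5 * h + 0.5 * (ctx + 1.0)  # moves h halfway toward ctx + 1 each step


def toy_decode(h: float) -> int:
    return int(h + 0.5)  # round half up (latents stay non-negative here)


seq, seq_passes = sequential_sample(0.0, 4, 3, toy_step, toy_decode, float)
par, par_passes = parallel_sample(0.0, 4, 3, toy_step, toy_decode, float)
print(seq, seq_passes)  # [1, 2, 3, 4] 12
print(par, par_passes)  # [1, 2, 3, 3] 6
```

In this toy, the baseline needs T * r = 12 sequential passes for T = 4 tokens and r = 3 recurrence steps, while the pipelined schedule needs T + r - 1 = 6. Its last token differs from the baseline's because later positions are refined against provisional drafts of earlier ones; the sketch does not model how the paper handles this, and a pass count says nothing by itself about the reported wall-clock speedup on real hardware.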

Related Papers