Fugu-MT Paper Translation (Summary): Diffusion Language Models: An Experimental Analysis

Paper summary: Diffusion Language Models: An Experimental Analysis

arxiv url: http://arxiv.org/abs/2606.19475v2
Date: Mon, 22 Jun 2026 16:35:34 GMT
Status: Translation complete
Last updated in system: 2026-06-24 16:10:14.802303
Title: Diffusion Language Models: An Experimental Analysis
Authors: Thomas Bertolani, Davide Bucciarelli, Leonardo Zini, Marcella Cornia, Lorenzo Baraldi,
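Abstract summary: Diffusion Language Models (DLMs) generate text through iterative denoising rather than next-token prediction. We evaluate eight state-of-the-art DLMs across eight benchmarks spanning reasoning, coding, translation, knowledge, and structured problem solving. We show that the behavior of DLMs is strongly influenced by generation-time design choices, leading to distinct trade-offs between performance and computational efficiency.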
Reference score (attention score computed by the site): 21.773483524326362
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have revolutionized language modeling through autoregressive generation, enabling strong performance across a wide range of tasks. Recently, Diffusion Language Models (DLMs) have emerged as an alternative paradigm that generates text through iterative denoising rather than next-token prediction, allowing parallel refinement of entire sequences. While numerous diffusion-based architectures have been proposed, differences in evaluation protocols, datasets, inference budgets, and generation hyperparameters make it difficult to compare their capabilities and understand the trade-offs they offer. In this work, we present a systematic experimental analysis of modern DLMs. Specifically, we evaluate eight state-of-the-art DLMs across eight benchmarks spanning reasoning, coding, translation, knowledge, and structured problem solving, while explicitly considering both generation quality and computational efficiency. Beyond downstream evaluation, we analyze the impact of key inference-time factors, including denoising steps, context length, block size, and parallel unmasking strategies, and complement large-scale experiments with controlled comparisons of smaller models trained under identical conditions. Our analysis highlights the strengths and limitations of diffusion-based language modeling across different tasks, architectures, and inference budgets. We show that the behavior of DLMs is strongly influenced by generation-time design choices, leading to distinct trade-offs between performance and computational efficiency. Overall, our study provides practical insights into the capabilities and deployment characteristics of contemporary DLMs.
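The abstract names several inference-time factors (denoising steps, context length, block size, parallel unmasking). The sketch below illustrates, under stated assumptions, how these factors interact in one common style of masked-diffusion decoding: the generated span is split into blocks decoded left to right, and at each denoising step the most confident still-masked positions in the current block are committed in parallel. This is not the paper's method or any specific model's API. `MASK`, `toy_denoiser`, and `block_diffusion_decode` are hypothetical names, and the denoiser returns random probabilities in place of a real DLM forward pass.

```python
import numpy as np

MASK = -1  # hypothetical mask-token id; real vocab ids are >= 0


def toy_denoiser(seq: np.ndarray, vocab_size: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for one DLM forward pass over the whole sequence.

    Returns a (len(seq), vocab_size) array of per-position probabilities.
    The values are random here; a real model would condition on `seq`.
    """
    logits = rng.normal(size=(len(seq), vocab_size))
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return probs / probs.sum(axis=1, keepdims=True)


def block_diffusion_decode(prompt, gen_length, block_size, steps, vocab_size=50, seed=0):
    """Decode `gen_length` masked tokens block by block, unmasking the most
    confident positions in parallel at each denoising step.

    Returns (generated token ids, number of forward passes).
    """
    if gen_length % block_size:
        raise ValueError("gen_length must be a multiple of block_size")
    num_blocks = gen_length // block_size
    if steps % num_blocks:
        raise ValueError("steps must be a multiple of the number of blocks")
    steps_per_block = steps // num_blocks
    if block_size % steps_per_block:
        raise ValueError("block_size must be a multiple of steps per block")
    tokens_per_step = block_size // steps_per_block  # parallel unmasking width

    rng = np.random.default_rng(seed)
    # Context = prompt followed by a fully masked generation span.
    seq = np.array(list(prompt) + [MASK] * gen_length)
    forward_passes = 0

    for b in range(num_blocks):  # blocks are decoded left to right
        start = len(prompt) + b * block_size
        end = start + block_size
        for _ in range(steps_per_block):
            probs = toy_denoiser(seq, vocab_size, rng)  # one pass over the full context
            forward_passes += 1
            pred = probs.argmax(axis=1)  # greedy token per position
            conf = probs.max(axis=1)     # its probability, used as confidence
            # Candidates: still-masked positions in the current block only.
            masked = start + np.flatnonzero(seq[start:end] == MASK)
            # Commit the `tokens_per_step` most confident candidates in parallel.
            chosen = masked[np.argsort(-conf[masked])[:tokens_per_step]]
            seq[chosen] = pred[chosen]

    return seq[len(prompt):].tolist(), forward_passes


out, passes = block_diffusion_decode([1, 2, 3], gen_length=32, block_size=8, steps=16)
print(passes, MASK in out)  # → 16 False
```

With `gen_length=32` and `block_size=8`, `steps=16` commits two tokens per step and costs 16 forward passes, while `steps=32` commits one token per step and costs 32. Because each pass processes the full prompt-plus-generation context, fewer steps reduce compute but force more tokens to be decided in parallel. This is only an illustration of the kind of quality/efficiency trade-off the abstract describes, not a measured result.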

Related papers