Fugu-MT Paper Translation (Abstract): Dynamic Chunking for Diffusion Language Models

Paper overview: Dynamic Chunking for Diffusion Language Models

arxiv url: http://arxiv.org/abs/2605.15676v1
Date: Fri, 15 May 2026 06:56:05 GMT
Status: Translation complete
Last updated in system: 2026-05-18 21:22:26.201146
Title: Dynamic Chunking for Diffusion Language Models
Title (reference translation): Dynamic Chunking for Diffusion Language Models
Authors: Yichen Zhu, Xiaoming Shi, Peng Zhao, Weiyu Chen, Debing Zhang, James Kwok
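Abstract summary: Block discrete diffusion language models factorize a sequence autoregressively over fixed-size positional blocks. The authors introduce the Dynamic Chunking Diffusion Model (DCDM). DCDM replaces positional blocks with content-defined semantic chunks.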
Reference score (independently computed attention score): 39.198939178122714
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Block discrete diffusion language models factorize a sequence autoregressively over fixed-size positional blocks, decoupling within-block parallel denoising from across-block conditioning. We argue that this rigid partition wastes structure already present in the sequence: blocks defined by position rather than by content separate semantically coherent tokens and group unrelated ones together. We introduce the Dynamic Chunking Diffusion Model (DCDM), which replaces positional blocks with content-defined semantic chunks. At its core is Chunking Attention, a differentiable layer that routes tokens into $K$ clusters parameterized by learnable subspaces and shaped end-to-end by the diffusion objective. The resulting cluster assignments induce a chunk-causal attention mask under which a discrete diffusion denoiser factorizes the sequence likelihood autoregressively over semantic chunks, strictly generalizing block discrete diffusion. On downstream benchmarks at parameter scales up to 1.5B, DCDM consistently improves over both unstructured and positional-block diffusion baselines, with the advantage stable across scales and visible early in training.
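Abstract (reference translation): Block discrete diffusion language models factorize a sequence autoregressively over fixed-size positional blocks, separating parallel denoising within a block from conditioning across blocks. The authors argue that this rigid partition wastes structure already present in the sequence: because blocks are defined by position rather than by content, they split semantically coherent tokens apart and group unrelated tokens together. They introduce the Dynamic Chunking Diffusion Model (DCDM), which replaces positional blocks with content-defined semantic chunks. Its core component, Chunking Attention, is a differentiable layer that routes tokens into $K$ clusters parameterized by learnable subspaces and is shaped end-to-end by the diffusion objective. The resulting cluster assignments induce a chunk-causal attention mask, under which a discrete diffusion denoiser factorizes the sequence likelihood autoregressively over semantic chunks, strictly generalizing block discrete diffusion. On downstream benchmarks at parameter scales up to 1.5B, DCDM consistently improves over both unstructured and positional-block diffusion baselines, and the advantage is stable across scales and visible early in training.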
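
The chunk-causal attention mask mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each token already carries an integer chunk id, that chunks are ordered by that id, and that a token may attend to its own chunk and to all earlier chunks. The function names `chunk_causal_mask` and `positional_chunk_ids` are hypothetical, and the chunk ids are written by hand here, whereas in DCDM they would come from the learned Chunking Attention layer. Under these assumptions, fixed-size positional blocks are the special case where the chunk id is the position divided by the block size.

```python
import numpy as np


def chunk_causal_mask(chunk_ids):
    """Return a boolean mask M where M[i, j] is True if token i may attend to token j.

    Assumption (not taken from the paper): chunks are ordered by integer id,
    and a token sees its own chunk plus every chunk with a smaller id.
    """
    ids = np.asarray(chunk_ids)
    # Row index i is the query token, column index j is the key token.
    return ids[None, :] <= ids[:, None]


def positional_chunk_ids(seq_len, block_size):
    """Chunk ids for fixed-size positional blocks (the block-diffusion special case)."""
    return np.arange(seq_len) // block_size


# Fixed blocks of size 2 over 6 tokens: chunk ids are [0, 0, 1, 1, 2, 2].
block_mask = chunk_causal_mask(positional_chunk_ids(6, 2))

# Hypothetical content-defined chunks of uneven size over the same 6 tokens.
semantic_mask = chunk_causal_mask([0, 0, 0, 1, 2, 2])
```

In both masks, tokens inside one chunk attend to each other in both directions, which corresponds to parallel denoising within a chunk, while attention across chunks only looks backward. The only difference between the two cases is where the chunk boundaries fall.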
