Fugu-MT Paper Translation (Summary): Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models

Paper summary: Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models

arxiv url: http://arxiv.org/abs/2605.11726v2
Date: Wed, 13 May 2026 15:38:02 GMT
Status: Translation complete
Last updated in system: 2026-05-14 17:13:58.893113
Title: Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models
Title (reference translation): Block-R1: Rethinking the role of block size in multi-domain reinforcement learning for diffusion large language models
Authors: Yan Jiang, Ruihong Qiu, Zi Huang
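Abstract summary: Block size has become a vital factor in dLLMs. This paper studies block size from the perspective of domain conflict in dLLM RL post-training in multi-domain scenarios.
Reference score (independently computed attention score): 41.859993506122194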
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, reinforcement learning (RL) has been widely applied during post-training for diffusion large language models (dLLMs) to enhance reasoning with block-wise semi-autoregressive generation. Block size has therefore become a vital factor in dLLMs, since it determines the parallel decoding granularity and affects the rollout trajectories during RL optimisation, e.g., GRPO. Instead of investigating the effect of block size during inference on individual domains, this paper studies block size from a domain conflict perspective for dLLM RL post-training in multi-domain scenarios. The main contributions are: (1) a formulation of domain block size conflict in multi-domain RL for dLLMs, which will largely affect the post-training effectiveness for rollout-based RL methods; (2) a novel dataset, Block-R1-41K is constructed with a best-improved training block size for each sample, which also induces a Block Size Conflict Score to quantitatively measure the domain conflict; (3) a new benchmark, Block-R1, for flexible RL post-training for dLLMs in both single and cross domain; and (4) a simple yet powerful cross-domain post-training method with sample-level best-improved training block sizes. Extensive experiments on 13 distinct datasets, 7 latest RL algorithms and diverse dLLM backbones are comprehensively covered in Block-R1. The benchmark is open-sourced at https://github.com/YanJiangJerry/Block-R1 with the dataset released at https://huggingface.co/datasets/YanJiangJerry/Block-R1-41K.
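Abstract (reference translation): In recent years, reinforcement learning (RL) has been widely applied in the post-training of diffusion large language models (dLLMs), improving reasoning through block-wise semi-autoregressive generation. Block size has therefore become a vital factor in dLLMs, because it determines the granularity of parallel decoding and affects the rollout trajectories during RL optimisation, e.g., with GRPO. Rather than examining the effect of inference-time block size on individual domains, this paper studies block size from the perspective of domain conflict in dLLM RL post-training in multi-domain scenarios. The main contributions are: (1) a formulation of domain block size conflict in multi-domain RL for dLLMs, which largely affects the post-training effectiveness of rollout-based RL methods; (2) a new dataset, Block-R1-41K, constructed with the best-improved training block size for each sample, which also induces a Block Size Conflict Score that quantitatively measures domain conflict; (3) a new benchmark, Block-R1, for flexible RL post-training of dLLMs in both single-domain and cross-domain settings; and (4) a simple yet powerful cross-domain post-training method that uses sample-level best-improved training block sizes. Block-R1 comprehensively covers extensive experiments on 13 distinct datasets, 7 recent RL algorithms, and diverse dLLM backbones. The benchmark is open-sourced at https://github.com/YanJiangJerry/Block-R1, and the dataset is released at https://huggingface.co/datasets/YanJiangJerry/Block-R1-41K.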


Related papers