Fugu-MT Paper Translation (Abstract): Accelerating Disaggregated RL for Visual Generative LLMs with Diffusion-Based Parallelism and Trainer-Assisted Generation

Paper overview: Accelerating Disaggregated RL for Visual Generative LLMs with Diffusion-Based Parallelism and Trainer-Assisted Generation

arxiv url: http://arxiv.org/abs/2606.24369v1
Date: Tue, 23 Jun 2026 09:59:35 GMT
Status: Translation complete
System update date: 2026-06-24 22:16:48.895378
Title: Accelerating Disaggregated RL for Visual Generative LLMs with Diffusion-Based Parallelism and Trainer-Assisted Generation
Title (reference translation): Accelerating Disaggregated RL for Visual Generative LLMs with Diffusion-Based Parallelism and Trainer Assistance
Authors: Sijie Wang, Zhengyu Qing, Zhiqiang Tan, Yiming Yin, Yeqing Zhang, Yaoyuan Wang, Qiang Wang, Xiaowen Chu, Shaohuai Shi
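Abstract summary: DigenRL is a disaggregated RL framework for diffusion-based generative large language models (LLMs) that supports flexible resource allocation, accommodates heterogeneous GPUs, and facilitates efficient task scheduling. DigenRL achieves 1.56-2.10x throughput improvements over state-of-the-art diffusion RL systems.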
Reference score (independently computed attention score): 26.08473785297375
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning (RL) has become a dominant post-training paradigm, driving the emergence of high-performance RL systems such as veRL for autoregressive large language models (LLMs). In parallel, diffusion-oriented RL algorithms, e.g., DanceGRPO and FlowGRPO, have rapidly expanded the scope of RL from language reasoning to diffusion-based visual and flow-based generation. However, efficient RL systems for diffusion generative LLMs remain underexplored. Existing implementations, e.g., veRL-Omni, still rely on colocated execution, which simplifies synchronization but couples rollout and training resources, limits heterogeneous deployment, and constrains independent scaling. To this end, we introduce DigenRL, a disaggregated RL framework for diffusion-based generative LLMs that supports flexible resource allocation, accommodates heterogeneous GPUs, and facilitates efficient task scheduling. To maximally reduce the execution bubbles in the disaggregated architecture, we propose: 1) a generation-axis pipeline (GAP) and time-step parallelism (TSP) in the diffusion architecture to enable finer-grained pipelining between rollout and training; 2) an elastic trainer-assisted generation (TAG) approach to enable the trainer GPU resources to dynamically assist in executing rollout generations; and 3) a tightly one-step constrained asynchronous strategy to further utilize the tail bubble in the pipeline. Extensive experiments are conducted on three hardware testbeds with 16-32 GPUs using HunyuanVideo-13B, Wan2.1-14B, FLUX.1-12B, and QwenImage-20B generative models. Experimental results show that DigenRL achieves 1.56-2.10x throughput improvements over state-of-the-art diffusion RL systems, veRL-Omni and GenRL.
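Abstract (reference translation): Reinforcement learning (RL) has become a mainstream post-training paradigm, driving the emergence of high-performance RL systems such as veRL for autoregressive large language models (LLMs). In parallel, diffusion-oriented RL algorithms such as DanceGRPO and FlowGRPO have rapidly expanded the scope of RL from language reasoning to diffusion-based visual and flow-based generation. However, efficient RL systems for diffusion generative LLMs remain underexplored. Existing implementations such as veRL-Omni still rely on colocated execution, which simplifies synchronization but couples rollout and training resources, limits heterogeneous deployment, and constrains independent scaling. To address this, we introduce DigenRL, a disaggregated RL framework for diffusion-based generative LLMs that supports flexible resource allocation, accommodates heterogeneous GPUs, and facilitates efficient task scheduling. To minimize execution bubbles in the disaggregated architecture, we propose: 1) a generation-axis pipeline (GAP) and time-step parallelism (TSP) in the diffusion architecture, enabling finer-grained pipelining between rollout and training; 2) an elastic trainer-assisted generation (TAG) approach that lets trainer GPU resources dynamically assist in executing rollout generations; and 3) a tightly one-step constrained asynchronous strategy that further utilizes the tail bubble in the pipeline. Extensive experiments were conducted on three hardware testbeds with 16-32 GPUs using the HunyuanVideo-13B, Wan2.1-14B, FLUX.1-12B, and QwenImage-20B generative models. The results show that DigenRL achieves 1.56-2.10x throughput improvements over the state-of-the-art diffusion RL systems veRL-Omni and GenRL.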
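The abstract attributes part of the speedup to finer-grained pipelining between rollout and training. The toy model below is only an illustration of that general idea, not DigenRL's scheduler: it assumes one rollout stage feeding one training stage, equal-sized chunks, and no communication cost. The function name `two_stage_makespan` and all timings are hypothetical.

```python
def two_stage_makespan(rollout_time, train_time, num_chunks):
    """Total time for a rollout -> training pipeline split into equal chunks.

    Assumption (not from the paper): each chunk must finish rollout before
    its training step starts, and each stage handles one chunk at a time.
    """
    rollout_chunk = rollout_time / num_chunks
    train_chunk = train_time / num_chunks
    rollout_done = 0.0  # time at which the latest chunk leaves rollout
    train_done = 0.0    # time at which the trainer becomes free again
    for _ in range(num_chunks):
        rollout_done += rollout_chunk
        # Training starts once the chunk is generated and the trainer is idle.
        train_done = max(rollout_done, train_done) + train_chunk
    return train_done


if __name__ == "__main__":
    # Hypothetical workload: 8 time units of rollout, 8 of training.
    for n in (1, 2, 4, 8):
        print(n, two_stage_makespan(8.0, 8.0, n))
```

With a single chunk the trainer waits for the whole rollout to finish, so the two stages run back to back. Splitting the same work into more chunks lets training overlap with generation, and the idle time at the head and tail of the pipeline shrinks as the chunks get smaller.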

Related papers