Fugu-MT Paper Translation (Abstract): dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models

Paper overview: dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models

arxiv url: http://arxiv.org/abs/2605.09291v1
Date: Sun, 10 May 2026 03:36:49 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.171091
Title: dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models
Title (reference translation): dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models
Authors: Zhengyan Wan, Yidong Ouyang, Panwen Hu, Qiang Sun
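Abstract summary: This paper proposes dFlowGRPO, a unified reinforcement learning framework for discrete flow models (DFMs). It derives the full trajectory probability for DFMs and formulates denoising as a Markov decision process. dFlowGRPO is applied to FUDOKI, a recent multimodal discrete flow model, and evaluated on both image generation and multimodal understanding tasks.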
Reference score (independently computed attention score): 8.198964054238731
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Discrete flow models (DFMs) are a class of flexible generative models for generating discrete data, and diffusion large language models (dLLMs) can be viewed as a special case with a specific choice of mixture path and a masked source distribution. While several recent works have explored applying reinforcement learning to dLLMs, its application to more general discrete flow models remains underexplored. In this work, we present discrete Flow-GRPO (dFlowGRPO), a unified reinforcement learning framework for discrete flow models that supports a broad family of probability paths and non-masked source distributions. We derive the full trajectory probability for DFMs and formulate denoising as a Markov decision process, enabling dFlowGRPO to incorporate information from both the associated conditional transition rates and the posterior model during reinforcement learning. We apply dFlowGRPO to FUDOKI, a recent multimodal discrete flow model, and evaluate it on both image generation and multimodal understanding tasks. Empirical results show that dFlowGRPO outperforms existing GRPO-type methods for dLLMs on text-to-image generation tasks and achieves performance competitive with continuous flow-based models trained using FlowGRPO, while also demonstrating strong capabilities on understanding tasks.
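Abstract (reference translation): Discrete flow models (DFMs) are a flexible class of generative models for discrete data, and diffusion large language models (dLLMs) can be viewed as a special case that uses a particular mixture path and a masked source distribution. Several recent studies have applied reinforcement learning to dLLMs, but its application to more general discrete flow models remains underexplored. This work presents discrete Flow-GRPO (dFlowGRPO), a unified reinforcement learning framework for discrete flow models. We derive the full trajectory probability for DFMs and formulate denoising as a Markov decision process, which lets dFlowGRPO incorporate information from both the associated conditional transition rates and the posterior model during reinforcement learning. We apply dFlowGRPO to FUDOKI, a recent multimodal discrete flow model, and evaluate it on both image generation and multimodal understanding tasks. Experimental results show that dFlowGRPO outperforms existing GRPO-type methods for dLLMs on text-to-image generation tasks and achieves performance competitive with continuous flow-based models trained using FlowGRPO.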

Related papers