Fugu-MT Paper Translation (Summary): SuperFlow: Training Flow Matching Models with RL on the Fly

Paper summary: SuperFlow: Training Flow Matching Models with RL on the Fly

arxiv url: http://arxiv.org/abs/2512.17951v1
Date: Wed, 17 Dec 2025 02:44:11 GMT
Status: Translation complete
Last updated in system: 2025-12-23 18:54:32.116355
Title: SuperFlow: Training Flow Matching Models with RL on the Fly
Title (reference translation): SuperFlow: Training Flow Matching Models with RL on the Fly
Authors: Kaijie Chen, Zhiyang Xu, Ying Shen, Zihao Lin, Yuguang Yao, Lifu Huang
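Abstract summary: SuperFlow is an RL training framework for flow-based models that adjusts group sizes with variance-aware sampling. It reaches promising performance while using only 5.4% to 56.3% of the original training steps, and it reduces training time by 5.2% to 16.7% without any architectural changes.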
Reference score (independently computed attention score): 40.46209466164144
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent progress in flow-based generative models and reinforcement learning (RL) has improved text-image alignment and visual quality. However, current RL training for flow models still has two main problems: (i) GRPO-style fixed per-prompt group sizes ignore variation in sampling importance across prompts, which leads to inefficient sampling and slower training; and (ii) trajectory-level advantages are reused as per-step estimates, which biases credit assignment along the flow. We propose SuperFlow, an RL training framework for flow-based models that adjusts group sizes with variance-aware sampling and computes step-level advantages in a way that is consistent with continuous-time flow dynamics. Empirically, SuperFlow reaches promising performance while using only 5.4% to 56.3% of the original training steps and reduces training time by 5.2% to 16.7% without any architectural changes. On standard text-to-image (T2I) tasks, including text rendering, compositional image generation, and human preference alignment, SuperFlow improves over SD3.5-M by 4.6% to 47.2%, and over Flow-GRPO by 1.7% to 16.0%.
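Abstract (reference translation): Recent advances in flow-based generative models and reinforcement learning (RL) have improved text-image alignment and visual quality. However, current RL training for flow models has two major problems: (i) GRPO-style fixed per-prompt group sizes ignore how sampling importance varies across prompts, leading to inefficient sampling and slow training; and (ii) trajectory-level advantages are reused as per-step estimates, which biases credit assignment along the flow. We propose SuperFlow, an RL training framework for flow-based models that adjusts group sizes with variance-aware sampling and computes step-level advantages in a manner consistent with continuous-time flow dynamics. Empirically, SuperFlow reaches promising performance using only 5.4% to 56.3% of the original training steps and cuts training time by 5.2% to 16.7% without architectural changes. On standard T2I tasks, including text rendering, compositional image generation, and human preference alignment, SuperFlow improves over SD3.5-M by 4.6% to 47.2% and over Flow-GRPO by 1.7% to 16.0%.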
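
Illustrative sketch (not from the paper): the abstract contrasts GRPO-style fixed per-prompt group sizes with group sizes that adapt to reward variance. The Python below shows one way that contrast could look. The function names group_relative_advantages and allocate_group_sizes, the min_size parameter, and the variance-proportional allocation rule are all assumptions made for illustration; the abstract does not state SuperFlow's actual allocation rule or its step-level advantage computation, and neither is reproduced here.

```python
import statistics


def group_relative_advantages(rewards):
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # Every sample scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


def allocate_group_sizes(reward_variances, total_budget, min_size=2):
    """Hypothetical rule: split a sample budget across prompts in
    proportion to each prompt's observed reward variance."""
    n = len(reward_variances)
    if total_budget < n * min_size:
        raise ValueError("budget too small for the minimum group size")
    spare = total_budget - n * min_size
    total_var = sum(reward_variances)
    if total_var == 0.0:
        # No prompt is informative; fall back to an even split.
        shares = [spare / n] * n
    else:
        shares = [spare * v / total_var for v in reward_variances]
    sizes = [min_size + int(s) for s in shares]
    # Hand out samples lost to rounding, largest fractional share first.
    leftover = total_budget - sum(sizes)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes
```

Under this assumed rule, a prompt whose samples all receive the same reward keeps only the minimum group size, while high-variance prompts absorb most of the budget. A fixed-size scheme would spend the same number of samples on both.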

Related papers