Fugu-MT Paper Translation (Abstract): Advances in GRPO for Generation Models: A Survey

Paper summary: Advances in GRPO for Generation Models: A Survey

arxiv url: http://arxiv.org/abs/2603.06623v1
Date: Sat, 21 Feb 2026 17:11:08 GMT
Status: Translation complete
Last updated in system: 2026-03-15 16:38:22.448454
Title: Advances in GRPO for Generation Models: A Survey
Title (reference translation): Advances in GRPO for Generation Models: A Survey
Authors: Zexiang Liu, Xianglong He, Yangguang Li,
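Abstract summary: Flow-GRPO is a reinforcement learning framework for generative models. It can be used to align generative models with human preferences and task-specific objectives. This survey presents Flow-GRPO as a general alignment framework for modern generative models.
Reference score (independently computed attention measure): 5.995432816974204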
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large-scale flow matching models have achieved strong performance across generative tasks such as text-to-image, video, 3D, and speech synthesis. However, aligning their outputs with human preferences and task-specific objectives remains challenging. Flow-GRPO extends Group Relative Policy Optimization (GRPO) to generation models, enabling stable reinforcement learning alignment for generative systems. Since its introduction, Flow-GRPO has triggered rapid research growth, spanning methodological refinements and diverse application domains. This survey provides a comprehensive review of Flow-GRPO and its subsequent developments. We organize existing work along two primary dimensions. First, we analyze methodological advances beyond the original framework, including reward signal design, credit assignment, sampling efficiency, diversity preservation, reward hacking mitigation, and reward model construction. Second, we examine extensions of GRPO-based alignment across generative paradigms and modalities, including text-to-image, video generation, image editing, speech and audio, 3D modeling, embodied vision-language-action systems, unified multimodal models, autoregressive and masked diffusion models, and restoration tasks. By synthesizing theoretical insights and practical adaptations, this survey highlights Flow-GRPO as a general alignment framework for modern generative models and outlines key open challenges for scalable and robust reinforcement-based generation.
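Abstract (reference translation): Large-scale flow matching models achieve strong performance on generative tasks such as text-to-image, video, 3D, and speech synthesis. However, aligning their outputs with human preferences and task-specific objectives remains difficult. Flow-GRPO extends Group Relative Policy Optimization (GRPO) to generation models, enabling stable reinforcement learning alignment of generative systems. Since its introduction, Flow-GRPO has triggered rapid research growth, spanning methodological refinements and diverse application domains. This survey provides a comprehensive review of Flow-GRPO and its subsequent developments. Existing work is organized along two primary dimensions. First, it analyzes methodological advances beyond the original framework, including reward signal design, credit assignment, sampling efficiency, diversity preservation, reward hacking mitigation, and reward model construction. Second, it examines extensions of GRPO-based alignment across generative paradigms and modalities, including text-to-image, video generation, image editing, speech and audio, 3D modeling, embodied vision-language-action systems, unified multimodal models, autoregressive and masked diffusion models, and restoration tasks. By synthesizing theoretical insights and practical adaptations, the survey presents Flow-GRPO as a general alignment framework for modern generative models and outlines key open challenges for scalable and robust reinforcement-based generation.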

Related papers