Fugu-MT Paper Translation (Summary): Growing with the Generator: Self-paced GRPO for Video Generation

Paper summary: Growing with the Generator: Self-paced GRPO for Video Generation

arxiv url: http://arxiv.org/abs/2511.19356v1
Date: Mon, 24 Nov 2025 17:56:03 GMT
Status: Translation complete
Last updated in system: 2025-11-25 18:34:25.355156
Title: Growing with the Generator: Self-paced GRPO for Video Generation
Authors: Rui Li, Yuanzhi Liang, Ziqi Ni, Haibing Huang, Chi Zhang, Xuelong Li
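Abstract summary: Group Relative Policy Optimization (GRPO) has emerged as a powerful reinforcement learning paradigm for post-training video generation models. This paper proposes Self-Paced GRPO, a competence-aware GRPO framework in which reward feedback co-evolves with the generator. It introduces a progressive reward mechanism that shifts its emphasis from coarse visual fidelity to temporal coherence and fine-grained text-video semantic alignment as generation quality increases.
Reference score (independently computed attention score): 45.5073437581357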
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Group Relative Policy Optimization (GRPO) has emerged as a powerful reinforcement learning paradigm for post-training video generation models. However, existing GRPO pipelines rely on static, fixed-capacity reward models whose evaluation behavior is frozen during training. Such rigid rewards introduce distributional bias, saturate quickly as the generator improves, and ultimately limit the stability and effectiveness of reinforcement-based alignment. We propose Self-Paced GRPO, a competence-aware GRPO framework in which reward feedback co-evolves with the generator. Our method introduces a progressive reward mechanism that automatically shifts its emphasis from coarse visual fidelity to temporal coherence and fine-grained text-video semantic alignment as generation quality increases. This self-paced curriculum alleviates reward-policy mismatch, mitigates reward exploitation, and yields more stable optimization. Experiments on VBench across multiple video generation backbones demonstrate consistent improvements in both visual quality and semantic alignment over GRPO baselines with static rewards, validating the effectiveness and generality of Self-Paced GRPO.
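Illustrative note: the abstract does not give formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes three per-sample scores (visual fidelity, temporal coherence, semantic alignment) and a scalar `competence` in [0, 1]; the function `self_paced_weights`, its triangular schedule, and the way competence would be estimated are all hypothetical. The only standard ingredient is the group-relative advantage, which standardizes rewards within one sampled group.

```python
import math


def group_relative_advantages(rewards):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    # Small epsilon avoids division by zero when all rewards are equal.
    return [(r - mean) / (std + 1e-8) for r in rewards]


def self_paced_weights(competence):
    """Hypothetical schedule (an assumption, not from the paper).

    Maps competence in [0, 1] to weights over
    (visual fidelity, temporal coherence, semantic alignment),
    moving emphasis from the first to the last as competence grows.
    """
    c = min(max(competence, 0.0), 1.0)
    raw = [1.0 - c, 1.0 - abs(2.0 * c - 1.0), c]  # early, mid, late emphasis
    total = sum(raw)  # equals 2 - |2c - 1|, so it is always >= 1
    return [w / total for w in raw]


def mixed_reward(scores, competence):
    """Weighted sum of the three assumed reward components for one sample."""
    weights = self_paced_weights(competence)
    return sum(w * s for w, s in zip(weights, scores))


if __name__ == "__main__":
    # Made-up scores for a group of four sampled videos:
    # (visual fidelity, temporal coherence, semantic alignment).
    group = [(0.9, 0.4, 0.2), (0.7, 0.6, 0.5), (0.8, 0.5, 0.3), (0.6, 0.7, 0.6)]
    for competence in (0.0, 0.5, 1.0):
        rewards = [mixed_reward(s, competence) for s in group]
        print(competence, group_relative_advantages(rewards))
```

Under this assumed schedule, a low `competence` ranks samples almost entirely by visual fidelity, while a high `competence` ranks them by semantic alignment, so the same group of samples can receive different advantages as training progresses.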

