Fugu-MT Paper Translation (Summary): TAGRPO: Boosting GRPO on Image-to-Video Generation with Direct Trajectory Alignment

Paper summary: TAGRPO: Boosting GRPO on Image-to-Video Generation with Direct Trajectory Alignment

arxiv url: http://arxiv.org/abs/2601.05729v1
Date: Fri, 09 Jan 2026 11:15:27 GMT
Status: Translation complete
Last updated in system: 2026-01-12 17:41:49.954073
Title: TAGRPO: Boosting GRPO on Image-to-Video Generation with Direct Trajectory Alignment
Authors: Jin Wang, Jianxiang Lu, Guangzheng Xu, Comi Chen, Haoyu Yang, Linqing Wang, Peng Chen, Mingtao Chen, Zhichao Hu, Longhuang Wu, Shuai Shao, Qinglin Lu, Ping Luo
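Abstract summary: We propose TAGRPO, a robust post-training framework for I2V models inspired by contrastive learning. Our approach is grounded in the observation that rollout videos generated from identical initial noise provide superior guidance for optimization.
Reference score (independently computed attention score): 28.18756041538092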
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent studies have demonstrated the efficacy of integrating Group Relative Policy Optimization (GRPO) into flow matching models, particularly for text-to-image and text-to-video generation. However, we find that directly applying these techniques to image-to-video (I2V) models often fails to yield consistent reward improvements. To address this limitation, we present TAGRPO, a robust post-training framework for I2V models inspired by contrastive learning. Our approach is grounded in the observation that rollout videos generated from identical initial noise provide superior guidance for optimization. Leveraging this insight, we propose a novel GRPO loss applied to intermediate latents, encouraging direct alignment with high-reward trajectories while maximizing distance from low-reward counterparts. Furthermore, we introduce a memory bank for rollout videos to enhance diversity and reduce computational overhead. Despite its simplicity, TAGRPO achieves significant improvements over DanceGRPO in I2V generation.
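The abstract's core idea is easier to follow with a concrete example: among rollouts that share the same initial noise, pull an intermediate latent toward high-reward trajectories and push it away from low-reward ones. The minimal pure-Python sketch below illustrates this. It assumes the standard GRPO group-normalized advantage, (r - mean) / std. The advantage-weighted squared-distance loss, the `RolloutMemoryBank` class, and all other names here are hypothetical and are not the paper's actual loss or implementation. Latents are flattened to plain lists of floats.

```python
import math
from collections import deque
from typing import Deque, Dict, List, Sequence, Tuple

Latent = List[float]


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standard GRPO advantage: normalize each reward by its group's mean and std."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def squared_distance(a: Latent, b: Latent) -> float:
    """Squared Euclidean distance between two flattened latents."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def trajectory_alignment_loss(
    pred: Latent, rollouts: Sequence[Latent], rewards: Sequence[float]
) -> float:
    """Hypothetical contrastive-style loss on one intermediate latent.

    Rollouts with positive advantage (high reward) add +A * distance, so
    minimizing the loss pulls `pred` toward them. Rollouts with negative
    advantage add a negative term, so minimizing pushes `pred` away from them.
    """
    advantages = group_relative_advantages(rewards)
    total = sum(a * squared_distance(pred, z) for a, z in zip(advantages, rollouts))
    return total / len(rollouts)


class RolloutMemoryBank:
    """Hypothetical FIFO bank of (latent, reward) pairs grouped by initial-noise seed."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.groups: Dict[int, Deque[Tuple[Latent, float]]] = {}

    def add(self, seed: int, latent: Latent, reward: float) -> None:
        # Oldest entries are evicted once a group exceeds `capacity`.
        group = self.groups.setdefault(seed, deque(maxlen=self.capacity))
        group.append((latent, reward))

    def group(self, seed: int) -> Tuple[List[Latent], List[float]]:
        """Return the stored latents and rewards for one initial-noise seed."""
        entries = list(self.groups.get(seed, []))
        return [z for z, _ in entries], [r for _, r in entries]


# Usage: two rollouts from the same seed, one rewarded and one not.
bank = RolloutMemoryBank(capacity=4)
bank.add(seed=0, latent=[1.0, 0.0], reward=1.0)
bank.add(seed=0, latent=[0.0, 1.0], reward=0.0)
latents, rewards = bank.group(0)
loss_near_high = trajectory_alignment_loss([1.0, 0.0], latents, rewards)
loss_near_low = trajectory_alignment_loss([0.0, 1.0], latents, rewards)
```

In this sketch, a prediction close to the high-reward rollout gets a lower loss than one close to the low-reward rollout. One caveat applies to the design: the negative-advantage terms are unbounded below, so a practical version would need clipping, a margin, or normalization. The abstract does not specify how TAGRPO handles this.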
