TAGRPO: Boosting GRPO on Image-to-Video Generation with Direct Trajectory Alignment
- URL: http://arxiv.org/abs/2601.05729v1
- Date: Fri, 09 Jan 2026 11:15:27 GMT
- Title: TAGRPO: Boosting GRPO on Image-to-Video Generation with Direct Trajectory Alignment
- Authors: Jin Wang, Jianxiang Lu, Guangzheng Xu, Comi Chen, Haoyu Yang, Linqing Wang, Peng Chen, Mingtao Chen, Zhichao Hu, Longhuang Wu, Shuai Shao, Qinglin Lu, Ping Luo
- Abstract summary: We present TAGRPO, a robust framework for I2V models inspired by contrastive learning. Our approach is grounded in the observation that rollout videos generated from identical initial noise provide superior guidance for optimization.
- Score: 28.18756041538092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have demonstrated the efficacy of integrating Group Relative Policy Optimization (GRPO) into flow matching models, particularly for text-to-image and text-to-video generation. However, we find that directly applying these techniques to image-to-video (I2V) models often fails to yield consistent reward improvements. To address this limitation, we present TAGRPO, a robust post-training framework for I2V models inspired by contrastive learning. Our approach is grounded in the observation that rollout videos generated from identical initial noise provide superior guidance for optimization. Leveraging this insight, we propose a novel GRPO loss applied to intermediate latents, encouraging direct alignment with high-reward trajectories while maximizing distance from low-reward counterparts. Furthermore, we introduce a memory bank for rollout videos to enhance diversity and reduce computational overhead. Despite its simplicity, TAGRPO achieves significant improvements over DanceGRPO in I2V generation.
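The abstract describes a contrastive objective on intermediate latents: pull the current trajectory toward high-reward rollouts while pushing it away from low-reward ones. The following is a minimal sketch of such an objective, not the paper's actual implementation; the function name, the use of mean squared distance, and the margin hinge on the negative term are all assumptions for illustration.

```python
import numpy as np

def trajectory_alignment_loss(latent, pos_latents, neg_latents, margin=1.0):
    """Contrastive-style loss on an intermediate latent (illustrative sketch).

    latent      -- current intermediate latent, shape (d,)
    pos_latents -- latents from high-reward rollouts (same initial noise)
    neg_latents -- latents from low-reward rollouts
    """
    # Alignment term: minimize mean squared distance to high-reward latents.
    pos_term = np.mean([np.mean((latent - p) ** 2) for p in pos_latents])
    # Repulsion term: push low-reward latents away, up to a margin.
    neg_dist = np.mean([np.mean((latent - n) ** 2) for n in neg_latents])
    return pos_term + max(0.0, margin - neg_dist)
```

With this shape, the loss is zero once the latent coincides with the high-reward rollouts and the low-reward ones are at least `margin` away in mean squared distance; in an actual training loop the distances would be computed on denoising trajectories and backpropagated through the policy model.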
Related papers
- Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation [19.119239411510936]
We introduce a GT-Pair that builds high-quality preference pairs by using real videos as positives and model-generated videos as negatives. We also present Reg-DPO, which incorporates the SFT loss as a regularization term into the DPO loss to enhance training stability and generation fidelity.
arXiv Detail & Related papers (2025-11-03T11:04:22Z) - Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation [29.015994347609936]
Group Relative Policy Optimization (GRPO) has shown strong potential for flow-matching-based text-to-image (T2I) generation. We argue that shifting the optimization paradigm from the step level to the chunk level can effectively alleviate these issues. Chunk-GRPO is the first chunk-level GRPO-based approach for T2I generation.
arXiv Detail & Related papers (2025-10-24T15:50:36Z) - Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning [34.75717081153747]
Current methods for scoring generated images are susceptible to reward hacking. We propose Pref-GRPO, which shifts the optimization objective from score to preference fitting, ensuring more stable training. Existing T2I benchmarks are limited by coarse evaluation criteria, hindering comprehensive model assessment. We introduce UniGenBench, a unified T2I benchmark comprising 600 prompts across 5 main themes and 20 subthemes.
arXiv Detail & Related papers (2025-08-28T13:11:24Z) - DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO [37.07375927420007]
Group Relative Policy Optimization has shown impressive success using a PPO-style reinforcement learning algorithm with group-normalized rewards. In this paper, we explore GRPO and identify two problems that deteriorate effective learning. We propose DeepVideo-R1, a video large language model trained with Reg-GRPO and difficulty-aware data augmentation.
arXiv Detail & Related papers (2025-06-09T06:15:54Z) - ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL [54.100889131719626]
Chain-of-thought reasoning and reinforcement learning have driven breakthroughs in NLP. We introduce ReasonGen-R1, a framework that imbues an autoregressive image generator with explicit text-based "thinking" skills. We show that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models.
arXiv Detail & Related papers (2025-05-30T17:59:48Z) - Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization [1.1510009152620668]
Fine-tuning pre-trained generative models with Reinforcement Learning (RL) has emerged as an effective approach for aligning outputs with human preferences. We show that RL-based fine-tuning is both efficient and effective for VAR models, benefiting particularly from their fast inference speeds.
arXiv Detail & Related papers (2025-05-29T10:45:38Z) - Policy Optimized Text-to-Image Pipeline Design [73.9633527029941]
We introduce a novel reinforcement learning-based framework for text-to-image generation. Our approach first trains an ensemble of reward models capable of predicting image quality scores directly from prompt-workflow combinations. We then implement a two-phase training strategy: initial vocabulary training followed by GRPO-based optimization.
arXiv Detail & Related papers (2025-05-27T17:50:47Z) - VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization [59.39976343879587]
VerIPO aims to gradually improve video LLMs' capacity for generating deep, long-term reasoning chains. The training loop benefits from GRPO's expansive search and DPO's targeted optimization. Our trained models exceed the direct inference of large-scale instruction-tuned Video-LLMs.
arXiv Detail & Related papers (2025-05-25T06:41:28Z) - DanceGRPO: Unleashing GRPO on Visual Generation [42.567425922760144]
Reinforcement Learning (RL) has emerged as a promising approach for fine-tuning generative models. Existing methods like DDPO and DPOK face fundamental limitations when scaling to large and diverse prompt sets. This paper presents DanceGRPO, a framework that addresses these limitations through an innovative adaptation of Group Relative Policy Optimization.
arXiv Detail & Related papers (2025-05-12T17:59:34Z) - Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step [86.69947123512836]
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks. We provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation. We propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation.
arXiv Detail & Related papers (2025-01-23T18:59:43Z) - Improving Video Generation with Human Feedback [105.81833319891537]
We develop a systematic pipeline that harnesses human feedback to mitigate video generation problems. We introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices impact its rewarding efficacy.
arXiv Detail & Related papers (2025-01-23T18:55:41Z) - A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.