IPO: Iterative Preference Optimization for Text-to-Video Generation
- URL: http://arxiv.org/abs/2502.02088v2
- Date: Wed, 05 Feb 2025 06:18:12 GMT
- Title: IPO: Iterative Preference Optimization for Text-to-Video Generation
- Authors: Xiaomeng Yang, Zhiyu Tan, Xuecheng Nie, Hao Li
- Abstract summary: We introduce an Iterative Preference Optimization (IPO) strategy to enhance generated video quality by incorporating human feedback.
IPO exploits a critic model to judge video generations, either for pairwise ranking as in Direct Preference Optimization or for point-wise scoring as in Kahneman-Tversky Optimization.
In addition, IPO implements the critic model with a multi-modal large language model, which enables it to assign preference labels automatically without retraining or relabeling.
- Abstract: Video foundation models have advanced significantly with the help of network upgrades and model scale-up. However, they still struggle to meet the requirements of applications due to unsatisfactory generation quality. To address this problem, we propose to align video foundation models with human preferences from the perspective of post-training. Specifically, we introduce an Iterative Preference Optimization (IPO) strategy that enhances generated video quality by incorporating human feedback. IPO exploits a critic model to judge video generations, either for pairwise ranking as in Direct Preference Optimization or for point-wise scoring as in Kahneman-Tversky Optimization. Guided by these preference signals, IPO optimizes video foundation models and improves generated videos in subject consistency, motion smoothness, aesthetic quality, and other dimensions. In addition, IPO implements the critic model with a multi-modal large language model, which enables it to assign preference labels automatically without retraining or relabeling. In this way, IPO can efficiently perform multi-round preference optimization in an iterative manner, without tedious manual labeling. Comprehensive experiments demonstrate that the proposed IPO effectively improves the video generation quality of a pretrained model and helps a model with only 2B parameters surpass one with 5B parameters. Moreover, IPO achieves new state-of-the-art performance on the VBench benchmark. We will release our source code, models, and dataset to advance future research and applications.
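The paper's code is not reproduced here, but the two critic-driven objectives the abstract names are standard. Below is a minimal PyTorch sketch, under the assumption that the video model exposes per-sample log-likelihoods; all names are illustrative, not taken from the authors' release:

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Pairwise ranking objective (Direct Preference Optimization):
    the critic picks a winner/loser pair per prompt, and the policy's
    log-likelihood gap is contrasted against a frozen reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()

def kto_style_loss(logp, ref_logp, desirable, beta=0.1):
    """Point-wise scoring objective in the spirit of Kahneman-Tversky
    Optimization, simplified here by omitting KTO's KL reference point:
    the critic labels each single video as good (1) or bad (0)."""
    reward = beta * (logp - ref_logp)
    return -torch.where(desirable.bool(),
                        F.logsigmoid(reward),    # pull good samples up
                        F.logsigmoid(-reward)    # push bad samples down
                        ).mean()
```

Per the abstract, each round regenerates videos with the updated model and the MLLM-based critic relabels them, so either loss can be reapplied over multiple rounds without manual annotation.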
Related papers
- DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization [75.55167570591063]
We propose DreamDPO, an optimization-based framework that integrates human preferences into the 3D generation process.
DreamDPO reduces reliance on precise pointwise quality evaluations while enabling fine-grained controllability.
Experiments demonstrate that DreamDPO achieves competitive results, and provides higher-quality and more controllable 3D content.
arXiv Detail & Related papers (2025-02-05T11:03:08Z)
- Improving Video Generation with Human Feedback [81.48120703718774]
Video generation has achieved significant advances, but issues like unsmooth motion and misalignment between videos and prompts persist.
We develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model.
We introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices impact its rewarding efficacy.
arXiv Detail & Related papers (2025-01-23T18:55:41Z)
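A "multi-dimensional" video reward model can be pictured as one shared video-text backbone with a scoring head per preference axis. A hedged sketch; the backbone interface, head layout, and dimension names are assumptions for illustration, not VideoReward's actual architecture:

```python
import torch.nn as nn

class MultiDimVideoReward(nn.Module):
    """One score per preference dimension from shared video-text features."""
    def __init__(self, backbone, feat_dim,
                 dims=("visual_quality", "motion_quality", "text_alignment")):
        super().__init__()
        self.backbone = backbone  # encodes (video, prompt) -> (batch, feat_dim)
        self.heads = nn.ModuleDict({d: nn.Linear(feat_dim, 1) for d in dims})

    def forward(self, video, prompt):
        feats = self.backbone(video, prompt)
        return {d: head(feats).squeeze(-1) for d, head in self.heads.items()}
```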
- OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization [30.6130504613716]
We introduce OnlineVPO, a preference learning approach tailored specifically for video diffusion models.
By employing the video reward model to offer concise video feedback on the fly, OnlineVPO offers effective and efficient preference guidance.
arXiv Detail & Related papers (2024-12-19T18:34:50Z)
- Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM [54.2320450886902]
Text-to-video models have made remarkable advancements through optimization on high-quality text-video pairs.
Current automatic methods for refining prompts encounter challenges such as Modality-Inconsistency, Cost-Discrepancy, and Model-Unaware.
We introduce Prompt-A-Video, which excels in crafting Video-Centric, Labor-Free and Preference-Aligned prompts tailored to a specific video diffusion model.
arXiv Detail & Related papers (2024-12-19T18:32:21Z)
- VideoDPO: Omni-Preference Alignment for Video Diffusion Generation [48.36302380755874]
Direct Preference Optimization (DPO) has demonstrated significant improvements in language and image generation.
We propose a VideoDPO pipeline by making several key adjustments.
Our experiments demonstrate substantial improvements in both visual quality and semantic alignment.
arXiv Detail & Related papers (2024-12-18T18:59:49Z)
- MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples [22.521746860874305]
This study introduces the MPPO algorithm, which leverages the average likelihood of model responses to fit the reward function.
Through a comparison of Point-wise, Pair-wise, and List-wise implementations, we found that the Pair-wise approach achieves the best performance.
Experimental results demonstrate MPPO's outstanding performance across various benchmarks.
arXiv Detail & Related papers (2024-12-13T14:18:58Z)
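The average-likelihood idea lends itself to a compact sketch: a length-normalized log-probability acts as the implicit reward, and the single preferred response is contrasted pairwise against arbitrarily many negatives. Names and shapes are illustrative, not the authors' code:

```python
import torch.nn.functional as F

def avg_loglik(logits, tokens, mask):
    """Average (length-normalized) log-likelihood of each response,
    used in place of a learned reward.  logits: (B, T, V),
    tokens: (B, T), mask: (B, T) with 1 on response tokens."""
    logp = logits.log_softmax(-1).gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
    return (logp * mask).sum(-1) / mask.sum(-1)

def mppo_pairwise_loss(r_pos, r_negs):
    """Pair the preferred response against every negative sample,
    the implementation variant the study found to work best.
    r_pos: (B,), r_negs: (B, K) average log-likelihoods."""
    return -F.logsigmoid(r_pos.unsqueeze(-1) - r_negs).mean()
```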
- Scalable Ranked Preference Optimization for Text-to-Image Generation [76.16285931871948]
We investigate a scalable approach for collecting large-scale and fully synthetic datasets for DPO training.
The preferences for paired images are generated using a pre-trained reward function, eliminating the need for involving humans in the annotation process.
We introduce RankDPO to enhance DPO-based methods using the ranking feedback.
arXiv Detail & Related papers (2024-10-23T16:42:56Z)
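The fully synthetic pipeline above reduces to scoring several generations per prompt with a frozen reward model and emitting weighted winner/loser pairs. A sketch under stated assumptions; reward_fn and the rank-gap weighting are placeholders for illustration, not RankDPO's exact scheme:

```python
def ranked_preference_pairs(prompt, images, reward_fn):
    """Rank candidate images with a pretrained reward model (no human
    labels) and emit every (winner, loser) pair with a weight that
    grows with the rank gap."""
    order = sorted(images, key=lambda im: reward_fn(im, prompt), reverse=True)
    pairs = []
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            weight = 1.0 / (i + 1) - 1.0 / (j + 1)  # discount-style rank weighting
            pairs.append((order[i], order[j], weight))
    return pairs
```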
- ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO [36.69910114305134]
We propose Iterative Self-Retrospective Direct Preference Optimization (ISR-DPO) to enhance preference modeling.
ISR-DPO enhances the self-judge's focus on informative video regions, resulting in more visually grounded preferences.
In extensive empirical evaluations, ISR-DPO significantly outperforms the state of the art.
arXiv Detail & Related papers (2024-06-17T07:33:30Z)
- Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward [118.65089648651308]
This paper introduces a novel framework that utilizes detailed video captions as a proxy for video content.
We show that applying this tailored reward through DPO significantly improves the performance of video LMMs on video Question Answering (QA) tasks.
arXiv Detail & Related papers (2024-04-01T17:28:16Z)
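The caption-as-proxy idea can be sketched in a few lines: a captioner turns the video into text so a text-only LLM can grade candidate answers, and the grades seed DPO pairs. Here caption_model, judge_llm, and the prompt wording are hypothetical placeholders, not the paper's interface:

```python
def caption_proxy_reward(video, question, answer, caption_model, judge_llm):
    """Score a video-QA answer with a text-only LLM, using a detailed
    caption as a stand-in for the raw video content."""
    caption = caption_model(video)
    query = (f"Video description: {caption}\n"
             f"Question: {question}\nProposed answer: {answer}\n"
             "Score the answer's correctness from 1 (wrong) to 5 (correct):")
    return float(judge_llm(query))

def dpo_pair(video, question, answers, caption_model, judge_llm):
    """Best- and worst-scored answers become the (chosen, rejected) pair."""
    ranked = sorted(answers, key=lambda a: caption_proxy_reward(
        video, question, a, caption_model, judge_llm), reverse=True)
    return ranked[0], ranked[-1]
```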