RDPO: Real Data Preference Optimization for Physics Consistency Video Generation
- URL: http://arxiv.org/abs/2506.18655v1
- Date: Mon, 23 Jun 2025 13:55:24 GMT
- Title: RDPO: Real Data Preference Optimization for Physics Consistency Video Generation
- Authors: Wenxu Qian, Chaoyue Wang, Hou Peng, Zhiyu Tan, Hao Li, Anxiang Zeng
- Abstract summary: We present Real Data Preference Optimisation (RDPO), an annotation-free framework that distills physical priors directly from real-world videos. RDPO reverse-samples real video sequences with a pre-trained generator to automatically build preference pairs that are distinguishable in terms of physical correctness. A multi-stage iterative training schedule guides the generator to obey physical laws increasingly well.
- Score: 24.842288734103505
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video generation techniques have achieved remarkable advancements in visual quality, yet faithfully reproducing real-world physics remains elusive. Preference-based model post-training may improve physical consistency, but requires costly human-annotated datasets or reward models that are not yet feasible. To address these challenges, we present Real Data Preference Optimisation (RDPO), an annotation-free framework that distills physical priors directly from real-world videos. Specifically, the proposed RDPO reverse-samples real video sequences with a pre-trained generator to automatically build preference pairs that are statistically distinguishable in terms of physical correctness. A multi-stage iterative training schedule then guides the generator to obey physical laws increasingly well. Benefiting from the dynamic information explored from real videos, our proposed RDPO significantly improves the action coherence and physical realism of the generated videos. Evaluations on multiple benchmarks and human evaluations have demonstrated that RDPO achieves improvements across multiple dimensions. The source code and demonstration of this paper are available at: https://wwenxu.github.io/RDPO/
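The abstract describes a preference-optimization loop in which pairs built from real videos (preferred) and reverse-sampled generator outputs (dispreferred) supervise the generator without human annotation. As a rough illustration of the preference-learning objective such a pipeline typically relies on, here is a minimal DPO-style loss on a single preference pair. This is a hedged sketch, not the paper's actual implementation: the function name `dpo_loss`, the scalar log-probability inputs, and the `beta` value are all illustrative assumptions.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO-style loss for one preference pair (illustrative sketch).

    In an RDPO-like setting, `chosen` would be the real-video sample and
    `rejected` the generator's reverse-sampled counterpart; the `ref_*`
    terms are log-probabilities under a frozen reference generator.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # sample over the rejected one, relative to the reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: small when the policy already
    # ranks the physically correct (real) sample higher.
    return math.log(1.0 + math.exp(-margin))
```

With equal log-probabilities the loss is log 2, and it shrinks as the policy assigns relatively more mass to the real-video sample, which is the direction the multi-stage schedule in the paper iteratively pushes the generator.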
Related papers
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency [56.475612147721264]
We propose a dual-reward formulation that supervises both semantic and temporal reasoning through discrete and continuous reward signals. We evaluate our approach across eight representative video understanding tasks, including VideoQA, Temporal Video Grounding, and Grounded VideoQA. Results underscore the importance of reward design and data selection in advancing reasoning-centric video understanding with MLLMs.
arXiv Detail & Related papers (2025-06-02T17:28:26Z) - Dynamic-Aware Video Distillation: Optimizing Temporal Resolution Based on Video Semantics [68.85010825225528]
Video datasets present unique challenges due to the presence of temporal information and varying levels of redundancy across different classes. Existing DD approaches assume a uniform level of temporal redundancy across all different video semantics, which limits their effectiveness on video datasets. We propose Dynamic-Aware Video Distillation (DAViD), a Reinforcement Learning (RL) approach to predict the optimal Temporal Resolution of the synthetic videos.
arXiv Detail & Related papers (2025-05-28T11:43:58Z) - Learning from Streaming Video with Orthogonal Gradients [62.51504086522027]
We address the challenge of representation learning from a continuous stream of video as input, in a self-supervised manner. This differs from the standard approaches to video learning where videos are chopped and shuffled during training in order to create a non-redundant batch. We demonstrate the drop in performance when moving from shuffled to sequential learning on three tasks.
arXiv Detail & Related papers (2025-04-02T17:59:57Z) - VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation [66.58048825989239]
VideoPhy-2 is an action-centric dataset for evaluating physical commonsense in generated videos. We perform human evaluation that assesses semantic adherence, physical commonsense, and grounding of physical rules in the generated videos. Our findings reveal major shortcomings, with even the best model achieving only 22% joint performance.
arXiv Detail & Related papers (2025-03-09T22:49:12Z) - A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction [2.5262441079541285]
We introduce PhyCoBench, a benchmark designed specifically to assess the physical coherence of generated videos. Our benchmark includes 120 prompts covering 7 categories of physical principles, capturing key physical laws observable in video content. We propose an automated evaluation model, PhyCoPredictor, a diffusion model that generates optical flow and video frames in a cascade manner.
arXiv Detail & Related papers (2025-02-08T09:31:26Z) - Improving Video Generation with Human Feedback [81.48120703718774]
Video generation has achieved significant advances, but issues like unsmooth motion and misalignment between videos and prompts persist. We develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model. We introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices impact its rewarding efficacy.
arXiv Detail & Related papers (2025-01-23T18:55:41Z) - Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems [49.11170948406405]
We propose an unsupervised method to estimate the physical parameters of known, continuous governing equations from single videos. We take the field closer to reality by recording Delfys75: our own real-world dataset of 75 videos for five different types of dynamical systems.
arXiv Detail & Related papers (2024-10-02T09:44:54Z)