PhysCorr: Dual-Reward DPO for Physics-Constrained Text-to-Video Generation with Automated Preference Selection
- URL: http://arxiv.org/abs/2511.03997v1
- Date: Thu, 06 Nov 2025 02:40:57 GMT
- Title: PhysCorr: Dual-Reward DPO for Physics-Constrained Text-to-Video Generation with Automated Preference Selection
- Authors: Peiyao Wang, Weining Wang, Qi Li,
- Abstract summary: We propose PhysCorr, a unified framework for modeling, evaluating, and optimizing physical consistency in video generation.<n>Specifically, we introduce PhysicsRM, the first dual-dimensional reward model that quantifies both intra-object stability and inter-object interactions.<n>Our approach is model-agnostic and scalable, enabling seamless integration into a wide range of video diffusion and transformer-based backbones.
- Score: 10.498184571108995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in text-to-video generation have achieved impressive perceptual quality, yet generated content often violates fundamental principles of physical plausibility - manifesting as implausible object dynamics, incoherent interactions, and unrealistic motion patterns. Such failures hinder the deployment of video generation models in embodied AI, robotics, and simulation-intensive domains. To bridge this gap, we propose PhysCorr, a unified framework for modeling, evaluating, and optimizing physical consistency in video generation. Specifically, we introduce PhysicsRM, the first dual-dimensional reward model that quantifies both intra-object stability and inter-object interactions. On this foundation, we develop PhyDPO, a novel direct preference optimization pipeline that leverages contrastive feedback and physics-aware reweighting to guide generation toward physically coherent outputs. Our approach is model-agnostic and scalable, enabling seamless integration into a wide range of video diffusion and transformer-based backbones. Extensive experiments across multiple benchmarks demonstrate that PhysCorr achieves significant improvements in physical realism while preserving visual fidelity and semantic alignment. This work takes a critical step toward physically grounded and trustworthy video generation.
Related papers
- PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models [100.65199317765608]
Physical principles are fundamental to realistic visual simulation, but remain a significant oversight in transformer-based video generation.<n>We introduce a physics-aware reinforcement learning paradigm for video generation models that enforces physical collision rules directly in high-dimensional spaces.<n>We extend this paradigm to a unified framework, termed Mimicry-Discovery Cycle (MDcycle), which allows substantial fine-tuning.
arXiv Detail & Related papers (2026-01-16T08:40:10Z) - PAVAS: Physics-Aware Video-to-Audio Synthesis [58.746986798623084]
We present Physics-Aware Video-to-Audio Synthesis (PAVAS), a method that incorporates physical reasoning into a latent diffusion-based V2A generation.<n>We show that PAVAS produces physically plausible and perceptually coherent audio, outperforming existing V2A models in both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2025-12-09T06:28:50Z) - ProPhy: Progressive Physical Alignment for Dynamic World Simulation [55.456455952212416]
ProPhy is a Progressive Physical Alignment Framework that enables explicit physics-aware conditioning and anisotropic generation.<n>We show that ProPhy produces more realistic, dynamic, and physically coherent results than existing state-of-the-art methods.
arXiv Detail & Related papers (2025-12-05T09:39:26Z) - PhysHMR: Learning Humanoid Control Policies from Vision for Physically Plausible Human Motion Reconstruction [52.44375492811009]
We present PhysHMR, a unified framework that learns a visual-to-action policy for humanoid control in a physics-based simulator.<n>A key component of our approach is the pixel-as-ray strategy, which lifts 2D keypoints into 3D spatial rays and transforms them into global space.<n>PhysHMR produces high-fidelity, physically plausible motion across diverse scenarios, outperforming prior approaches in both visual accuracy and physical realism.
arXiv Detail & Related papers (2025-10-02T21:01:11Z) - Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation [80.89133198952187]
PhysHPO is a novel framework for Hierarchical Cross-Modal Direct Preference Optimization.<n>It enables fine-grained preference alignment for physically plausible video generation.<n>We show that PhysHPO significantly improves physical plausibility and overall video generation quality of advanced models.
arXiv Detail & Related papers (2025-08-14T17:30:37Z) - Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation [54.42523027597904]
We introduce a novel framework that integrates symbolic regression and trajectory-guided image-to-video (I2V) models for physics-grounded video forecasting.<n>Our approach extracts motion trajectories from input videos, uses a retrieval-based pre-training mechanism to enhance symbolic regression, and discovers equations of motion to forecast physically accurate future trajectories.
arXiv Detail & Related papers (2025-07-09T13:28:42Z) - Physics-Guided Motion Loss for Video Generation Model [8.083315267770255]
Current video diffusion models generate visually compelling content but often violate basic laws of physics.<n>We introduce a frequency-domain physics prior that improves motion plausibility without modifying model architectures.
arXiv Detail & Related papers (2025-06-02T20:42:54Z) - PhyMAGIC: Physical Motion-Aware Generative Inference with Confidence-guided LLM [17.554471769834453]
We present PhyMAGIC, a training-free framework that generates physically consistent motion from a single image.<n>PhyMAGIC integrates a pre-trained image-to-video diffusion model, confidence-guided reasoning via LLMs, and a differentiable physics simulator.<n> Comprehensive experiments demonstrate that PhyMAGIC outperforms state-of-the-art video generators and physics-aware baselines.
arXiv Detail & Related papers (2025-05-22T09:40:34Z) - VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior [88.51778468222766]
Video diffusion models (VDMs) have advanced significantly in recent years, enabling the generation of highly realistic videos.<n>VDMs often fail to produce physically plausible videos due to an inherent lack of understanding of physics.<n>We propose a novel two-stage image-to-video generation framework that explicitly incorporates physics with vision and language informed physical prior.
arXiv Detail & Related papers (2025-03-30T09:03:09Z) - PhysMotion: Physics-Grounded Dynamics From a Single Image [24.096925413047217]
We introduce PhysMotion, a novel framework that leverages principled physics-based simulations to guide intermediate 3D representations generated from a single image and input conditions.<n>Our approach addresses the limitations of traditional data-driven generative models and result in more consistent physically plausible motions.
arXiv Detail & Related papers (2024-11-26T07:59:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.