Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation
- URL: http://arxiv.org/abs/2507.06830v1
- Date: Wed, 09 Jul 2025 13:28:42 GMT
- Title: Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation
- Authors: Tao Feng, Xianbing Zhao, Zhenhua Chen, Tien Tsin Wong, Hamid Rezatofighi, Gholamreza Haffari, Lizhen Qu
- Abstract summary: We introduce a novel framework that integrates symbolic regression and trajectory-guided image-to-video (I2V) models for physics-grounded video forecasting. Our approach extracts motion trajectories from input videos, uses a retrieval-based pre-training mechanism to enhance symbolic regression, and discovers equations of motion to forecast physically accurate future trajectories.
- Score: 54.42523027597904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in diffusion-based and autoregressive video generation models have achieved remarkable visual realism. However, these models typically lack accurate physical alignment, failing to replicate real-world dynamics in object motion. This limitation arises primarily from their reliance on learned statistical correlations rather than capturing mechanisms adhering to physical laws. To address this issue, we introduce a novel framework that integrates symbolic regression (SR) and trajectory-guided image-to-video (I2V) models for physics-grounded video forecasting. Our approach extracts motion trajectories from input videos, uses a retrieval-based pre-training mechanism to enhance symbolic regression, and discovers equations of motion to forecast physically accurate future trajectories. These trajectories then guide video generation without requiring fine-tuning of existing models. Evaluated on scenarios in Classical Mechanics, including spring-mass, pendulums, and projectile motions, our method successfully recovers ground-truth analytical equations and improves the physical alignment of generated videos over baseline methods.
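The central step described in the abstract, discovering an equation of motion from an observed trajectory and extrapolating it forward, can be illustrated with a minimal sketch. This is not the authors' method: it swaps their retrieval-enhanced symbolic regression for a simple sparse least-squares fit over a hand-picked library of candidate terms. The function names `discover_equation` and `forecast`, the term library, and the pruning threshold are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical candidate-term library; a real symbolic regression system
# searches a much larger expression space.
TERM_NAMES = ["1", "t", "t^2", "sin(t)", "cos(t)"]


def build_library(t):
    """Evaluate every candidate term at the time stamps t."""
    return np.column_stack([np.ones_like(t), t, t**2, np.sin(t), np.cos(t)])


def discover_equation(t, y, threshold=1e-3, iters=5):
    """Fit y(t) as a sparse linear combination of the candidate terms.

    Repeatedly solves a least-squares problem and zeroes out coefficients
    whose magnitude falls below `threshold` (an assumed hyperparameter).
    """
    library = build_library(t)
    coef = np.linalg.lstsq(library, y, rcond=None)[0]
    for _ in range(iters):
        active = np.abs(coef) >= threshold
        coef[~active] = 0.0
        if not active.any():
            break
        # Refit using only the terms that survived pruning.
        coef[active] = np.linalg.lstsq(library[:, active], y, rcond=None)[0]
    return coef


def forecast(coef, t_future):
    """Extrapolate the discovered equation to future time stamps."""
    return build_library(t_future) @ coef


if __name__ == "__main__":
    # Synthetic vertical projectile trajectory: y = y0 + v0*t - 0.5*g*t^2.
    t_obs = np.linspace(0.0, 1.0, 30)
    y_obs = 1.0 + 5.0 * t_obs - 0.5 * 9.81 * t_obs**2

    coef = discover_equation(t_obs, y_obs)
    equation = " + ".join(
        f"{c:.3f}*{name}" for c, name in zip(coef, TERM_NAMES) if c != 0.0
    )
    print("discovered: y(t) =", equation)
    print("forecast at t=1.1, 1.2:", forecast(coef, np.array([1.1, 1.2])))
```

In a pipeline like the one the abstract outlines, the forecast points would then be passed as trajectory conditions to an I2V model; that step is omitted here because it depends on the specific generator used.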
Related papers
- MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM [14.522189177415724]
MAGIC is a training-free framework for single-image physical property inference and dynamic generation. Our framework generates motion-rich videos from a static image and closes the visual-to-physical gap through a confidence-driven feedback loop. Experiments show that MAGIC outperforms existing physics-aware generative methods in inference accuracy and achieves greater temporal coherence.
arXiv Detail & Related papers (2025-05-22T09:40:34Z)
- RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism [73.38167494118746]
We propose a framework to improve the realism of motion in generated videos. We advocate for the incorporation of a retrieval mechanism during the generation phase. Our pipeline is designed to apply to any text-to-video diffusion model.
arXiv Detail & Related papers (2025-04-09T08:14:05Z)
- VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior [88.51778468222766]
Video diffusion models (VDMs) have advanced significantly in recent years, enabling the generation of highly realistic videos. VDMs often fail to produce physically plausible videos due to an inherent lack of understanding of physics. We propose a novel two-stage image-to-video generation framework that explicitly incorporates physics with vision and language informed physical prior.
arXiv Detail & Related papers (2025-03-30T09:03:09Z)
- VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models [71.9811050853964]
VideoJAM is a novel framework that instills an effective motion prior to video generators. VideoJAM achieves state-of-the-art performance in motion coherence. These findings emphasize that appearance and motion can be complementary and, when effectively integrated, enhance both the visual quality and the coherence of video generation.
arXiv Detail & Related papers (2025-02-04T17:07:10Z)
- Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos [6.093379844890164]
We propose a novel method to selectively incorporate the physics models with the kinematics observations in an online setting. A recurrent neural network is introduced to realize a Kalman filter that attentively balances the kinematics input and simulated motion. The proposed approach excels in the physics-based human pose estimation task and demonstrates the physical plausibility of the predictive dynamics.
arXiv Detail & Related papers (2024-10-10T10:24:59Z)
- Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems [49.11170948406405]
We propose an unsupervised method to estimate the physical parameters of known, continuous governing equations from single videos. We take the field closer to reality by recording Delfys75: our own real-world dataset of 75 videos for five different types of dynamical systems.
arXiv Detail & Related papers (2024-10-02T09:44:54Z)
- Kinematics-aware Trajectory Generation and Prediction with Latent Stochastic Differential Modeling [12.338614299403305]
Trajectory generation and trajectory prediction are critical tasks in autonomous driving.
Deep learning-based methods have shown great promise for these two tasks in learning various traffic scenarios.
However, it remains a challenging problem for these methods to ensure that the generated/predicted trajectories are physically realistic.
arXiv Detail & Related papers (2023-09-17T16:06:38Z)
- PETAL: Physics Emulation Through Averaged Linearizations for Solving Inverse Problems [0.6039786064227648]
Inverse problems describe the task of recovering an underlying signal of interest given observables.
We propose a simple learned weighted average model that embeds linearizations of the forward model around various reference points into the model itself.
arXiv Detail & Related papers (2023-05-18T15:50:54Z)
- Evaluation of Differentially Constrained Motion Models for Graph-Based Trajectory Prediction [1.1947990549568765]
This research investigates the performance of various motion models in combination with numerical solvers for the prediction task.
The study shows that simpler models, such as low-order integrator models, are preferred over more complex ones, such as kinematic models, for achieving accurate predictions.
arXiv Detail & Related papers (2023-04-11T10:15:20Z)
- Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors.
We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z)
- AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points [92.91569287889203]
We present a novel, label-free algorithm, AutoTrajectory, for trajectory extraction and prediction.
To better capture the moving objects in videos, we introduce dynamic points.
We aggregate dynamic points to instance points, which stand for moving objects such as pedestrians in videos.
arXiv Detail & Related papers (2020-07-11T08:43:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.