Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields
- URL: http://arxiv.org/abs/2602.00148v1
- Date: Thu, 29 Jan 2026 11:37:41 GMT
- Title: Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields
- Authors: Shiqian Li, Ruihong Shen, Junfeng Ni, Chang Pan, Chi Zhang, Yixin Zhu
- Abstract summary: We introduce an end-to-end neural framework that integrates 3D Gaussian perception with physics-based dynamic modeling to generate physically realistic 4D videos. We also present GSCollision, a 4D Gaussian dataset featuring diverse materials, multi-object interactions, and complex scenes, totaling over 640k rendered physical videos.
- Score: 11.212256115568772
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Predicting physical dynamics from raw visual data remains a major challenge in AI. While recent video generation models have achieved impressive visual quality, they still cannot consistently generate physically plausible videos due to a lack of modeling of physical laws. Recent approaches combining 3D Gaussian splatting and physics engines can produce physically plausible videos, but are hindered by high computational costs in both reconstruction and simulation, and often lack robustness in complex real-world scenarios. To address these issues, we introduce Neural Gaussian Force Field (NGFF), an end-to-end neural framework that integrates 3D Gaussian perception with physics-based dynamic modeling to generate interactive, physically realistic 4D videos from multi-view RGB inputs, running two orders of magnitude faster than prior Gaussian simulators. To support training, we also present GSCollision, a 4D Gaussian dataset featuring diverse materials, multi-object interactions, and complex scenes, totaling over 640k rendered physical videos (~4 TB). Evaluations on synthetic and real 3D scenarios show NGFF's strong generalization and robustness in physical reasoning, advancing video prediction towards physics-grounded world models.
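As a rough illustration of the idea the abstract describes, the sketch below shows how a learned per-Gaussian force field can drive a simple dynamics rollout over Gaussian centers. This is not the authors' code; all names (ForceFieldMLP, step) and the choice of semi-implicit Euler integration are assumptions made for the sketch.

```python
# Minimal sketch, assuming a per-Gaussian force predictor and semi-implicit
# Euler integration. Hypothetical names; not the NGFF implementation.
import torch
import torch.nn as nn

class ForceFieldMLP(nn.Module):
    """Maps each Gaussian's state (position, velocity, latent features) to a force."""
    def __init__(self, feat_dim: int = 32, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 3 + feat_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 3),  # predicted 3D force per Gaussian
        )

    def forward(self, pos, vel, feat):
        return self.net(torch.cat([pos, vel, feat], dim=-1))

def step(pos, vel, feat, mass, field: ForceFieldMLP, dt: float = 1e-2):
    """One semi-implicit Euler step driven by the learned force field."""
    force = field(pos, vel, feat)   # (N, 3)
    vel = vel + dt * force / mass   # update velocity first ...
    pos = pos + dt * vel            # ... then position (symplectic)
    return pos, vel

# Usage: roll a cloud of N Gaussian centers forward for T steps.
N, T = 1024, 50
pos, vel = torch.randn(N, 3), torch.zeros(N, 3)
feat, mass = torch.randn(N, 32), torch.ones(N, 1)
field = ForceFieldMLP()
for _ in range(T):
    pos, vel = step(pos, vel, feat, mass, field)
```

Updating velocity before position (semi-implicit rather than explicit Euler) tends to be more stable over long rollouts, which matters for the kind of long-horizon 4D prediction the paper targets.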
Related papers
- MotionPhysics: Learnable Motion Distillation for Text-Guided Simulation [25.78198969054392]
MotionPhysics is an end-to-end differentiable framework that infers plausible physical parameters from a user-provided natural language prompt. We evaluate MotionPhysics across more than thirty scenarios, including real-world, human-designed, and AI-generated 3D objects.
arXiv Detail & Related papers (2026-01-01T22:56:37Z)
- PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis [52.905353023326306]
We propose PhysWorld, a framework that synthesizes physically plausible and diverse demonstrations to learn efficient world models. Experiments show that PhysWorld achieves competitive performance while enabling inference speeds 47 times faster than the recent state-of-the-art method, PhysTwin.
arXiv Detail & Related papers (2025-10-24T13:25:39Z)
- PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation [53.06495362038348]
Existing generation models excel at producing photo-realistic videos from text or images, but often lack physical plausibility and 3D controllability. We introduce PhysCtrl, a novel framework for physics-grounded image-to-video generation with physical parameters and force control. Experiments show that PhysCtrl generates realistic, physics-grounded motion trajectories which, when used to drive image-to-video models, yield high-fidelity, controllable videos.
arXiv Detail & Related papers (2025-09-24T17:58:04Z)
- PhysGM: Large Physical Gaussian Model for Feed-Forward 4D Synthesis [37.21119648359889]
PhysGM is a feed-forward framework that jointly predicts a 3D Gaussian representation and its physical properties from a single image. Our method effectively generates high-fidelity 4D simulations from a single image in one minute.
arXiv Detail & Related papers (2025-08-19T15:10:30Z)
- DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method that predicts deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z)
- Learning 3D-Gaussian Simulators from RGB Videos [20.250137125726265]
3DGSim is a 3D simulator that learns physical interactions from multi-view RGB videos. It unifies 3D scene reconstruction, particle dynamics prediction, and video synthesis into an end-to-end trained framework.
arXiv Detail & Related papers (2025-03-31T12:33:59Z)
- PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation [29.831214435147583]
We present PhysGen, a novel image-to-video generation method.
It produces a realistic, physically plausible, and temporally consistent video.
Our key insight is to integrate model-based physical simulation with a data-driven video generation process.
arXiv Detail & Related papers (2024-09-27T17:59:57Z)
- Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video [58.043569985784806]
We introduce latent intuitive physics, a transfer learning framework for physics simulation.
It can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes.
We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation.
arXiv Detail & Related papers (2024-06-18T16:37:44Z)
- DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors [75.83647027123119]
We propose to learn the physical properties of a material field with video diffusion priors. We then utilize a physics-based Material-Point-Method simulator to generate 4D content with realistic motions.
arXiv Detail & Related papers (2024-06-03T16:05:25Z)
- 3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes [68.66237114509264]
We present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes with fluids.
We show that our model makes long-horizon future predictions by learning from raw images and significantly outperforms models that do not employ an explicit 3D representation space.
arXiv Detail & Related papers (2023-04-22T19:28:49Z)