Learning 3D-Gaussian Simulators from RGB Videos
- URL: http://arxiv.org/abs/2503.24009v2
- Date: Sun, 10 Aug 2025 15:15:08 GMT
- Title: Learning 3D-Gaussian Simulators from RGB Videos
- Authors: Mikel Zhobro, Andreas René Geist, Georg Martius
- Abstract summary: 3DGSim is a 3D simulator that learns physical interactions directly from multi-view RGB videos. It unifies 3D scene reconstruction, particle dynamics prediction, and video synthesis into an end-to-end trained framework.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Realistic simulation is critical for applications ranging from robotics to animation. Learned simulators have emerged as a way to capture real-world physics directly from video data, but they often require privileged information such as depth maps, particle tracks, or hand-engineered features to maintain spatial and temporal consistency. These strong inductive biases and ground-truth 3D information help in domains where data is sparse, but they limit scalability and generalization in data-rich regimes. To overcome these limitations, we propose 3DGSim, a learned 3D simulator that directly learns physical interactions from multi-view RGB videos. 3DGSim unifies 3D scene reconstruction, particle dynamics prediction, and video synthesis into an end-to-end trained framework. It adopts MVSplat to learn a latent particle-based representation of 3D scenes, a Point Transformer for particle dynamics, a Temporal Merging module for consistent temporal aggregation, and Gaussian Splatting to produce novel-view renderings. By jointly training inverse rendering and dynamics forecasting, 3DGSim embeds physical properties into point-wise latent features. This enables the model to capture diverse physical behaviors, from rigid to elastic and cloth-like dynamics, and boundary conditions (e.g. a fixed cloth corner), along with realistic lighting effects that also generalize to unseen multi-body interactions and novel scene edits.
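The abstract describes a four-stage pipeline: encode multi-view RGB into latent particles (MVSplat), aggregate a short history (Temporal Merging), step the particle dynamics (Point Transformer), and render novel views (Gaussian Splatting). A minimal sketch of how these stages compose is given below; every function body, tensor shape, and parameter name is an illustrative placeholder and not the paper's actual implementation:

```python
import numpy as np

# Illustrative sizes only; the paper does not specify these.
N_PARTICLES, FEAT_DIM = 256, 32

def encode_views(images):
    """Stand-in for the MVSplat-based encoder: maps multi-view RGB
    images to latent particles (3D positions + per-point features)."""
    rng = np.random.default_rng(0)
    positions = rng.standard_normal((N_PARTICLES, 3))
    features = rng.standard_normal((N_PARTICLES, FEAT_DIM))
    return positions, features

def merge_temporal(position_history):
    """Stand-in for the Temporal Merging module: aggregates a short
    history of particle states into one consistent state (here, a mean)."""
    return np.stack(position_history).mean(axis=0)

def predict_dynamics(positions, features):
    """Stand-in for the Point Transformer dynamics model: predicts a
    per-particle displacement from the latent features."""
    W = np.zeros((FEAT_DIM, 3))        # placeholder for learned weights
    return positions + features @ W    # zero displacement with these weights

def render(positions, features, height=64, width=64):
    """Stand-in for Gaussian-splatting rendering of a novel view."""
    return np.zeros((height, width, 3))

# One rollout step: encode -> merge history -> step dynamics -> render.
views = [np.zeros((64, 64, 3)) for _ in range(4)]  # 4 input RGB views
pos, feat = encode_views(views)
pos = merge_temporal([pos, pos, pos])              # trivial 3-frame history
next_pos = predict_dynamics(pos, feat)
frame = render(next_pos, feat)
```

The point of the sketch is the data flow: because encoding, dynamics, and rendering all operate on the same latent particle set, the whole chain can in principle be trained end-to-end from RGB supervision alone, as the abstract claims.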
Related papers
- Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields [11.212256115568772]
We introduce an end-to-end neural framework that integrates 3D Gaussian perception with physics-based dynamic modeling to generate physically realistic 4D videos. We also present GSCollision, a 4D Gaussian dataset featuring diverse materials, multi-object interactions, and complex scenes, totaling over 640k rendered physical videos.
arXiv Detail & Related papers (2026-01-29T11:37:41Z) - Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation [87.91642226587294]
Current learning-based 3D reconstruction methods rely on the availability of captured real-world multi-view data. We propose a self-distillation framework that distills the implicit 3D knowledge in the video diffusion models into an explicit 3D Gaussian Splatting (3DGS) representation. Our framework achieves state-of-the-art performance in static and dynamic 3D scene generation.
arXiv Detail & Related papers (2025-09-23T17:58:01Z) - TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos [7.616167860385134]
We propose a new framework named TRACE to model the motion physics of complex dynamic 3D scenes. By formulating each 3D point as a rigid particle with size and orientation in space, we directly learn a translation-rotation dynamics system for each particle.
arXiv Detail & Related papers (2025-08-13T13:43:01Z) - DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM). It is the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - PIG: Physically-based Multi-Material Interaction with 3D Gaussians [14.097146027458368]
PIG: Physically-Based Multi-Material Interaction with 3D Gaussians is a novel approach that combines 3D object segmentation with high-precision simulation of interacting objects. We show that our method not only outperforms the state-of-the-art (SOTA) in terms of visual quality, but also opens up new directions and pipelines for the field of physically realistic scene generation.
arXiv Detail & Related papers (2025-06-09T11:25:21Z) - Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation [47.6666060652434]
We present an innovative framework that generates 3D models with accurate appearances and geometric structures. By integrating text-to-3D generation with physics-grounded motion synthesis, our framework renders photo-realistic 3D objects.
arXiv Detail & Related papers (2024-12-07T06:48:16Z) - Efficient Physics Simulation for 3D Scenes via MLLM-Guided Gaussian Splatting [32.846428862045634]
We present Sim Anything, a physics-based approach that endows static 3D objects with interactive dynamics. Inspired by human visual reasoning, we propose MLLM-based Physical Property Perception. We also simulate objects in an open-world scene with particles sampled via the Physical-Geometric Adaptive Sampling.
arXiv Detail & Related papers (2024-11-19T12:52:21Z) - Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling [10.247075501610492]
We introduce a framework to learn object dynamics directly from multi-view RGB videos.
We train a particle-based dynamics model using Graph Neural Networks.
Our method can predict object motions under varying initial configurations and unseen robot actions.
arXiv Detail & Related papers (2024-10-24T17:02:52Z) - GASP: Gaussian Splatting for Physic-Based Simulations [0.42881773214459123]
Existing physics models use additional meshing mechanisms, including triangle or tetrahedron meshing, marching cubes, or cage meshes.
We modify physics-grounded Newtonian dynamics to align with 3D Gaussian components.
The resulting solution can be integrated into any physics engine that can be treated as a black box.
arXiv Detail & Related papers (2024-09-09T17:28:57Z) - Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video [58.043569985784806]
We introduce latent intuitive physics, a transfer learning framework for physics simulation.
It can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes.
We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation.
arXiv Detail & Related papers (2024-06-18T16:37:44Z) - DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors [75.83647027123119]
We propose to learn the physical properties of a material field with video diffusion priors. We then utilize a physics-based Material-Point-Method simulator to generate 4D content with realistic motions.
arXiv Detail & Related papers (2024-06-03T16:05:25Z) - HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting [53.6394928681237]
Holistic understanding of urban scenes based on RGB images is a challenging yet important problem.
Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians.
Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy.
arXiv Detail & Related papers (2024-03-19T13:39:05Z) - Learning 3D Particle-based Simulators from RGB-D Videos [15.683877597215494]
We propose a method for learning simulators directly from observations.
Visual Particle Dynamics (VPD) jointly learns a latent particle-based representation of 3D scenes.
Unlike existing 2D video prediction models, VPD's 3D structure enables scene editing and long-term predictions.
arXiv Detail & Related papers (2023-12-08T20:45:34Z) - Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis [58.5779956899918]
We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements.
We follow an analysis-by-synthesis framework, inspired by recent work that models scenes as a collection of 3D Gaussians.
We demonstrate a large number of downstream applications enabled by our representation, including first-person view synthesis, dynamic compositional scene synthesis, and 4D video editing.
arXiv Detail & Related papers (2023-08-18T17:59:21Z) - 3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes [68.66237114509264]
We present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes with fluids.
We show our model can make long-horizon future predictions by learning from raw images and significantly outperforms models that do not employ an explicit 3D representation space.
arXiv Detail & Related papers (2023-04-22T19:28:49Z) - NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos [82.74918564737591]
We present a method for learning 3D geometry and physics parameters of a dynamic scene from only a monocular RGB video input.
Experiments show that our method achieves superior mesh and video reconstruction of dynamic scenes compared to competing Neural Field approaches.
arXiv Detail & Related papers (2022-10-22T04:57:55Z) - Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via a neural feature.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z) - Real-time Deep Dynamic Characters [95.5592405831368]
We propose a deep video-realistic 3D human character model displaying highly realistic shape, motion, and dynamic appearance.
We use a novel graph convolutional network architecture to enable motion-dependent deformation learning of body and clothing.
We show that our model creates motion-dependent surface deformations, physically plausible dynamic clothing deformations, as well as video-realistic surface textures at a much higher level of detail than previous state of the art approaches.
arXiv Detail & Related papers (2021-05-04T23:28:55Z) - Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering [101.56891506498755]
Differentiable rendering has paved the way to training neural networks to perform "inverse graphics" tasks.
We show that our approach significantly outperforms state-of-the-art inverse graphics networks trained on existing datasets.
arXiv Detail & Related papers (2020-10-18T22:29:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.