Learning 3D-Gaussian Simulators from RGB Videos
- URL: http://arxiv.org/abs/2503.24009v1
- Date: Mon, 31 Mar 2025 12:33:59 GMT
- Title: Learning 3D-Gaussian Simulators from RGB Videos
- Authors: Mikel Zhobro, Andreas René Geist, Georg Martius
- Abstract summary: 3DGSim is a 3D physics simulator that learns object dynamics end-to-end from multi-view RGB videos. It encodes images into a 3D Gaussian particle representation, propagates dynamics via a transformer, and renders frames using 3D Gaussian splatting.
- Score: 20.250137125726265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning physics simulations from video data requires maintaining spatial and temporal consistency, a challenge often addressed with strong inductive biases or ground-truth 3D information -- limiting scalability and generalization. We introduce 3DGSim, a 3D physics simulator that learns object dynamics end-to-end from multi-view RGB videos. It encodes images into a 3D Gaussian particle representation, propagates dynamics via a transformer, and renders frames using 3D Gaussian splatting. By jointly training inverse rendering with a dynamics transformer using a temporal encoding and merging layer, 3DGSim embeds physical properties into point-wise latent vectors without enforcing explicit connectivity constraints. This enables the model to capture diverse physical behaviors, from rigid to elastic and cloth-like interactions, along with realistic lighting effects that also generalize to unseen multi-body interactions and novel scene edits.
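The pipeline in the abstract has a clear shape: encode multi-view frames into per-particle latents, advance them with a transformer, and decode 3D Gaussian parameters for splatting. Below is a minimal PyTorch sketch of that shape only; every module name and size is invented for illustration, and the temporal encoding/merging layer and the actual differentiable splatting renderer are omitted.

```python
# Illustrative sketch of a 3DGSim-style pipeline (not the authors' code):
# images -> per-particle latents -> transformer dynamics -> Gaussian params.
import torch
import torch.nn as nn

class Gaussian3DSimSketch(nn.Module):
    def __init__(self, n_particles=512, d_latent=64, n_views=4):
        super().__init__()
        # Inverse-rendering encoder stand-in: multi-view RGB -> one feature
        # vector per view (a real encoder would lift features into 3D).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_particles = nn.Linear(64 * n_views, n_particles * d_latent)
        self.n_particles, self.d_latent = n_particles, d_latent
        # Dynamics: a transformer over particle tokens; physical behaviour
        # lives in point-wise latents, with no explicit connectivity graph.
        layer = nn.TransformerEncoderLayer(d_latent, nhead=4, batch_first=True)
        self.dynamics = nn.TransformerEncoder(layer, num_layers=2)
        # Decode each latent to 3D Gaussian parameters:
        # mean (3) + log-scale (3) + rotation quaternion (4) + color (3) + opacity (1).
        self.to_gaussians = nn.Linear(d_latent, 14)

    def forward(self, views, n_steps=1):
        # views: (batch, n_views, 3, H, W)
        b = views.shape[0]
        feats = self.encoder(views.flatten(0, 1)).reshape(b, -1)
        z = self.to_particles(feats).reshape(b, self.n_particles, self.d_latent)
        frames = []
        for _ in range(n_steps):
            z = self.dynamics(z)                 # one latent dynamics step
            frames.append(self.to_gaussians(z))  # Gaussians to splat per frame
        return torch.stack(frames, dim=1)

model = Gaussian3DSimSketch()
gaussians = model(torch.rand(2, 4, 3, 64, 64), n_steps=3)
print(gaussians.shape)  # torch.Size([2, 3, 512, 14])
```

In a full system the 14 decoded values per particle would feed a Gaussian splatting renderer, and a photometric loss on the rendered frames would train the encoder and the dynamics transformer jointly, as the abstract describes.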
Related papers
- Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation [47.6666060652434]
We present an innovative framework that generates 3D models with accurate appearances and geometric structures.
By integrating text-to-3D generation with physics-grounded motion synthesis, our framework renders photo-realistic 3D objects.
arXiv Detail & Related papers (2024-12-07T06:48:16Z)
- Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling [10.247075501610492]
We introduce a framework to learn object dynamics directly from multi-view RGB videos.
We train a particle-based dynamics model using Graph Neural Networks.
Our method can predict object motions under varying initial configurations and unseen robot actions.
arXiv Detail & Related papers (2024-10-24T17:02:52Z)
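The entry above trains a particle-based dynamics model with graph neural networks. A toy message-passing step under assumed design choices (radius graph, MLP sizes, displacement-style update; none of these are the paper's settings):

```python
# One GNN message-passing step over particles: edges from spatial
# proximity, messages from pairwise features, nodes predict displacement.
import torch
import torch.nn as nn

class ParticleGNNStep(nn.Module):
    def __init__(self, d=32):
        super().__init__()
        # Edge network: sender/receiver features + relative offset -> message.
        self.edge_mlp = nn.Sequential(nn.Linear(2 * d + 3, d), nn.ReLU(), nn.Linear(d, d))
        # Node network: own features + aggregated messages -> displacement.
        self.node_mlp = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, 3))

    def forward(self, pos, feat, radius=0.5):
        # pos: (N, 3) particle positions, feat: (N, d) particle features.
        n = pos.shape[0]
        diff = pos[:, None] - pos[None, :]             # (N, N, 3) pairwise offsets
        adj = (diff.norm(dim=-1) < radius).float()     # neighbors within radius
        adj.fill_diagonal_(0)                          # no self-edges
        pair = torch.cat([feat[:, None].expand(-1, n, -1),
                          feat[None].expand(n, -1, -1),
                          diff], dim=-1)               # (N, N, 2d + 3)
        msg = (self.edge_mlp(pair) * adj[..., None]).sum(dim=1)  # aggregate
        delta = self.node_mlp(torch.cat([feat, msg], dim=-1))
        return pos + delta                             # predicted next positions

step = ParticleGNNStep()
print(step(torch.rand(100, 3), torch.rand(100, 32)).shape)  # torch.Size([100, 3])
```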
- GASP: Gaussian Splatting for Physic-Based Simulations [0.42881773214459123]
Existing physics models use additional meshing mechanisms, including triangle or tetrahedron meshing, marching cubes, or cage meshes.
We modify the physics-grounded Newtonian dynamics to align with 3D Gaussian components.
The resulting solution can be integrated into any physics engine, which can be treated as a black box.
arXiv Detail & Related papers (2024-09-09T17:28:57Z)
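The GASP summary above suggests driving Gaussian kinematics with an engine treated as a black box: step the Gaussian means as point masses, keep the remaining parameters for rendering. A hedged sketch of that interface, with a two-line gravity integrator standing in for the engine (all names and constants are illustrative):

```python
# Treat each Gaussian's mean as a point mass and let a black-box engine
# advance it; other Gaussian attributes stay untouched for splatting.
import numpy as np

def black_box_engine(pos, vel, dt=1e-2, g=np.array([0.0, 0.0, -9.81])):
    # Stand-in for an external engine: symplectic Euler under gravity
    # with a ground plane at z = 0 and crude restitution.
    vel = vel + dt * g
    pos = pos + dt * vel
    hit = pos[:, 2] < 0.0
    pos[hit, 2] = 0.0
    vel[hit, 2] *= -0.5
    return pos, vel

means = np.random.rand(1000, 3) + np.array([0.0, 0.0, 1.0])  # Gaussian centers
vels = np.zeros_like(means)
for _ in range(100):            # simulate, then splat `means` each frame
    means, vels = black_box_engine(means, vels)
print(means[:, 2].min(), means[:, 2].max())
```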
- DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors [75.83647027123119]
We propose to learn the physical properties of a material field with video diffusion priors.
We then utilize a physics-based Material-Point-Method simulator to generate 4D content with realistic motions.
arXiv Detail & Related papers (2024-06-03T16:05:25Z)
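DreamPhysics, as summarized above, fits material properties through a differentiable simulator guided by video priors. A toy analogue of that optimization loop, assuming a 1-D damped spring in place of the Material-Point-Method and a fixed target trajectory in place of the diffusion prior (both are simplifications, not the paper's method):

```python
# Recover a physical parameter by backprop through a differentiable rollout.
import torch

def simulate(log_k, steps=50, dt=0.05):
    # Differentiable rollout of a damped spring with stiffness exp(log_k).
    x, v, traj = torch.tensor(1.0), torch.tensor(0.0), []
    for _ in range(steps):
        a = -torch.exp(log_k) * x - 0.1 * v
        v = v + dt * a
        x = x + dt * v
        traj.append(x)
    return torch.stack(traj)

target = simulate(torch.tensor(2.0).log()).detach()  # "observed" motion
log_k = torch.zeros((), requires_grad=True)          # unknown material parameter
opt = torch.optim.Adam([log_k], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = ((simulate(log_k) - target) ** 2).mean()  # match observed trajectory
    loss.backward()
    opt.step()
print(torch.exp(log_k).item())  # approaches the true stiffness ~ 2.0
```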
- HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting [53.6394928681237]
Holistic understanding of urban scenes based on RGB images is a challenging yet important problem.
Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians.
Our approach offers the ability to render new viewpoints in real time, yielding 2D and 3D semantic information with high accuracy.
arXiv Detail & Related papers (2024-03-19T13:39:05Z)
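The HUGS summary describes jointly optimizing geometry, appearance, semantics, and motion. One plausible reading is a weighted composite objective; the terms and weights below are illustrative assumptions, not the paper's actual loss:

```python
# Schematic composite loss for joint optimization of static + dynamic
# Gaussians with semantics (all inputs and weights are hypothetical).
import torch
import torch.nn.functional as F

def hugs_style_loss(render, image, sem_logits, sem_labels, flow_pred, flow_obs,
                    w_sem=0.1, w_flow=0.05):
    photometric = F.l1_loss(render, image)              # appearance / geometry
    semantic = F.cross_entropy(sem_logits, sem_labels)  # rendered semantics
    motion = F.l1_loss(flow_pred, flow_obs)             # dynamic Gaussians
    return photometric + w_sem * semantic + w_flow * motion

loss = hugs_style_loss(torch.rand(3, 64, 64), torch.rand(3, 64, 64),
                       torch.randn(1, 10, 64, 64), torch.randint(0, 10, (1, 64, 64)),
                       torch.randn(2, 64, 64), torch.randn(2, 64, 64))
print(loss.item())
```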
- Learning 3D Particle-based Simulators from RGB-D Videos [15.683877597215494]
We propose a method for learning simulators directly from observations.
Visual Particle Dynamics (VPD) jointly learns a latent particle-based representation of 3D scenes and a neural simulator of particle dynamics.
Unlike existing 2D video prediction models, VPD's 3D structure enables scene editing and long-term predictions.
arXiv Detail & Related papers (2023-12-08T20:45:34Z)
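VPD, per the entry above, lifts RGB-D observations into a 3D particle set that then supports editing. A minimal unprojection sketch (the pinhole intrinsics and the editing criterion are assumed values, not the paper's):

```python
# Unproject an RGB-D frame into colored 3D particles, then "edit" the
# scene by dropping particles before rolling dynamics forward.
import torch

def rgbd_to_particles(rgb, depth, fx=60.0, fy=60.0):
    # rgb: (3, H, W), depth: (H, W) -> points (H*W, 3), colors (H*W, 3)
    h, w = depth.shape
    cx, cy = w / 2.0, h / 2.0
    u, v = torch.meshgrid(torch.arange(w), torch.arange(h), indexing="xy")
    z = depth
    x = (u - cx) * z / fx          # pinhole back-projection
    y = (v - cy) * z / fy
    pts = torch.stack([x, y, z], dim=-1).reshape(-1, 3)
    colors = rgb.permute(1, 2, 0).reshape(-1, 3)
    return pts, colors

pts, colors = rgbd_to_particles(torch.rand(3, 48, 64), torch.rand(48, 64) + 1.0)
# Scene editing as in the entry: delete particles in a region, keep the rest.
keep = pts[:, 0] > 0.0
pts, colors = pts[keep], colors[keep]
print(pts.shape)
```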
- Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis [58.5779956899918]
We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements.
We follow an analysis-by-synthesis framework, inspired by recent work that models scenes as a collection of 3D Gaussians.
We demonstrate a large number of downstream applications enabled by our representation, including first-person view synthesis, dynamic compositional scene synthesis, and 4D video editing.
arXiv Detail & Related papers (2023-08-18T17:59:21Z)
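The analysis-by-synthesis framework described above can be caricatured as per-frame gradient fitting of persistent Gaussians, whose mean trajectories then serve as dense tracks. The `render` stub below stands in for a differentiable splatter; everything else is an illustrative simplification:

```python
# Analysis-by-synthesis tracking: fit the same Gaussians to each new frame;
# persistent identity makes their optimized means a trajectory per Gaussian.
import torch

def render(means):
    # Placeholder renderer: a real pipeline splats Gaussians to an image.
    return means.mean(dim=0)  # degenerate "image": the scene centroid

means = torch.rand(200, 3, requires_grad=True)
tracks = []
for frame_target in [torch.tensor([0.5, 0.5, 0.5]) + 0.01 * t for t in range(5)]:
    opt = torch.optim.Adam([means], lr=1e-2)
    for _ in range(50):                      # per-frame optimization
        opt.zero_grad()
        loss = ((render(means) - frame_target) ** 2).sum()
        loss.backward()
        opt.step()
    tracks.append(means.detach().clone())    # persistent IDs -> trajectories
print(torch.stack(tracks).shape)  # torch.Size([5, 200, 3])
```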
- 3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes [68.66237114509264]
We present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes with fluids.
We show our model can make long-horizon future predictions by learning from raw images and significantly outperforms models that do not employ an explicit 3D representation space.
arXiv Detail & Related papers (2023-04-22T19:28:49Z)
- NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos [82.74918564737591]
We present a method for learning 3D geometry and physics parameters of a dynamic scene from only a monocular RGB video input.
Experiments show that our method achieves superior mesh and video reconstruction of dynamic scenes compared to competing Neural Field approaches.
arXiv Detail & Related papers (2022-10-22T04:57:55Z)
- Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering [101.56891506498755]
Differentiable rendering has paved the way to training neural networks to perform "inverse graphics" tasks.
We show that our approach significantly outperforms state-of-the-art inverse graphics networks trained on existing datasets.
arXiv Detail & Related papers (2020-10-18T22:29:07Z)
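The final entry's "inverse graphics" recipe is render, compare, backpropagate. A toy instance that recovers a blob's position through a differentiable renderer (the renderer and parameters are illustrative, not the paper's networks):

```python
# Inverse graphics in miniature: optimize scene parameters by gradient
# descent through a differentiable renderer against a target image.
import torch

def render_blob(center, size=32):
    ys, xs = torch.meshgrid(torch.arange(size, dtype=torch.float32),
                            torch.arange(size, dtype=torch.float32), indexing="ij")
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    return torch.exp(-d2 / 50.0)  # soft, differentiable "image"

target = render_blob(torch.tensor([20.0, 12.0]))     # observed image
center = torch.tensor([10.0, 10.0], requires_grad=True)  # unknown scene param
opt = torch.optim.Adam([center], lr=0.5)
for _ in range(300):
    opt.zero_grad()
    loss = ((render_blob(center) - target) ** 2).mean()
    loss.backward()
    opt.step()
print(center.detach())  # approaches (20, 12)
```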
This list is automatically generated from the titles and abstracts of the papers on this site.