Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation
- URL: http://arxiv.org/abs/2411.14423v1
- Date: Thu, 21 Nov 2024 18:55:23 GMT
- Title: Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation
- Authors: Zhuoman Liu, Weicai Ye, Yan Luximon, Pengfei Wan, Di Zhang
- Abstract summary: We introduce a novel approach that leverages multi-modal foundation models and video diffusion to achieve enhanced 4D dynamic scene simulation.
This integrated framework enables accurate prediction and realistic simulation of dynamic interactions in real-world scenarios.
- Score: 9.306758077479472
- License:
- Abstract: Realistic simulation of dynamic scenes requires accurately capturing diverse material properties and modeling complex object interactions grounded in physical principles. However, existing methods are constrained to basic material types with limited predictable parameters, making them insufficient to represent the complexity of real-world materials. We introduce a novel approach that leverages multi-modal foundation models and video diffusion to achieve enhanced 4D dynamic scene simulation. Our method utilizes multi-modal models to identify material types and initialize material parameters through image queries, while simultaneously inferring 3D Gaussian splats for detailed scene representation. We further refine these material parameters using video diffusion with a differentiable Material Point Method (MPM) and optical flow guidance rather than render loss or Score Distillation Sampling (SDS) loss. This integrated framework enables accurate prediction and realistic simulation of dynamic interactions in real-world scenarios, advancing both accuracy and flexibility in physics-based simulations.
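The refinement stage described in the abstract (a differentiable MPM rollout supervised by optical flow extracted from a video-diffusion reference, rather than a render or SDS loss) can be illustrated with a minimal sketch. The code below is not the authors' implementation: `toy_simulate` and `toy_flow` are toy stand-ins for a differentiable MPM step and an optical-flow estimator, and a single learnable log Young's modulus stands in for the full material field. The image-query initialization step is sketched separately after the related-papers list.

```python
# Minimal sketch (assumed, not the authors' code) of optical-flow-guided
# refinement of a material parameter through a differentiable simulator.
import torch

def toy_simulate(points, log_E, nu, steps=10, dt=1e-2):
    """Toy stand-in for a differentiable MPM rollout: stiffer materials
    (larger E) deform less under a fixed downward load. Not real MPM."""
    E = log_E.exp()
    displacement = dt * steps / (E * (1.0 + nu))  # crude proxy for deformation
    trajectory = [points + t * displacement * torch.tensor([0.0, -1.0])
                  for t in range(steps)]
    return torch.stack(trajectory)                # (steps, N, 2)

def toy_flow(traj):
    """Toy stand-in for optical flow: frame-to-frame 2D point displacement."""
    return traj[1:] - traj[:-1]                   # (steps-1, N, 2)

# Initial guess; in the paper this comes from multi-modal image queries.
log_E = torch.tensor(10.0, requires_grad=True)    # log Young's modulus (learnable)
nu = torch.tensor(0.3)                            # Poisson's ratio, fixed here

points = torch.rand(64, 2)                        # toy particle positions
with torch.no_grad():                             # "reference" flow from a target rollout
    target_flow = toy_flow(toy_simulate(points, torch.tensor(9.0), nu))

opt = torch.optim.Adam([log_E], lr=0.05)
for it in range(200):
    opt.zero_grad()
    flow = toy_flow(toy_simulate(points, log_E, nu))
    loss = torch.nn.functional.l1_loss(flow, target_flow)  # flow guidance, not render/SDS loss
    loss.backward()
    opt.step()
print(f"recovered log E: {log_E.item():.3f} (target 9.0)")
```

The point of the sketch is the supervision signal: gradients reach the material parameter only through the simulated motion's agreement with the reference flow, which is what allows optimization without ground-truth 3D states.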
Related papers
- PhysMotion: Physics-Grounded Dynamics From a Single Image [24.096925413047217]
We introduce PhysMotion, a novel framework that leverages principled physics-based simulations to guide intermediate 3D representations generated from a single image.
Our approach addresses the limitations of traditional data-driven generative models and results in more consistent, physically plausible motions.
arXiv Detail & Related papers (2024-11-26T07:59:11Z)
- Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting [22.40115216094332]
We present Sim Anything, a physics-based approach that endows static 3D objects with interactive dynamics.
Inspired by human visual reasoning, we propose MLLM-based Physical Property Perception.
We also simulate objects in an open-world scene with particles sampled via the Physical-Geometric Adaptive Sampling.
arXiv Detail & Related papers (2024-11-19T12:52:21Z)
- Differentiable Physics-based System Identification for Robotic Manipulation of Elastoplastic Materials [43.99845081513279]
This work introduces a novel Differentiable Physics-based System Identification (DPSI) framework that enables a robot arm to infer the physics parameters of elastoplastic materials and the environment.
With only a single real-world interaction, the estimated parameters can accurately simulate visually and physically realistic behaviours.
arXiv Detail & Related papers (2024-11-01T13:04:25Z)
- Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion [35.71595369663293]
We propose Physics3D, a novel method for learning various physical properties of 3D objects through a video diffusion model.
Our approach involves designing a highly generalizable physical simulation system based on a viscoelastic material model.
Experiments demonstrate the effectiveness of our method with both elastic and plastic materials.
arXiv Detail & Related papers (2024-06-06T17:59:47Z)
- DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors [75.83647027123119]
We propose to learn the physical properties of a material field with video diffusion priors.
We then utilize a physics-based Material-Point-Method simulator to generate 4D content with realistic motions.
arXiv Detail & Related papers (2024-06-03T16:05:25Z)
- EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision [85.17951804790515]
EmerNeRF is a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes.
It simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping.
Our method achieves state-of-the-art performance in sensor simulation.
arXiv Detail & Related papers (2023-11-03T17:59:55Z)
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives [70.32817882783608]
We present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives.
Unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images.
We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points.
arXiv Detail & Related papers (2023-07-11T17:58:31Z)
- Neural Implicit Representations for Physical Parameter Inference from a Single Video [49.766574469284485]
We propose to combine neural implicit representations for appearance modeling with neural ordinary differential equations (ODEs) for modelling physical phenomena.
Our proposed model combines several unique advantages: (i) Contrary to existing approaches that require large training datasets, we are able to identify physical parameters from only a single video.
The use of neural implicit representations enables the processing of high-resolution videos and the synthesis of photo-realistic images.
arXiv Detail & Related papers (2022-04-29T11:55:35Z)
- MoCo-Flow: Neural Motion Consensus Flow for Dynamic Humans in Stationary Monocular Cameras [98.40768911788854]
We introduce MoCo-Flow, a representation that models the dynamic scene using a 4D continuous time-variant function.
At the heart of our work lies a novel optimization formulation, which is constrained by a motion consensus regularization on the motion flow.
We extensively evaluate MoCo-Flow on several datasets that contain human motions of varying complexity.
arXiv Detail & Related papers (2021-06-08T16:03:50Z)
- gradSim: Differentiable simulation for system identification and visuomotor control [66.37288629125996]
We present gradSim, a framework that overcomes the dependence on 3D supervision by leveraging differentiable multiphysics simulation and differentiable rendering.
Our unified graph enables learning in challenging visuomotor control tasks, without relying on state-based (3D) supervision.
arXiv Detail & Related papers (2021-04-06T16:32:01Z)
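The main paper's initialization stage, like Sim Anything's MLLM-based Physical Property Perception above, queries a multi-modal model with an image to guess an object's material and rough physical parameters before any simulation. The following is a hypothetical, offline sketch of that image-query step: `query_vlm` is a stand-in that returns a canned JSON answer rather than calling a real model, and the parameter names and values are illustrative assumptions.

```python
# Hypothetical sketch of VLM-based material initialization (not a real API).
import json
from dataclasses import dataclass

PROMPT = (
    "Identify the material of the highlighted object and return JSON with keys "
    "'material', 'youngs_modulus_pa', 'poisson_ratio', 'density_kg_m3'."
)

def query_vlm(image_path: str, prompt: str) -> str:
    """Stand-in for a multi-modal foundation model call; returns a canned reply
    so the sketch runs offline. The values below are illustrative only."""
    return json.dumps({
        "material": "rubber",
        "youngs_modulus_pa": 5e6,
        "poisson_ratio": 0.47,
        "density_kg_m3": 1100.0,
    })

@dataclass
class MaterialInit:
    material: str
    youngs_modulus_pa: float
    poisson_ratio: float
    density_kg_m3: float

def init_material(image_path: str) -> MaterialInit:
    """Parse the model's JSON reply into parameters that seed the simulator."""
    reply = json.loads(query_vlm(image_path, PROMPT))
    return MaterialInit(**reply)

if __name__ == "__main__":
    m = init_material("scene_object.png")
    print(f"{m.material}: E={m.youngs_modulus_pa:.1e} Pa, nu={m.poisson_ratio}")
```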
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.