Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation
- URL: http://arxiv.org/abs/2506.06440v1
- Date: Fri, 06 Jun 2025 18:00:46 GMT
- Title: Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation
- Authors: Chuhao Chen, Zhiyang Dou, Chen Wang, Yiming Huang, Anjun Chen, Qiao Feng, Jiatao Gu, Lingjie Liu
- Abstract summary: Vid2Sim is a generalizable video-based approach for recovering geometry and physical properties. A feed-forward neural network trained to capture physical world knowledge reconstructs the observed configuration of the physical system from video. A lightweight optimization pipeline then refines the estimated appearance, geometry, and physical properties to closely align with the video observations.
- Score: 41.17844925831194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Faithfully reconstructing textured shapes and physical properties from videos presents an intriguing yet challenging problem. Significant efforts have been dedicated to advancing this system identification problem. Previous methods often rely on heavy optimization pipelines with a differentiable simulator and renderer to estimate physical parameters. However, these approaches frequently necessitate extensive hyperparameter tuning for each scene and involve a costly optimization process, which limits both their practicality and generalizability. In this work, we propose a novel framework, Vid2Sim, a generalizable video-based approach for recovering geometry and physical properties through a mesh-free reduced simulation based on Linear Blend Skinning (LBS), offering high computational efficiency and versatile representation capability. Specifically, Vid2Sim first reconstructs the observed configuration of the physical system from video using a feed-forward neural network trained to capture physical world knowledge. A lightweight optimization pipeline then refines the estimated appearance, geometry, and physical properties to closely align with the video observations within just a few minutes. After reconstruction, Vid2Sim additionally enables efficient, high-quality mesh-free simulation. Extensive experiments demonstrate that our method achieves superior accuracy and efficiency in reconstructing geometry and physical properties from video data.
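The mesh-free reduced simulation described above is built on Linear Blend Skinning: points sampled from the object are deformed as a weighted blend of a small set of handle transforms, so the simulation state reduces to the handles rather than the full point set. The sketch below illustrates that deformation model in NumPy; the handle placement, the Gaussian weight falloff, and all names are illustrative assumptions rather than the paper's actual implementation.

```python
# Minimal sketch (not the authors' code): mesh-free deformation via Linear
# Blend Skinning (LBS). Points sampled from the object follow a weighted
# blend of a small set of per-handle rigid transforms, so only the handles
# need to be simulated. Handle placement and the Gaussian weight falloff
# below are illustrative assumptions.
import numpy as np

def skinning_weights(points, handles, sigma=0.1):
    """Soft-assign each point to the handles with a Gaussian falloff (assumed)."""
    d2 = ((points[:, None, :] - handles[None, :, :]) ** 2).sum(-1)   # (N, H)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)                          # rows sum to 1

def lbs_deform(points, weights, rotations, translations):
    """x_i' = sum_j w_ij (R_j x_i + t_j): blend per-handle rigid transforms."""
    transformed = np.einsum('hab,nb->nha', rotations, points) + translations[None]
    return (weights[..., None] * transformed).sum(axis=1)            # (N, 3)

# Toy usage: 1000 points, 8 handles, bend the object by rotating one handle.
rng = np.random.default_rng(0)
points = rng.uniform(-0.5, 0.5, size=(1000, 3))
handles = rng.uniform(-0.5, 0.5, size=(8, 3))
weights = skinning_weights(points, handles)

rotations = np.tile(np.eye(3), (8, 1, 1))
translations = np.zeros((8, 3))
theta = np.pi / 8
rotations[0] = [[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]]

deformed = lbs_deform(points, weights, rotations, translations)
print(deformed.shape)  # (1000, 3)
```

In the pipeline sketched by the abstract, the handle transforms would be driven by the reduced simulation and the skinning weights estimated from video; here both are set by hand purely to show the deformation step.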
Related papers
- DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM). It is the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos [21.441062722848265]
PhysTwin is a novel framework that uses sparse videos of dynamic objects under interaction to produce a photo- and physically realistic, real-time interactive replica. Our approach centers on two key components: (1) a physics-informed representation that combines spring-mass models for realistic physical simulation, generative shape models for geometry, and Gaussian splats for rendering; and (2) an inverse physics framework integrated with visual perception cues, enabling high-fidelity reconstruction even from partial, occluded, and limited viewpoints.
arXiv Detail & Related papers (2025-03-23T07:49:19Z) - GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects [55.02281855589641]
GausSim is a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels. We leverage continuum mechanics and treat each kernel as a Center of Mass System (CMS) that represents a continuous piece of matter. In addition, GausSim incorporates explicit physics constraints, such as mass and momentum conservation, ensuring interpretable results and robust, physically plausible simulations.
arXiv Detail & Related papers (2024-12-23T18:58:17Z) - Efficient Physics Simulation for 3D Scenes via MLLM-Guided Gaussian Splatting [32.846428862045634]
We present Sim Anything, a physics-based approach that endows static 3D objects with interactive dynamics. Inspired by human visual reasoning, we propose MLLM-based Physical Property Perception. We also simulate objects in an open-world scene with particles sampled via the Physical-Geometric Adaptive Sampling.
arXiv Detail & Related papers (2024-11-19T12:52:21Z) - PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation [29.831214435147583]
We present PhysGen, a novel image-to-video generation method.
It produces a realistic, physically plausible, and temporally consistent video.
Our key insight is to integrate model-based physical simulation with a data-driven video generation process.
arXiv Detail & Related papers (2024-09-27T17:59:57Z) - PhyRecon: Physically Plausible Neural Scene Reconstruction [81.73129450090684]
We introduce PHYRECON, the first approach to leverage both differentiable rendering and differentiable physics simulation to learn implicit surface representations.
Central to this design is an efficient transformation between SDF-based implicit representations and explicit surface points.
Our results also exhibit superior physical stability in physical simulators, with at least a 40% improvement across all datasets.
arXiv Detail & Related papers (2024-04-25T15:06:58Z) - Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models [4.529832252085145]
We propose a novel SfT reconstruction algorithm for cloth using a pre-trained neural surrogate model.
Differentiable rendering of the simulated mesh enables pixel-wise comparisons between the reconstruction and a target video sequence.
This allows retaining a precise, stable, and smooth reconstructed geometry while reducing the runtime by a factor of 400-500 compared to $\phi$-SfT.
arXiv Detail & Related papers (2023-11-21T18:59:58Z) - NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos [82.74918564737591]
We present a method for learning 3D geometry and physics parameters of a dynamic scene from only a monocular RGB video input.
Experiments show that our method achieves superior mesh and video reconstruction of dynamic scenes compared to competing Neural Field approaches.
arXiv Detail & Related papers (2022-10-22T04:57:55Z) - Consistent Video Depth Estimation [57.712779457632024]
We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video.
We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video.
Our algorithm is able to handle challenging hand-held captured input videos with a moderate degree of dynamic motion.
arXiv Detail & Related papers (2020-04-30T17:59:26Z)