Related papers: Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models

Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models

URL: http://arxiv.org/abs/2311.12796v3
Date: Mon, 15 Apr 2024 11:40:39 GMT
Title: Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models
Authors: David Stotko, Nils Wandel, Reinhard Klein,
Abstract summary: We propose a novel SfT reconstruction algorithm for cloth using a pre-trained neural surrogate model. Differentiable rendering of the simulated mesh enables pixel-wise comparisons between the reconstruction and a target video sequence. This allows to retain a precise, stable, and smooth reconstructed geometry while reducing the runtime by a factor of 400-500 compared to $phi$-SfT.
Score: 4.529832252085145
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D reconstruction of dynamic scenes is a long-standing problem in computer graphics and increasingly difficult the less information is available. Shape-from-Template (SfT) methods aim to reconstruct a template-based geometry from RGB images or video sequences, often leveraging just a single monocular camera without depth information, such as regular smartphone recordings. Unfortunately, existing reconstruction methods are either unphysical and noisy or slow in optimization. To solve this problem, we propose a novel SfT reconstruction algorithm for cloth using a pre-trained neural surrogate model that is fast to evaluate, stable, and produces smooth reconstructions due to a regularizing physics simulation. Differentiable rendering of the simulated mesh enables pixel-wise comparisons between the reconstruction and a target video sequence that can be used for a gradient-based optimization procedure to extract not only shape information but also physical parameters such as stretching, shearing, or bending stiffness of the cloth. This allows to retain a precise, stable, and smooth reconstructed geometry while reducing the runtime by a factor of 400-500 compared to $\phi$-SfT, a state-of-the-art physics-based SfT approach.

Related papers

GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering [54.489285024494855]
Video stabilization is pivotal for video processing, as it removes unwanted shakiness while preserving the original user motion intent.<n>Existing approaches, depending on the domain they operate, suffer from several issues that degrade the user experience.<n>We introduce textbfGaVS, a novel 3D-grounded approach that reformulates video stabilization as a temporally-consistent local reconstruction and rendering' paradigm.
arXiv Detail & Related papers (2025-06-30T15:24:27Z)
Learning Multi-frame and Monocular Prior for Estimating Geometry in Dynamic Scenes [56.936178608296906]
We present a new model, coined MMP, to estimate the geometry in a feed-forward manner.<n>Based on the recent Siamese architecture, we introduce a new trajectory encoding module.<n>We find MMP can achieve state-of-the-art quality in feed-forward pointmap prediction.
arXiv Detail & Related papers (2025-05-03T08:28:15Z)
PhysMotion: Physics-Grounded Dynamics From a Single Image [24.096925413047217]
We introduce PhysMotion, a novel framework that leverages principled physics-based simulations to guide intermediate 3D representations generated from a single image. Our approach addresses the limitations of traditional data-driven generative models and result in more consistent physically plausible motions.
arXiv Detail & Related papers (2024-11-26T07:59:11Z)
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes. By simply estimating a pointmap for each timestep, we can effectively adapt DUST3R's representation, previously only used for static scenes, to dynamic scenes. We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z)
TFS-NeRF: Template-Free NeRF for Semantic 3D Reconstruction of Dynamic Scene [25.164085646259856]
This paper introduces a 3D semantic NeRF for dynamic scenes captured from sparse or singleview RGB videos. Our framework uses an Invertible Neural Network (INN) for LBS prediction, the training process. Our approach produces high-quality reconstructions of both deformable and non-deformable objects in complex interactions.
arXiv Detail & Related papers (2024-09-26T01:34:42Z)
SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes [75.9110646062442]
We propose SceNeRFlow to reconstruct a general, non-rigid scene in a time-consistent manner. Our method takes multi-view RGB videos and background images from static cameras with known camera parameters as input. We show experimentally that, unlike prior work that only handles small motion, our method enables the reconstruction of studio-scale motions.
arXiv Detail & Related papers (2023-08-16T09:50:35Z)
NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos [82.74918564737591]
We present a method for learning 3D geometry and physics parameters of a dynamic scene from only a monocular RGB video input. Experiments show that our method achieves superior mesh and video reconstruction of dynamic scenes compared to competing Neural Field approaches.
arXiv Detail & Related papers (2022-10-22T04:57:55Z)
{\phi}-SfT: Shape-from-Template with a Physics-Based Deformation Model [69.27632025495512]
Shape-from-Template (SfT) methods estimate 3D surface deformations from a single monocular RGB camera. This paper proposes a new SfT approach explaining 2D observations through physical simulations.
arXiv Detail & Related papers (2022-03-22T17:59:57Z)
SIDER: Single-Image Neural Optimization for Facial Geometric Detail Recovery [54.64663713249079]
SIDER is a novel photometric optimization method that recovers detailed facial geometry from a single image in an unsupervised manner. In contrast to prior work, SIDER does not rely on any dataset priors and does not require additional supervision from multiple views, lighting changes or ground truth 3D shape.
arXiv Detail & Related papers (2021-08-11T22:34:53Z)
LASR: Learning Articulated Shape Reconstruction from a Monocular Video [97.92849567637819]
We introduce a template-free approach to learn 3D shapes from a single video. Our method faithfully reconstructs nonrigid 3D structures from videos of human, animals, and objects of unknown classes.
arXiv Detail & Related papers (2021-05-06T21:41:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.