SAFT: Shape and Appearance of Fabrics from Template via Differentiable Physical Simulations from Monocular Video
- URL: http://arxiv.org/abs/2509.08828v1
- Date: Wed, 10 Sep 2025 17:59:57 GMT
- Title: SAFT: Shape and Appearance of Fabrics from Template via Differentiable Physical Simulations from Monocular Video
- Authors: David Stotko, Reinhard Klein
- Abstract summary: In this paper, we propose a novel approach that combines the domains of 3D geometry reconstruction and appearance estimation for physically based rendering. We present a system that is able to perform both tasks for fabrics, utilizing only a single monocular RGB video sequence as input. In comparison with the most recent methods in the field, we have reduced the error in the 3D reconstruction by a factor of 2.64 while requiring a moderate runtime of 30 min per scene.
- Score: 6.408363851409316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The reconstruction of three-dimensional dynamic scenes is a well-established yet challenging task within the domain of computer vision. In this paper, we propose a novel approach that combines the domains of 3D geometry reconstruction and appearance estimation for physically based rendering and present a system that is able to perform both tasks for fabrics, utilizing only a single monocular RGB video sequence as input. In order to obtain realistic and high-quality deformations and renderings, a physical simulation of the cloth geometry and differentiable rendering are employed. We further introduce two novel regularization terms for the 3D reconstruction task that improve the plausibility of the reconstruction by addressing the depth ambiguity problem in monocular video. In comparison with the most recent methods in the field, we have reduced the error in the 3D reconstruction by a factor of 2.64 while requiring a moderate runtime of 30 min per scene. Furthermore, the optimized motion achieves sufficient quality to perform an appearance estimation of the deforming object, recovering sharp details from this single monocular RGB video.
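As a rough illustration of the optimization loop the abstract describes, the sketch below rolls out a toy differentiable mass-spring cloth model, renders it with a toy projection, and fits an unknown stiffness parameter to image-space observations by gradient descent. The simulator, renderer, and all names here are illustrative stand-ins, not the paper's actual pipeline or its two regularization terms.

```python
import torch

def spring_step(x, v, edges, rest_len, stiffness, dt=0.01, damping=0.98):
    """One explicit Euler step of a toy mass-spring cloth model."""
    i, j = edges[:, 0], edges[:, 1]
    d = x[j] - x[i]                                   # spring vectors
    length = d.norm(dim=-1, keepdim=True).clamp_min(1e-8)
    f = stiffness * (length - rest_len[:, None]) * d / length
    force = torch.zeros_like(x).index_add(0, i, f).index_add(0, j, -f)
    force = force + torch.tensor([0.0, -9.81, 0.0])   # gravity
    v = damping * (v + dt * force)
    return x + dt * v, v

def render(x, focal=1.0):
    """Toy pinhole projection standing in for differentiable rendering."""
    return focal * x[:, :2] / (x[:, 2:3] + 3.0)

# A tiny 2x2 cloth patch (4 vertices, 4 structural springs) as dummy data.
x0 = torch.tensor([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [1., 1., 0.]])
edges = torch.tensor([[0, 1], [0, 2], [1, 3], [2, 3]])
rest_len = torch.ones(4)
target = render(x0)                                   # stand-in video observation

log_k = torch.tensor(3.0, requires_grad=True)         # unknown stiffness, log-space
opt = torch.optim.Adam([log_k], lr=0.05)
for _ in range(100):
    opt.zero_grad()
    x, v = x0, torch.zeros_like(x0)
    for _ in range(10):                               # short differentiable rollout
        x, v = spring_step(x, v, edges, rest_len, log_k.exp())
    loss = (render(x) - target).pow(2).mean()         # photometric-style loss
    loss.backward()                                   # gradients flow through the sim
    opt.step()
```

In the actual method, the depth-ambiguity regularizers would enter as additional terms on `loss`; their form is specified in the paper and not reproduced here.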
Related papers
- VideoArtGS: Building Digital Twins of Articulated Objects from Monocular Video [60.63575135514847]
Building digital twins of articulated objects from monocular video presents an essential challenge in computer vision. We introduce VideoArtGS, a novel approach that reconstructs high-fidelity digital twins of articulated objects from monocular video. VideoArtGS demonstrates state-of-the-art performance in articulation and mesh reconstruction, reducing the reconstruction error by about two orders of magnitude compared to existing methods.
arXiv Detail & Related papers (2025-09-22T11:52:02Z) - 3D Gaussian Flats: Hybrid 2D/3D Photometric Scene Reconstruction [62.84879632157956]
We propose a novel hybrid 2D/3D representation that jointly optimizes constrained planar (2D) Gaussians for modeling flat surfaces and freeform (3D) Gaussians for the rest of the scene. Our end-to-end approach dynamically detects and refines planar regions, improving both visual fidelity and geometric accuracy. It achieves state-of-the-art depth estimation on ScanNet++ and ScanNetv2, and excels at mesh extraction without overfitting to a specific camera model.
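As a minimal sketch of what a planar-constrained Gaussian can look like, assuming the covariance factorization Sigma = R diag(s)^2 R^T used in standard 3D Gaussian splatting (the paper's actual parameterization may differ), a flat Gaussian gets near-zero extent along the plane normal:

```python
import torch

def covariance(R: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Sigma = R diag(s)^2 R^T, the usual 3D Gaussian splatting factorization."""
    S = torch.diag(scale)
    return R @ S @ S @ R.T

def planar_covariance(normal: torch.Tensor, scale2d: torch.Tensor,
                      eps: float = 1e-6) -> torch.Tensor:
    """Covariance of a Gaussian flattened onto the plane with the given normal."""
    n = normal / normal.norm()
    # Build an orthonormal frame (u, v, n); u and v span the plane.
    helper = torch.tensor([1., 0., 0.]) if abs(n[0]) < 0.9 else torch.tensor([0., 1., 0.])
    u = torch.linalg.cross(n, helper)
    u = u / u.norm()
    v = torch.linalg.cross(n, u)
    R = torch.stack([u, v, n], dim=1)                  # columns: plane axes, normal
    s = torch.cat([scale2d, torch.tensor([eps])])      # ~zero thickness along n
    return covariance(R, s)

sigma = planar_covariance(torch.tensor([0., 0., 1.]), torch.tensor([0.5, 0.2]))
# sigma has extent only in the xy-plane; its z-extent is ~eps^2.
```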
arXiv Detail & Related papers (2025-09-19T21:04:36Z) - DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM). It is the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture [47.44029968307207]
We propose a novel framework for simultaneous high-fidelity recovery of object shapes and textures from single-view images.
Our approach utilizes the proposed Single-view neural implicit Shape and Radiance field (SSR) representations to leverage both explicit 3D shape supervision and volume rendering.
A distinctive feature of our framework is its ability to generate fine-grained textured meshes while seamlessly integrating rendering capabilities into the single-view 3D reconstruction model.
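For context, the volume rendering that such implicit-surface methods build on reduces, in discrete form, to the standard quadrature below; this is the generic formulation, not SSR's specific implementation:

```python
import torch

def volume_render(sigma: torch.Tensor, rgb: torch.Tensor,
                  deltas: torch.Tensor) -> torch.Tensor:
    """sigma: [N] densities, rgb: [N, 3] colors, deltas: [N] segment lengths."""
    alpha = 1.0 - torch.exp(-sigma * deltas)              # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10]), dim=0)[:-1]
    weights = alpha * trans                               # per-sample contribution
    return (weights[:, None] * rgb).sum(dim=0)            # composited ray color

# Example: composite 8 samples along one ray.
color = volume_render(torch.rand(8), torch.rand(8, 3), torch.full((8,), 0.1))
```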
arXiv Detail & Related papers (2023-11-01T11:46:15Z) - NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos [82.74918564737591]
We present a method for learning 3D geometry and physics parameters of a dynamic scene from only a monocular RGB video input.
Experiments show that our method achieves superior mesh and video reconstruction of dynamic scenes compared to competing Neural Field approaches.
arXiv Detail & Related papers (2022-10-22T04:57:55Z) - MonoNeuralFusion: Online Monocular Neural 3D Reconstruction with
Geometric Priors [41.228064348608264]
This paper introduces a novel neural implicit scene representation with volume rendering for high-fidelity online 3D scene reconstruction from monocular videos.
For fine-grained reconstruction, our key insight is to incorporate geometric priors into both the neural implicit scene representation and neural volume rendering.
MonoNeuralFusion consistently generates more complete and fine-grained reconstruction results, both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-09-30T00:44:26Z) - Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles single-view 3D mesh reconstruction with a focus on model generalization to unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z) - Unbiased 4D: Monocular 4D Reconstruction with a Neural Deformation Model [76.64071133839862]
Capturing general deforming scenes from monocular RGB video is crucial for many computer graphics and vision applications.
Our method, Ub4D, handles large deformations, performs shape completion in occluded regions, and can operate on monocular RGB videos directly by using differentiable volume rendering.
Results on our new dataset, which will be made publicly available, demonstrate a clear improvement over the state of the art in terms of surface reconstruction accuracy and robustness to large deformations.
arXiv Detail & Related papers (2022-06-16T17:59:54Z) - MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface
Reconstruction [72.05649682685197]
State-of-the-art neural implicit methods allow for high-quality reconstructions of simple scenes from many input views, but their quality degrades for larger, more complex scenes and sparser viewpoints.
This is caused primarily by the inherent ambiguity in the RGB reconstruction loss, which does not provide enough constraints.
Motivated by recent advances in the area of monocular geometry prediction, we explore the utility these cues provide for improving neural implicit surface reconstruction.
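A hedged sketch of how such monocular cues can be folded into the objective: predicted depth (known only up to an affine ambiguity) and predicted normals supervise the rendered geometry alongside the RGB term. The loss weights and the least-squares scale-shift alignment are illustrative assumptions, not MonoSDF's exact losses:

```python
import torch

def scale_shift_align(d_render: torch.Tensor, d_mono: torch.Tensor) -> torch.Tensor:
    """Least-squares scale/shift, since monocular depth is only affine-accurate."""
    A = torch.stack([d_render, torch.ones_like(d_render)], dim=1)  # [N, 2]
    w, s = torch.linalg.lstsq(A, d_mono[:, None]).solution.squeeze(1)
    return w * d_render + s

def total_loss(rgb_render, rgb_gt, d_render, d_mono, n_render, n_mono,
               w_depth=0.1, w_normal=0.05):
    """RGB loss plus monocular depth/normal prior terms (illustrative weights)."""
    l_rgb = (rgb_render - rgb_gt).abs().mean()                     # photometric term
    l_depth = (scale_shift_align(d_render, d_mono) - d_mono).pow(2).mean()
    l_normal = (1.0 - (n_render * n_mono).sum(dim=-1)).mean()      # angular agreement
    return l_rgb + w_depth * l_depth + w_normal * l_normal
```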
arXiv Detail & Related papers (2022-06-01T17:58:15Z) - Learning monocular 3D reconstruction of articulated categories from
motion [39.811816510186475]
Video self-supervision enforces consistency between consecutive 3D reconstructions via a motion-based cycle loss.
We introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles.
We obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
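A minimal sketch of a motion-based cycle loss of this kind, under an assumed pinhole camera: the 2D flow induced by projecting consecutive 3D reconstructions is compared against observed optical flow. Names and the projection model are assumptions, not the paper's implementation:

```python
import torch

def project(x: torch.Tensor, focal: float = 500.0) -> torch.Tensor:
    """Pinhole projection of [N, 3] points to [N, 2] pixels (toy camera)."""
    return focal * x[:, :2] / x[:, 2:3].clamp_min(1e-6)

def cycle_loss(verts_t: torch.Tensor, verts_t1: torch.Tensor,
               flow_2d: torch.Tensor) -> torch.Tensor:
    """Penalize mismatch between projected 3D motion and 2D optical flow.

    verts_t, verts_t1: [N, 3] reconstructions of frames t and t+1.
    flow_2d: [N, 2] observed flow sampled at the frame-t projections.
    """
    induced = project(verts_t1) - project(verts_t)   # flow implied by 3D motion
    return (induced - flow_2d).pow(2).mean()
```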
arXiv Detail & Related papers (2021-03-30T13:50:27Z)