4DRecons: 4D Neural Implicit Deformable Objects Reconstruction from a single RGB-D Camera with Geometrical and Topological Regularizations
- URL: http://arxiv.org/abs/2406.10167v1
- Date: Fri, 14 Jun 2024 16:38:00 GMT
- Title: 4DRecons: 4D Neural Implicit Deformable Objects Reconstruction from a single RGB-D Camera with Geometrical and Topological Regularizations
- Authors: Xiaoyan Cong, Haitao Yang, Liyan Chen, Kaifeng Zhang, Li Yi, Chandrajit Bajaj, Qixing Huang
- Abstract summary: 4DRecons encodes the output as a 4D neural implicit surface.
We show that 4DRecons can handle large deformations and complex inter-part interactions.
- Score: 35.161541396566705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents 4DRecons, a novel approach that takes a single-camera RGB-D sequence of a dynamic subject as input and outputs a complete, textured, deforming 3D model over time. 4DRecons encodes the output as a 4D neural implicit surface and presents an optimization procedure that combines a data term and two regularization terms. The data term fits the 4D implicit surface to the input partial observations; we address the fundamental challenges of fitting a complete implicit surface to partial observations. The first regularization term enforces that the deformation between adjacent frames is as rigid as possible (ARAP). To this end, we introduce a novel approach to compute correspondences between adjacent textured implicit surfaces, which are used to define the ARAP regularization term. The second regularization term enforces that the topology of the underlying object remains fixed over time. This regularization is critical for avoiding the self-intersections that are typical in implicit-based reconstructions. We have evaluated the performance of 4DRecons on a variety of datasets. Experimental results show that 4DRecons can handle large deformations and complex inter-part interactions, and that it outperforms state-of-the-art approaches considerably.
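For intuition, the abstract describes an objective of the form E = E_data + λ_ARAP · E_ARAP + λ_topo · E_topo. The PyTorch sketch below is a minimal illustration of that structure, not the authors' implementation: the network size, the one-step surface projection used as a correspondence proxy, and the loss weights are all assumptions, and the texture model and the topology-preserving term are omitted.

```python
# Minimal sketch (assumptions, not the paper's code): a time-conditioned
# implicit surface f(x, t) -> SDF trained with a data term plus an
# ARAP-style rigidity proxy between adjacent frames.
import torch
import torch.nn as nn

class Implicit4D(nn.Module):
    """MLP mapping a 3D point and a timestamp to a signed distance."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))  # (N, 1) SDF values

def data_term(model, pts, t):
    # Fit the zero level set to partial RGB-D observations: surface
    # samples from the depth maps should evaluate to SDF ~ 0.
    return model(pts, t).abs().mean()

def project_to_surface(model, p, t):
    # One Newton-like step toward the zero level set, kept differentiable
    # so the rigidity loss can reach the network weights.
    p = p.clone().requires_grad_(True)
    f = model(p, t)
    (g,) = torch.autograd.grad(f.sum(), p, create_graph=True)
    return p - f * g / (g.pow(2).sum(-1, keepdim=True) + 1e-8)

def arap_proxy(src, dst):
    # Crude as-rigid-as-possible stand-in: corresponding surface points on
    # adjacent frames should preserve pairwise distances. The paper instead
    # derives correspondences from the textured implicit surfaces.
    return (torch.cdist(src, src) - torch.cdist(dst, dst)).pow(2).mean()

model = Implicit4D()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

pts = torch.rand(128, 3)                 # stand-in depth-map surface samples
t0 = torch.zeros(128, 1)                 # timestamps of those samples
seeds = torch.rand(64, 3)                # stand-in correspondence seeds
tc0 = torch.zeros(64, 1)                 # frame t
tc1 = torch.full((64, 1), 0.1)           # adjacent frame t + dt

opt.zero_grad()
loss = data_term(model, pts, t0) \
     + 0.1 * arap_proxy(project_to_surface(model, seeds, tc0),
                        project_to_surface(model, seeds, tc1))
loss.backward()
opt.step()
```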
Related papers
- Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference [62.99706119370521]
Humans can easily deduce the relative pose of an unseen object, without labels or training, given only a single query-reference image pair.
We propose a novel, generalizable 3D relative pose estimation method built on (i) a 2.5D shape from an RGB-D reference, (ii) an off-the-shelf differentiable renderer, and (iii) semantic cues from a pretrained model such as DINOv2.
arXiv Detail & Related papers (2024-06-26T16:01:10Z)
- Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking [52.393359791978035]
Motion2VecSets is a 4D diffusion model for dynamic surface reconstruction from point cloud sequences.
We parameterize 4D dynamics with latent sets instead of using global latent codes.
For more temporally coherent object tracking, we synchronously denoise deformation latent sets and exchange information across multiple frames.
arXiv Detail & Related papers (2024-01-12T15:05:08Z)
- LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human Modeling [69.56581851211841]
We propose LoRD, a novel Local 4D implicit Representation for Dynamic clothed humans.
Our key insight is to encourage the network to learn the latent codes of a local, part-level representation.
LoRD has a strong capability for representing 4D humans and outperforms state-of-the-art methods in practical applications.
arXiv Detail & Related papers (2022-08-18T03:49:44Z)
- Unbiased 4D: Monocular 4D Reconstruction with a Neural Deformation Model [76.64071133839862]
Capturing general deforming scenes from monocular RGB video is crucial for many computer graphics and vision applications.
Our method, Ub4D, handles large deformations, performs shape completion in occluded regions, and can operate on monocular RGB videos directly by using differentiable volume rendering.
Results on our new dataset, which will be made publicly available, demonstrate a clear improvement over the state of the art in terms of surface reconstruction accuracy and robustness to large deformations.
arXiv Detail & Related papers (2022-06-16T17:59:54Z)
- 4DComplete: Non-Rigid Motion Estimation Beyond the Observable Surface [7.637832293935966]
We introduce 4DComplete, a novel data-driven approach that estimates the non-rigid motion for the unobserved geometry.
For network training, we constructed a large-scale synthetic dataset called DeformingThings4D.
arXiv Detail & Related papers (2021-05-05T07:39:12Z)
- Learning Parallel Dense Correspondence from Spatio-Temporal Descriptors for Efficient and Robust 4D Reconstruction [43.60322886598972]
This paper focuses on the task of 4D shape reconstruction from a sequence of point clouds.
We present a novel pipeline that learns the temporal evolution of 3D human shape by capturing continuous transformation functions among cross-frame occupancy fields.
arXiv Detail & Related papers (2021-03-30T13:36:03Z)
- Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild [96.09941542587865]
We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild.
In this way, we precisely align 3D models to objects in RGB images, which results in significantly improved 3D pose estimates.
We evaluate our approach on the challenging Pix3D dataset and achieve up to 55% relative improvement compared to state-of-the-art refinement methods in multiple metrics.
arXiv Detail & Related papers (2020-07-17T12:34:38Z)
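To make the render-and-compare idea behind this last entry concrete, here is a generic, hedged sketch of gradient-based pose refinement. The `toy_render` function is a hypothetical stand-in for a real differentiable renderer (the paper's geometric correspondence fields are not reproduced here), and the pose parameterization and loss are illustrative assumptions.

```python
# Generic render-and-compare pose refinement (illustrative only).
import torch

def toy_render(pose):
    # Stand-in for a differentiable renderer: any function of the pose
    # that stays differentiable end to end works for the demonstration.
    return torch.outer(torch.sin(pose[:3]), torch.cos(pose[3:]))  # (3, 3) "image"

target = toy_render(torch.tensor([0.3, 0.1, 0.2, 0.0, 0.4, 0.2]))  # observation
pose = torch.zeros(6, requires_grad=True)                          # initial guess

opt = torch.optim.Adam([pose], lr=5e-2)
for step in range(200):
    opt.zero_grad()
    loss = (toy_render(pose) - target).pow(2).mean()   # image-space residual
    loss.backward()                                    # gradients flow through the renderer
    opt.step()
```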