Learning Dynamic View Synthesis With Few RGBD Cameras
- URL: http://arxiv.org/abs/2204.10477v1
- Date: Fri, 22 Apr 2022 03:17:35 GMT
- Title: Learning Dynamic View Synthesis With Few RGBD Cameras
- Authors: Shengze Wang, YoungJoong Kwon, Yuan Shen, Qian Zhang, Andrei State,
Jia-Bin Huang, Henry Fuchs
- Abstract summary: We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate feature point clouds from RGBD frames and then render them into free-viewpoint videos via a neural renderer.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
- Score: 60.36357774688289
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There have been significant advancements in dynamic novel view synthesis in
recent years. However, current deep learning models often require (1) prior
models (e.g., SMPL human models), (2) heavy pre-processing, or (3) per-scene
optimization. We propose to utilize RGBD cameras to remove these limitations
and synthesize free-viewpoint videos of dynamic indoor scenes. We generate
feature point clouds from RGBD frames and then render them into free-viewpoint
videos via a neural renderer. However, the inaccurate, unstable, and incomplete
depth measurements induce severe distortions, flickering, and ghosting
artifacts. We enforce spatial-temporal consistency via the proposed Cycle
Reconstruction Consistency and Temporal Stabilization module to reduce these
artifacts. We introduce a simple Regional Depth-Inpainting module that
adaptively inpaints missing depth values to render complete novel views.
Additionally, we present a Human-Things Interactions (HTI) dataset to validate our
approach and facilitate future research. The dataset consists of 43 multi-view
RGBD video sequences of everyday activities, capturing complex interactions
between human subjects and their surroundings. Experiments on the HTI dataset
show that our method outperforms the baselines in per-frame image fidelity and
spatial-temporal consistency. We will release our code and the dataset on the
website soon.
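To make the pipeline above concrete, here is a minimal sketch of its first step: lifting an RGBD frame into a world-space point cloud that a neural renderer could then consume. This is not the authors' released code; the function and variable names (unproject_rgbd, cam_to_world) are hypothetical, and a pinhole camera model with metric depth is assumed. The holes left where the sensor reports no depth are exactly where a module like the paper's Regional Depth-Inpainting would have to fill in values before complete novel views can be rendered.

```python
import numpy as np

def unproject_rgbd(rgb, depth, K, cam_to_world):
    """Lift an H x W RGBD frame to an (N, 6) array of XYZ + RGB points.

    rgb:          (H, W, 3) uint8 color image
    depth:        (H, W) float32 depth in meters, 0 where the sensor has no reading
    K:            (3, 3) pinhole intrinsics
    cam_to_world: (4, 4) camera-to-world extrinsic matrix
    """
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0                      # missing depth leaves holes in the cloud
    z = depth[valid]
    x = (u[valid] - cx) / fx * z           # back-project pixels to camera space
    y = (v[valid] - cy) / fy * z
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1)

    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    colors = rgb[valid].astype(np.float32) / 255.0
    return np.concatenate([pts_world, colors], axis=-1)

# Example with dummy data from a single 480x640 camera.
rgb = np.zeros((480, 640, 3), dtype=np.uint8)
depth = np.full((480, 640), 2.0, dtype=np.float32)
K = np.array([[525.0, 0, 320.0], [0, 525.0, 240.0], [0, 0, 1.0]])
points = unproject_rgbd(rgb, depth, K, np.eye(4))   # -> (307200, 6)
```

Fusing such per-camera clouds from a few RGBD views and feeding the resulting feature points to a learned renderer is, at a high level, the pipeline the abstract outlines.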
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z) - D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video [53.83936023443193]
This paper contributes to the field by introducing a new method for dynamic novel view synthesis from monocular video, such as smartphone captures.
Our approach represents the scene as a dynamic neural point cloud, an implicit time-conditioned point cloud that encodes local geometry and appearance in separate hash-encoded neural feature grids (see the illustrative sketch after this list).
arXiv Detail & Related papers (2024-06-14T14:35:44Z) - CTNeRF: Cross-Time Transformer for Dynamic Neural Radiance Field from Monocular Video [25.551944406980297]
We propose a novel approach to generate high-quality novel views from monocular videos of complex and dynamic scenes.
We introduce a module that operates in both the time and frequency domains to aggregate the features of object motion.
Our experiments demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2024-01-10T00:40:05Z) - DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experiments achieve state-of-the-art tracking performance on both synthetic and real-world data.
arXiv Detail & Related papers (2023-11-30T21:34:44Z) - RGB-D Mapping and Tracking in a Plenoxel Radiance Field [5.239559610798646]
We highlight the key differences between view synthesis models and 3D reconstruction models.
We also comment on why a depth sensor is essential for modeling accurate geometry in general outward-facing scenes.
Our method achieves state-of-the-art results in both mapping and tracking tasks, while also being faster than competing neural network-based approaches.
arXiv Detail & Related papers (2023-07-07T06:05:32Z) - NSLF-OL: Online Learning of Neural Surface Light Fields alongside
Real-time Incremental 3D Reconstruction [0.76146285961466]
The paper proposes a novel Neural Surface Light Fields model that copes with the small range of training view directions while still producing good results in unseen directions.
Our model learns Neural Surface Light Fields (NSLF) online alongside real-time 3D reconstruction, with a sequential data stream as the shared input.
In addition to online training, our model also provides real-time rendering after completing the data stream for visualization.
arXiv Detail & Related papers (2023-04-29T15:41:15Z) - DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes [27.37830742693236]
We present DeVRF, a novel representation to accelerate learning dynamic radiance fields.
Experiments demonstrate that DeVRF achieves a two-orders-of-magnitude speedup while producing high-fidelity results on par with prior methods.
arXiv Detail & Related papers (2022-05-31T12:13:54Z) - Learning Multi-Object Dynamics with Compositional Neural Radiance Fields [63.424469458529906]
We present a method to learn compositional predictive models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks.
NeRFs have become a popular choice for representing scenes due to their strong 3D prior.
For planning, we utilize RRTs in the learned latent space, where we can exploit our model and the implicit object encoder to make sampling the latent space informative and more efficient.
arXiv Detail & Related papers (2022-02-24T01:31:29Z) - Class-agnostic Reconstruction of Dynamic Objects from Videos [127.41336060616214]
We introduce REDO, a class-agnostic framework to REconstruct the Dynamic Objects from RGBD or calibrated videos.
We develop two novel modules. First, we introduce a canonical 4D implicit function which is pixel-aligned with aggregated temporal visual cues.
Second, we develop a 4D transformation module which captures object dynamics to support temporal propagation and aggregation.
arXiv Detail & Related papers (2021-12-03T18:57:47Z)