Fast Non-Rigid Radiance Fields from Monocularized Data
- URL: http://arxiv.org/abs/2212.01368v2
- Date: Mon, 13 Nov 2023 14:31:18 GMT
- Title: Fast Non-Rigid Radiance Fields from Monocularized Data
- Authors: Moritz Kappel, Vladislav Golyanik, Susana Castillo, Christian
Theobalt, Marcus Magnor
- Abstract summary: This paper proposes a new method for full 360° inward-facing novel view synthesis of non-rigidly deforming scenes.
At the core of our method are 1) An efficient deformation module that decouples the processing of spatial and temporal information for accelerated training and inference; and 2) A static module representing the canonical scene as a fast hash-encoded neural radiance field.
In both cases, our method is significantly faster than previous methods, converging in less than 7 minutes and achieving real-time framerates at 1K resolution, while obtaining a higher visual accuracy for generated novel views.
- Score: 66.74229489512683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The reconstruction and novel view synthesis of dynamic scenes recently gained
increased attention. As reconstruction from large-scale multi-view data
involves immense memory and computational requirements, recent benchmark
datasets provide collections of single monocular views per timestamp sampled
from multiple (virtual) cameras. We refer to this form of inputs as
"monocularized" data. Existing work shows impressive results for synthetic
setups and forward-facing real-world data, but is often limited in the training
speed and angular range for generating novel views. This paper addresses these
limitations and proposes a new method for full 360° inward-facing novel
view synthesis of non-rigidly deforming scenes. At the core of our method are:
1) An efficient deformation module that decouples the processing of spatial and
temporal information for accelerated training and inference; and 2) A static
module representing the canonical scene as a fast hash-encoded neural radiance
field. In addition to existing synthetic monocularized data, we systematically
analyze the performance on real-world inward-facing scenes using a newly
recorded challenging dataset sampled from a synchronized large-scale multi-view
rig. In both cases, our method is significantly faster than previous methods,
converging in less than 7 minutes and achieving real-time framerates at 1K
resolution, while obtaining a higher visual accuracy for generated novel views.
Our source code and data are available at our project page
https://graphics.tu-bs.de/publications/kappel2022fast.
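To make the two-module design above concrete, the following is a minimal PyTorch sketch of the idea: a deformation module warps each per-frame sample point into a shared canonical space, and a static, hash-encoded canonical field is queried there for color and density. The single-level hash encoding, layer sizes, the per-frame time embedding, and all names below are illustrative assumptions, not the authors' exact architecture.

```python
# Hedged sketch of a deformation module + hash-encoded canonical radiance field.
import torch
import torch.nn as nn


class HashGridEncoding(nn.Module):
    """Simplified single-level hash-grid encoding (a stand-in for a
    multiresolution hash encoding in the style of Instant-NGP)."""

    def __init__(self, n_entries=2**14, n_features=8, resolution=64):
        super().__init__()
        self.table = nn.Parameter(torch.randn(n_entries, n_features) * 1e-2)
        self.n_entries = n_entries
        self.resolution = resolution
        # Large primes commonly used for spatial hashing.
        self.register_buffer("primes", torch.tensor([1, 2654435761, 805459861]))

    def forward(self, x):                                      # x: (N, 3), roughly in [0, 1]
        idx = (x * self.resolution).long()                     # integer voxel coordinates
        h = (idx * self.primes).sum(dim=-1) % self.n_entries   # spatial hash -> table row
        return self.table[h]                                   # (N, n_features)


class DeformationModule(nn.Module):
    """Deformation module: maps a point plus a per-frame time embedding to an
    offset that warps the point into the shared canonical space."""

    def __init__(self, t_dim=8, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + t_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, t_embed):                             # (N, 3), (N, t_dim)
        return x + self.mlp(torch.cat([x, t_embed], dim=-1))


class CanonicalField(nn.Module):
    """Static module: hash-encoded radiance field of the canonical scene."""

    def __init__(self, n_features=8, hidden=64):
        super().__init__()
        self.encoding = HashGridEncoding(n_features=n_features)
        self.mlp = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                              # RGB + density
        )

    def forward(self, x_canonical):
        return self.mlp(self.encoding(x_canonical))


# Usage: warp per-frame ray samples into canonical space, then query the static field.
deform, field = DeformationModule(), CanonicalField()
x = torch.rand(1024, 3)                                        # ray samples at one timestamp
t_embed = torch.randn(1, 8).expand(1024, -1)                   # hypothetical learned per-frame code
rgb_sigma = field(deform(x, t_embed))                          # (1024, 4), fed to volume rendering
```

Keeping the time-dependent warp separate from the static field is what allows the canonical scene to use a fast grid-based encoding, which is consistent with the training and rendering speedups reported in the abstract.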
Related papers
- D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video [53.83936023443193]
This paper contributes to the field by introducing a new method for dynamic novel view synthesis from monocular video, such as smartphone captures.
Our approach represents the scene as a dynamic neural point cloud, an implicit time-conditioned point cloud that encodes local geometry and appearance in separate hash-encoded neural feature grids.
arXiv Detail & Related papers (2024-06-14T14:35:44Z)
- Fast View Synthesis of Casual Videos with Soup-of-Planes [24.35962788109883]
Novel view synthesis from an in-the-wild video is difficult due to challenges like scene dynamics and lack of parallax.
This paper revisits explicit video representations to synthesize high-quality novel views from a monocular video efficiently.
Our method can render high-quality novel views from an in-the-wild video with comparable quality to state-of-the-art methods while being 100x faster in training and enabling real-time rendering.
arXiv Detail & Related papers (2023-12-04T18:55:48Z)
- Fast Monocular Scene Reconstruction with Global-Sparse Local-Dense Grids [84.90863397388776]
We propose to directly use the signed distance function (SDF) in sparse voxel block grids for fast and accurate scene reconstruction without MLPs.
Our globally sparse and locally dense data structure exploits surfaces' spatial sparsity, enables cache-friendly queries, and allows direct extensions to multi-modal data.
Experiments show that our approach is 10x faster in training and 100x faster in rendering while achieving comparable accuracy to state-of-the-art neural implicit methods.
arXiv Detail & Related papers (2023-05-22T16:50:19Z)
- Learning to Render Novel Views from Wide-Baseline Stereo Pairs [26.528667940013598]
We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair.
Existing approaches to novel view synthesis from sparse observations fail because they recover incorrect 3D geometry.
We propose an efficient, image-space epipolar line sampling scheme to assemble image features for a target ray.
arXiv Detail & Related papers (2023-04-17T17:40:52Z)
- Cascaded and Generalizable Neural Radiance Fields for Fast View Synthesis [35.035125537722514]
We present CG-NeRF, a cascaded and generalizable neural radiance field method for view synthesis.
We first train CG-NeRF on multiple 3D scenes of the DTU dataset.
We show that CG-NeRF outperforms state-of-the-art generalizable neural rendering methods on various synthetic and real datasets.
arXiv Detail & Related papers (2022-08-09T12:23:48Z)
- RTMV: A Ray-Traced Multi-View Synthetic Dataset for Novel View Synthesis [104.53930611219654]
We present a large-scale synthetic dataset for novel view synthesis consisting of 300k images rendered from nearly 2000 complex scenes.
The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis.
Using 4 distinct sources of high-quality 3D meshes, the scenes of our dataset exhibit challenging variations in camera views, lighting, shape, materials, and textures.
arXiv Detail & Related papers (2022-05-14T13:15:32Z)
- Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via a neural feature renderer.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z)
- Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers [115.90778814368703]
Our objective is language-based search of large-scale image and video datasets.
For this task, the approach that consists of independently mapping text and vision to a joint embedding space, a.k.a. dual encoders, is attractive as retrieval scales.
An alternative approach of using vision-text transformers with cross-attention gives considerable improvements in accuracy over the joint embeddings.
arXiv Detail & Related papers (2021-03-30T17:57:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.