Incremental Dense Reconstruction from Monocular Video with Guided Sparse Feature Volume Fusion
- URL: http://arxiv.org/abs/2305.14918v1
- Date: Wed, 24 May 2023 09:06:01 GMT
- Title: Incremental Dense Reconstruction from Monocular Video with Guided Sparse Feature Volume Fusion
- Authors: Xingxing Zuo, Nan Yang, Nathaniel Merrill, Binbin Xu, Stefan Leutenegger
- Abstract summary: This letter proposes a real-time feature volume-based dense reconstruction method that predicts TSDF values from a novel sparsified deep feature volume.
An uncertainty-aware multi-view stereo network is leveraged to infer initial voxel locations of the physical surface in a sparse feature volume.
Our method is shown to produce more complete reconstructions with finer detail in many cases.
- Score: 23.984073189849024
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Incrementally recovering 3D dense structures from monocular videos is of
paramount importance since it enables various robotics and AR applications.
Feature volumes have recently been shown to enable efficient and accurate
incremental dense reconstruction without the need to first estimate depth, but
they cannot achieve as high a resolution as depth-based methods due
to the large memory consumption of high-resolution feature volumes. This letter
proposes a real-time feature volume-based dense reconstruction method that
predicts TSDF (Truncated Signed Distance Function) values from a novel
sparsified deep feature volume, which is able to achieve higher resolutions
than previous feature volume-based methods, and is favorable in large-scale
outdoor scenarios where the majority of voxels are empty. An uncertainty-aware
multi-view stereo (MVS) network is leveraged to infer initial voxel locations
of the physical surface in a sparse feature volume. Then, to refine the recovered 3D geometry, deep features are attentively aggregated from multi-view images at potential surface locations and temporally fused. Besides achieving
higher resolutions than before, our method is shown to produce more complete
reconstructions with finer detail in many cases. Extensive evaluations on both public and self-collected datasets show that our real-time reconstructions are highly competitive with state-of-the-art methods in both indoor and outdoor settings.
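As a rough illustration of the pipeline above, here is a minimal Python sketch of sparse feature volume fusion seeded by uncertainty-aware MVS depth. All names (`SparseFeatureVolume`, `backproject`), thresholds, and the random stand-in features are assumptions for illustration, not the authors' code; the attentive multi-view aggregation and the TSDF-decoding MLP are elided.

```python
import numpy as np

VOXEL_SIZE = 0.04   # 4 cm voxels; illustrative value
UNC_THRESH = 0.15   # keep only depths the MVS network is confident about

class SparseFeatureVolume:
    """Hash-map voxel grid: only voxels near observed surfaces are allocated."""
    def __init__(self, feat_dim=8):
        self.feat_dim = feat_dim
        self.features = {}   # (i, j, k) -> fused feature vector
        self.weights = {}    # (i, j, k) -> accumulated fusion weight

    def fuse(self, voxel_ids, feats, w=1.0):
        """Temporally fuse new per-voxel features via a weighted running average."""
        for key, f in zip(map(tuple, voxel_ids), feats):
            w_old = self.weights.get(key, 0.0)
            f_old = self.features.get(key, np.zeros(self.feat_dim))
            self.features[key] = (w_old * f_old + w * f) / (w_old + w)
            self.weights[key] = w_old + w

def backproject(depth, unc, K, T_wc):
    """Lift confident depth pixels to 3D world points (pinhole camera model)."""
    h, w = depth.shape
    v, u = np.mgrid[:h, :w]
    valid = (unc < UNC_THRESH) & (depth > 0)
    rays = np.stack([u[valid], v[valid], np.ones(valid.sum())])
    pts_cam = np.linalg.inv(K) @ rays * depth[valid]   # camera frame
    pts_w = T_wc[:3, :3] @ pts_cam + T_wc[:3, 3:4]     # world frame
    return pts_w.T

# Per keyframe: uncertainty-filtered MVS depth seeds the sparse volume, then
# image features (random stand-ins here) are fused at the candidate voxels.
vol = SparseFeatureVolume()
depth = np.full((48, 64), 2.0)
unc = np.random.rand(48, 64) * 0.3
K = np.array([[60.0, 0.0, 32.0], [0.0, 60.0, 24.0], [0.0, 0.0, 1.0]])
pts = backproject(depth, unc, K, np.eye(4))
voxels = np.floor(pts / VOXEL_SIZE).astype(int)
vol.fuse(voxels, np.random.rand(len(voxels), 8))
# A small MLP (elided) would decode each fused feature to a TSDF value, and
# marching cubes over the occupied voxels would extract the surface mesh.
print(len(vol.features), "surface-adjacent voxels allocated")
```

The key point of the sketch is that storage scales with the number of surface-adjacent voxels rather than with the cube of the resolution, which is what makes high resolutions and largely empty outdoor scenes tractable.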
Related papers
- HIVE: HIerarchical Volume Encoding for Neural Implicit Surface Reconstruction [37.00102816748563]
We introduce a volume encoding to explicitly encode the spatial information.
High-resolution volumes capture the high-frequency geometry details.
Low-resolution volumes enforce the spatial consistency to keep the shape smooth.
This hierarchical volume encoding can be appended to any implicit surface reconstruction method as a plug-and-play module (see the sketch after this entry).
arXiv Detail & Related papers (2024-08-03T06:34:20Z)
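The hierarchical volume encoding above lends itself to a short sketch: query each point in several feature grids at different resolutions and concatenate the interpolated features. The class name, resolutions, and feature width are illustrative assumptions, not HIVE's actual configuration.

```python
import torch
import torch.nn.functional as F

class HierarchicalVolumeEncoding(torch.nn.Module):
    """Concatenate trilinearly interpolated features from coarse and fine 3D grids.
    Resolutions and feature width are illustrative, not HIVE's settings."""
    def __init__(self, resolutions=(8, 32, 128), feat_dim=8):
        super().__init__()
        self.volumes = torch.nn.ParameterList(
            [torch.nn.Parameter(0.01 * torch.randn(1, feat_dim, r, r, r))
             for r in resolutions]
        )

    def forward(self, xyz):                  # query points in [-1, 1]^3, shape (N, 3)
        grid = xyz.view(1, 1, 1, -1, 3)      # 5-D sampling grid for grid_sample
        feats = [
            F.grid_sample(vol, grid, mode="bilinear", align_corners=True)
            .view(vol.shape[1], -1).t()      # (N, feat_dim) at this resolution
            for vol in self.volumes
        ]
        # Low-res levels enforce smooth, consistent shape; high-res levels add detail.
        return torch.cat(feats, dim=-1)

enc = HierarchicalVolumeEncoding()
queries = torch.rand(1024, 3) * 2 - 1
print(enc(queries).shape)                    # torch.Size([1024, 24])
```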
- UniSDF: Unifying Neural Representations for High-Fidelity 3D Reconstruction of Complex Scenes with Reflections [92.38975002642455]
We propose UniSDF, a general-purpose 3D reconstruction method that robustly reconstructs large, complex scenes with fine details and reflective surfaces.
arXiv Detail & Related papers (2023-12-20T18:59:42Z)
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our loss functions enable the model to autonomously recover domain-specific scale-and-shift coefficients (see the sketch after this entry).
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
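The scale-and-shift recovery mentioned above can be pictured with a generic scale-and-shift-invariant depth loss that solves for the per-image coefficients in closed form, in the spirit of MiDaS-style training; this is a hedged sketch, not the paper's actual loss.

```python
import torch

def scale_shift_invariant_loss(pred, target, mask):
    """Least-squares align pred to target with per-image scale s and shift t,
    then penalize the residual. Generic formulation, not the paper's exact loss."""
    p, g = pred[mask], target[mask]
    # Solve min_{s,t} || s*p + t - g ||^2 via the 2x2 normal equations.
    n = p.numel()
    a11, a12 = (p * p).sum(), p.sum()
    b1, b2 = (p * g).sum(), g.sum()
    det = a11 * n - a12 * a12
    s = (b1 * n - a12 * b2) / det
    t = (a11 * b2 - a12 * b1) / det
    return ((s * p + t - g) ** 2).mean()

pred = torch.rand(1, 240, 320)
target = 3.0 * pred + 0.5            # same geometry up to scale and shift
mask = target > 0
print(scale_shift_invariant_loss(pred, target, mask))  # ~0: alignment absorbs s, t
```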
- FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction [13.157400338544177]
Recent works on 3D reconstruction from posed images have demonstrated that direct inference of scene-level 3D geometry is feasible using deep neural networks.
We propose three effective solutions for improving the fidelity of inference-based 3D reconstructions.
Our method, FineRecon, produces smooth and highly accurate reconstructions, showing significant improvements across multiple depth and 3D reconstruction metrics.
arXiv Detail & Related papers (2023-04-04T02:50:29Z)
- VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction [64.09702079593372]
VolRecon is a novel generalizable implicit reconstruction method based on the Signed Ray Distance Function (SRDF).
On the DTU dataset, VolRecon outperforms SparseNeuS by about 30% in sparse-view reconstruction and achieves accuracy comparable to MVSNet in full-view reconstruction.
arXiv Detail & Related papers (2022-12-15T18:59:54Z)
- MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction [72.05649682685197]
State-of-the-art neural implicit methods allow for high-quality reconstructions of simple scenes from many input views, but their quality degrades on larger, more complex scenes and sparser viewpoints.
This is caused primarily by the inherent ambiguity in the RGB reconstruction loss, which does not provide enough constraints.
Motivated by recent advances in monocular geometry prediction, we explore the utility these cues provide for improving neural implicit surface reconstruction (a sketch of such cue losses follows this entry).
arXiv Detail & Related papers (2022-06-01T17:58:15Z)
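As a hedged sketch of how monocular cues can supplement the ambiguous RGB loss, the snippet below combines depth and normal consistency terms against monocular priors; the weights and exact terms are assumptions for illustration, not MonoSDF's precise objective (which also aligns scale and shift before comparing depths).

```python
import torch
import torch.nn.functional as F

def monocular_cue_loss(rgb_pred, rgb_gt, depth_pred, depth_prior,
                       normal_pred, normal_prior, lam_d=0.1, lam_n=0.05):
    """Augment the RGB loss with monocular depth and normal priors.
    Weights and exact terms are illustrative, not the paper's formulation."""
    l_rgb = F.l1_loss(rgb_pred, rgb_gt)
    l_depth = F.mse_loss(depth_pred, depth_prior)   # assumes depths already aligned
    n_p = F.normalize(normal_pred, dim=-1)
    n_g = F.normalize(normal_prior, dim=-1)
    # L1 plus angular consistency between rendered and predicted normals
    l_normal = (n_p - n_g).abs().sum(-1).mean() + (1 - (n_p * n_g).sum(-1)).mean()
    return l_rgb + lam_d * l_depth + lam_n * l_normal

rays = 2048
loss = monocular_cue_loss(torch.rand(rays, 3), torch.rand(rays, 3),
                          torch.rand(rays), torch.rand(rays),
                          torch.randn(rays, 3), torch.randn(rays, 3))
print(loss)
```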
- BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion [85.24673400250671]
We present Bi-level Neural Volume Fusion (BNV-Fusion), which leverages recent advances in neural implicit representations and neural rendering for dense 3D reconstruction.
In order to incrementally integrate new depth maps into a global neural implicit representation, we propose a novel bi-level fusion strategy.
We evaluate the proposed method on multiple datasets quantitatively and qualitatively, demonstrating a significant improvement over existing methods.
arXiv Detail & Related papers (2022-04-03T19:33:09Z)
- Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular Video Depth [90.33296913575818]
In video-based scenarios such as video depth estimation and 3D scene reconstruction from video, the unknown scale and shift in per-frame predictions can cause depth inconsistency.
We propose a locally weighted linear regression method to recover the scale and shift from very sparse anchor points (see the sketch after this entry).
Our method can boost the performance of existing state-of-the-art approaches by up to 50% on several zero-shot benchmarks.
arXiv Detail & Related papers (2022-02-03T08:52:54Z)
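A minimal sketch of the locally weighted linear regression idea above: for every pixel, fit a scale and shift to nearby metric anchor points with distance-based weights. The Gaussian weighting, bandwidth, and function name are assumptions; the paper's estimator may differ.

```python
import numpy as np

def local_scale_shift(pred, anchors_uv, anchors_metric, bandwidth=40.0):
    """Recover per-pixel scale/shift maps from sparse metric anchors via locally
    weighted least squares; a generic sketch, not the paper's exact estimator."""
    h, w = pred.shape
    p_a = pred[anchors_uv[:, 1], anchors_uv[:, 0]]      # predicted depth at anchors
    s_map = np.ones_like(pred)
    t_map = np.zeros_like(pred)
    for v in range(h):
        for u in range(w):
            d2 = (anchors_uv[:, 0] - u) ** 2 + (anchors_uv[:, 1] - v) ** 2
            wgt = np.exp(-d2 / (2 * bandwidth ** 2))    # nearer anchors weigh more
            # Weighted normal equations for s * pred + t ~= metric depth
            a11, a12, a22 = (wgt * p_a * p_a).sum(), (wgt * p_a).sum(), wgt.sum()
            b1, b2 = (wgt * p_a * anchors_metric).sum(), (wgt * anchors_metric).sum()
            det = a11 * a22 - a12 ** 2
            s_map[v, u] = (b1 * a22 - a12 * b2) / det
            t_map[v, u] = (a11 * b2 - a12 * b1) / det
    return s_map * pred + t_map

pred = np.random.rand(24, 32) + 0.5                     # relative depth prediction
anchors_uv = np.array([[4, 4], [28, 4], [4, 20], [28, 20], [16, 12]])
metric = 2.0 * pred + 1.0                               # pretend metric ground truth
aligned = local_scale_shift(pred, anchors_uv,
                            metric[anchors_uv[:, 1], anchors_uv[:, 0]])
print(np.abs(aligned - metric).mean())                  # ~0 after local alignment
```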
- Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction [12.728154351588053]
We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multi-view images.
We introduce a coarse-to-fine depth inference strategy to achieve high-resolution depth (see the sketch after this entry).
arXiv Detail & Related papers (2020-11-25T13:34:11Z)
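The coarse-to-fine depth inference strategy from the entry above reduces, per pixel, to sweeping a set of depth hypotheses and narrowing the search range around the current best match. The toy `cost_fn` stands in for a real multi-view matching cost (e.g. the variance of warped image features), and the attention-aware aggregation is elided.

```python
import numpy as np

def coarse_to_fine_depth(cost_fn, d_min, d_max, levels=3, hyps=8):
    """Coarse-to-fine depth inference: at each level, sweep a set of depth
    hypotheses, take the argmin of the matching cost, and narrow the search
    range around it for the next (finer) level. `cost_fn(d)` stands in for a
    cost-volume slice computed from warped multi-view features."""
    lo, hi = d_min, d_max
    for _ in range(levels):
        candidates = np.linspace(lo, hi, hyps)
        costs = np.array([cost_fn(d) for d in candidates])
        best = candidates[costs.argmin()]
        half = (hi - lo) / (2 * hyps)          # shrink the hypothesis range
        lo, hi = best - half, best + half
    return best

# Toy photometric cost with a minimum at the true depth of 2.37 m.
true_depth = 2.37
estimate = coarse_to_fine_depth(lambda d: (d - true_depth) ** 2, 0.5, 8.0)
print(round(estimate, 3))                      # converges near 2.37
```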