VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction
- URL: http://arxiv.org/abs/2108.08623v1
- Date: Thu, 19 Aug 2021 11:33:58 GMT
- Title: VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction
- Authors: Jaesung Choe, Sunghoon Im, Francois Rameau, Minjun Kang, In So Kweon
- Abstract summary: In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) the local computation of depth maps with a deep MVS technique, and 2) the fusion of the depth maps and image features to build a single TSDF volume.
To improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
- Score: 71.83308989022635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To reconstruct a 3D scene from a set of calibrated views, traditional
multi-view stereo techniques rely on two distinct stages: local depth map
computation and global depth map fusion. Recent studies concentrate on deep
neural architectures for depth estimation, combined with either a conventional
depth fusion method or a direct 3D reconstruction network that regresses a
Truncated Signed Distance Function (TSDF). In this paper, we advocate that
replicating the traditional two-stage framework with deep neural networks
improves both the interpretability and the accuracy of the results. Our
network operates in two steps: 1) the local computation of depth maps with a
deep MVS technique, and 2) the fusion of the depth maps and image features to
build a single TSDF volume. To improve the matching performance between
images acquired from very different viewpoints (e.g., large baselines and
rotations), we introduce a rotation-invariant 3D convolution kernel called
PosedConv. The effectiveness of the proposed architecture is demonstrated by a
large series of experiments conducted on the ScanNet dataset, where our
approach compares favorably against both traditional and deep learning
techniques.
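The second stage above, fusing per-view depth maps into a single TSDF volume, builds on classic volumetric fusion. The paper learns this fusion jointly with image features; the sketch below shows only the underlying geometric averaging, with uniform per-view weights and illustrative names that are not from the paper:

```python
import numpy as np

def fuse_depth_maps(depth_maps, intrinsics, poses, vol_origin,
                    vol_dim, voxel_size, trunc):
    """Hypothetical classic TSDF fusion: average truncated signed
    distances from each depth map into one voxel grid."""
    n_vox = int(np.prod(vol_dim))
    tsdf = np.ones(n_vox, dtype=np.float32)    # +1 = empty space
    weight = np.zeros(n_vox, dtype=np.float32)

    # World coordinates of every voxel centre, flattened to (N, 3).
    grid = np.stack(np.meshgrid(*[np.arange(d) for d in vol_dim],
                                indexing="ij"), axis=-1)
    voxels = grid.reshape(-1, 3) * voxel_size + vol_origin

    for depth, K, T_wc in zip(depth_maps, intrinsics, poses):
        h, w = depth.shape
        T_cw = np.linalg.inv(T_wc)             # world -> camera
        cam = voxels @ T_cw[:3, :3].T + T_cw[:3, 3]
        z = cam[:, 2]
        valid = z > 1e-6
        zs = np.where(valid, z, 1.0)           # avoid divide-by-zero
        u = np.round(cam[:, 0] / zs * K[0, 0] + K[0, 2]).astype(np.int64)
        v = np.round(cam[:, 1] / zs * K[1, 1] + K[1, 2]).astype(np.int64)
        valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)
        d = np.zeros_like(z)
        d[valid] = depth[v[valid], u[valid]]
        valid &= d > 0
        # Signed distance along the ray, truncated to [-1, 1].
        sdf = np.clip((d - z) / trunc, -1.0, 1.0)
        upd = valid & (sdf > -0.99)   # skip voxels far behind the surface
        tsdf[upd] = (tsdf[upd] * weight[upd] + sdf[upd]) / (weight[upd] + 1.0)
        weight[upd] += 1.0
    return tsdf.reshape(vol_dim), weight.reshape(vol_dim)
```

The zero level set of the returned volume can then be meshed with marching cubes; the learned variant in the paper additionally carries image features into the volume before regressing the TSDF.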
Related papers
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves state-of-the-art performance, especially when only a few propagation steps are used.
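For context, the propagation idea behind this family of methods can be sketched briefly. GraphCSPN builds dynamic graph neighbourhoods; the sketch below shows only the simpler fixed 3x3, CSPN-style spatial propagation that it generalizes (shapes and names are illustrative assumptions, not the paper's code):

```python
import torch
import torch.nn.functional as F

def propagate_depth(depth, affinity, steps=3):
    """CSPN-style propagation: each pixel's depth is repeatedly replaced
    by an affinity-weighted mix of its 3x3 neighbourhood.

    depth:    (B, 1, H, W) initial depth estimate
    affinity: (B, 9, H, W) learned affinities over the 3x3 neighbourhood
    """
    # Normalise so each pixel's nine weights sum to one (keeps the
    # iteration stable).
    w = F.softmax(affinity, dim=1)
    for _ in range(steps):
        # Gather every pixel's 3x3 neighbourhood: (B, 9, H, W).
        patches = F.unfold(depth, kernel_size=3, padding=1)
        patches = patches.view(depth.shape[0], 9, *depth.shape[2:])
        depth = (w * patches).sum(dim=1, keepdim=True)
    return depth
```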
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
- 3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z)
- Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences; and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
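Step 2 rests on classic two-view geometry: estimate an essential matrix from the correspondences, then decompose it into a relative pose. The paper's module is learned and normalized differently, but a minimal geometric counterpart using standard OpenCV calls looks like this (function name and defaults are illustrative):

```python
import cv2
import numpy as np

def pose_from_correspondences(pts0, pts1, K):
    """Recover relative camera pose from matched points (e.g. sampled
    from a dense optical flow field).

    pts0, pts1: (N, 2) float32 pixel coordinates in frames 0 and 1
    K:          (3, 3) camera intrinsics
    Returns R (3, 3) and t (3, 1); t is known only up to scale.
    """
    E, inliers = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    # The cheirality check selects the one decomposition of E that
    # places the triangulated points in front of both cameras.
    _, R, t, _ = cv2.recoverPose(E, pts0, pts1, K, mask=inliers)
    return R, t
```

The scale ambiguity in t is exactly why the third stage estimates relative (scale-invariant) depth maps.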
arXiv Detail & Related papers (2021-04-01T15:31:20Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction [12.728154351588053]
We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multi-view images.
We introduce a coarse-to-fine depth inference strategy to achieve high-resolution depth.
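Coarse-to-fine depth inference generally means sampling depth hypotheses, scoring them against a cost volume, and narrowing the search range around the current estimate at each finer level. A generic sketch of the pattern in PyTorch, assuming a user-supplied cost function rather than the paper's attention-aware cost volume:

```python
import torch

def coarse_to_fine_depth(cost_fn, d_min, d_max, levels=3, n_hyp=8):
    """Generic coarse-to-fine depth search (illustrative, not the
    paper's network).

    cost_fn(level, depths) -> (B, n_hyp, H, W) matching cost
    d_min, d_max:            (B, 1, H, W) bounds of the search range
    """
    for level in range(levels):
        # n_hyp depth samples spread linearly over the current range.
        steps = torch.linspace(0.0, 1.0, n_hyp,
                               device=d_min.device).view(1, n_hyp, 1, 1)
        depths = d_min + (d_max - d_min) * steps        # (B, n_hyp, H, W)
        # Soft-argmin over hypotheses keeps the estimate differentiable.
        prob = torch.softmax(-cost_fn(level, depths), dim=1)
        est = (prob * depths).sum(dim=1, keepdim=True)  # (B, 1, H, W)
        # Halve the interval around the estimate for the next level.
        half = (d_max - d_min) / 4
        d_min, d_max = est - half, est + half
    return est
```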
arXiv Detail & Related papers (2020-11-25T13:34:11Z)
- OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multi-view stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.