3DVNet: Multi-View Depth Prediction and Volumetric Refinement
- URL: http://arxiv.org/abs/2112.00202v1
- Date: Wed, 1 Dec 2021 00:52:42 GMT
- Title: 3DVNet: Multi-View Depth Prediction and Volumetric Refinement
- Authors: Alexander Rich, Noah Stier, Pradeep Sen, Tobias Höllerer
- Abstract summary: 3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
- Score: 68.68537312256144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present 3DVNet, a novel multi-view stereo (MVS) depth-prediction method
that combines the advantages of previous depth-based and volumetric MVS
approaches. Our key idea is the use of a 3D scene-modeling network that
iteratively updates a set of coarse depth predictions, resulting in highly
accurate predictions which agree on the underlying scene geometry. Unlike
existing depth-prediction techniques, our method uses a volumetric 3D
convolutional neural network (CNN) that operates in world space on all depth
maps jointly. The network can therefore learn meaningful scene-level priors.
Furthermore, unlike existing volumetric MVS techniques, our 3D CNN operates on
a feature-augmented point cloud, allowing for effective aggregation of
multi-view information and flexible iterative refinement of depth maps.
Experimental results show our method exceeds state-of-the-art accuracy in both
depth prediction and 3D reconstruction metrics on the ScanNet dataset, as well
as a selection of scenes from the TUM-RGBD and ICL-NUIM datasets. This shows
that our method is both effective and generalizes to new settings.
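The abstract describes depth maps being lifted into a shared world-space point cloud before the 3D CNN runs on them. The NumPy sketch below illustrates only that geometric step, under assumptions not stated in the abstract: a pinhole camera model, depth measured along the camera z-axis, and zero marking invalid pixels. The function names `backproject_depth` and `fuse_depth_maps` are hypothetical, and the per-point features and the network itself are omitted; this is not the authors' implementation.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift an (H, W) depth map to an (N, 3) world-space point cloud.

    Assumptions (not from the paper): pinhole intrinsics K (3x3),
    a 4x4 camera-to-world pose, and depth == 0 meaning "invalid".
    """
    h, w = depth.shape
    # Pixel coordinates: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid].astype(np.float64)
    # Homogeneous pixel coordinates, shape (3, N).
    pix = np.stack([u[valid], v[valid], np.ones_like(z)], axis=0)
    # Rays with unit z in the camera frame, scaled by depth.
    pts_cam = (np.linalg.inv(K) @ pix) * z
    # Apply the rigid transform to world space.
    pts_h = np.vstack([pts_cam, np.ones((1, pts_cam.shape[1]))])
    return (cam_to_world @ pts_h)[:3].T

def fuse_depth_maps(depths, intrinsics, poses):
    """Concatenate the back-projected points of several views."""
    clouds = [backproject_depth(d, K, T)
              for d, K, T in zip(depths, intrinsics, poses)]
    return np.concatenate(clouds, axis=0)
```

Because every view lands in the same coordinate frame, a volumetric network can, in principle, reason about all depth maps jointly, which is the property the abstract emphasizes.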
Related papers
- Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth
Approach with Saddle-shaped Depth Cells [23.345139129458122]
We show that different depth geometries lead to significant performance gaps, even at the same depth prediction error.
We introduce an ideal depth geometry composed of Saddle-Shaped Cells, whose predicted depth map oscillates upward and downward around the ground-truth surface.
Our method also points to a new research direction for considering depth geometry in MVS.
arXiv Detail & Related papers (2023-07-18T11:37:53Z)
- SimpleMapping: Real-Time Visual-Inertial Dense Mapping with Deep
Multi-View Stereo [13.535871843518953]
We present a real-time, high-quality visual-inertial dense mapping method that uses only monocular images and IMU readings.
We propose a sparse point aided stereo neural network (SPA-MVSNet) that can effectively leverage the informative but noisy sparse points from the VIO system.
Our proposed dense mapping system achieves a 39.7% improvement in F-score over existing systems when evaluated on the challenging scenarios of the EuRoC dataset.
arXiv Detail & Related papers (2023-06-14T17:28:45Z)
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves state-of-the-art performance, especially when only a few propagation steps are used.
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
- TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view
Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense mapping framework.
For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of keyframes.
TANDEM shows state-of-the-art real-time 3D reconstruction performance.
arXiv Detail & Related papers (2021-11-14T19:01:02Z)
- VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) local computation of depth maps with a deep MVS technique, and 2) fusion of the depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
arXiv Detail & Related papers (2021-08-19T11:33:58Z)
- Self-supervised Depth Estimation Leveraging Global Perception and
Geometric Smoothness Using On-board Videos [0.5276232626689566]
We present DLNet for pixel-wise depth estimation, which simultaneously extracts global and local features.
A three-dimensional geometry smoothness loss is proposed to predict a geometrically natural depth map.
In experiments on the KITTI and Make3D benchmarks, the proposed DLNet achieves performance competitive with that of state-of-the-art methods.
arXiv Detail & Related papers (2021-06-07T10:53:27Z)
- PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View
Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net.
Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $\delta^1$ metric on the KITTI dataset.
arXiv Detail & Related papers (2021-03-12T15:54:46Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- DELTAS: Depth Estimation by Learning Triangulation And densification of
Sparse points [14.254472131009653]
Multi-view stereo (MVS) is the golden mean between the accuracy of active depth sensing and the practicality of monocular depth estimation.
Cost volume based approaches employing 3D convolutional neural networks (CNNs) have considerably improved the accuracy of MVS systems.
We propose an efficient depth estimation approach by first (a) detecting and evaluating descriptors for interest points, then (b) learning to match and triangulate a small set of interest points, and finally (c) densifying this sparse set of 3D points using CNNs.
arXiv Detail & Related papers (2020-03-19T17:56:41Z)
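Several entries above report depth accuracy, and the PLADE-Net entry cites a delta threshold metric without defining it. In the depth-estimation literature this is commonly the fraction of pixels whose ratio max(pred/gt, gt/pred) falls below 1.25. The sketch below assumes that convention, which is not stated in the listing itself; the function name `delta_accuracy` is hypothetical, and predictions are assumed to be strictly positive.

```python
import numpy as np

def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold.

    Assumes gt == 0 marks pixels without ground truth and that
    pred is strictly positive wherever gt is valid.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float(np.mean(ratio < threshold))
```

A value above 0.95 under this definition would mean that more than 95% of valid pixels are within a factor of 1.25 of the ground-truth depth.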
This list is automatically generated from the titles and abstracts of the papers on this site.