Generalized Binary Search Network for Highly-Efficient Multi-View Stereo
- URL: http://arxiv.org/abs/2112.02338v1
- Date: Sat, 4 Dec 2021 13:57:18 GMT
- Title: Generalized Binary Search Network for Highly-Efficient Multi-View Stereo
- Authors: Zhenxing Mi, Di Chang, Dan Xu
- Abstract summary: Multi-view Stereo (MVS) with known camera parameters is essentially a 1D search problem within a valid depth range.
Recent deep learning-based MVS methods typically densely sample depth hypotheses in the depth range.
We propose a novel method for highly efficient MVS that remarkably decreases the memory footprint.
- Score: 10.367295443948487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-view Stereo (MVS) with known camera parameters is essentially a 1D
search problem within a valid depth range. Recent deep learning-based MVS
methods typically densely sample depth hypotheses in the depth range, and then
construct prohibitively memory-consuming 3D cost volumes for depth prediction.
Although coarse-to-fine sampling strategies alleviate this overhead issue to a
certain extent, the efficiency of MVS is still an open challenge. In this work,
we propose a novel method for highly efficient MVS that remarkably decreases
the memory footprint while clearly advancing state-of-the-art depth
prediction performance. We investigate which search strategy is reasonably
optimal for MVS, taking into account both efficiency and effectiveness. We
first formulate MVS as a binary search problem, and accordingly propose a
generalized binary search network for MVS. Specifically, in each step, the
depth range is split into 2 bins, with 1 extra error-tolerance bin on each
side. A classification is performed to identify which bin contains the true
depth. We also design three mechanisms to respectively handle classification
errors, deal with out-of-range samples and decrease the training memory. The
new formulation makes our method only sample a very small number of depth
hypotheses in each step, which is highly memory efficient, and also greatly
facilitates quick training convergence. Experiments on competitive benchmarks
show that our method achieves state-of-the-art accuracy with much less memory.
In particular, our method obtains an overall score of 0.289 on the DTU dataset
and ranks first on the challenging Tanks and Temples advanced dataset among
all learning-based methods. The trained models and code will be released at
https://github.com/MiZhenxing/GBi-Net.
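The search step described in the abstract can be illustrated for a single pixel with a minimal sketch. It is not the authors' implementation: the function names, the stage count, and the depth values are hypothetical, and an oracle that already knows the true depth stands in for the network's per-pixel classification. Each stage splits the current range into 2 bins, adds 1 tolerance bin of the same width on each side, and takes the selected bin as the next range, so only 4 hypotheses are needed per stage.

```python
from typing import Callable, List, Tuple


def oracle_classifier(true_depth: float, edges: List[float]) -> int:
    """Stand-in for the network: index of the bin that holds true_depth.

    Depths outside every bin are clamped to the nearest outer (tolerance) bin.
    """
    for i in range(len(edges) - 1):
        if edges[i] <= true_depth < edges[i + 1]:
            return i
    return 0 if true_depth < edges[0] else len(edges) - 2


def generalized_binary_search(
    true_depth: float,
    d_min: float,
    d_max: float,
    num_stages: int = 8,  # hypothetical stage count, chosen for illustration
    classify: Callable[[float, List[float]], int] = oracle_classifier,
) -> Tuple[float, float]:
    """Return (depth estimate, final bin width) after num_stages 4-bin steps."""
    lo, hi = d_min, d_max
    for _ in range(num_stages):
        width = (hi - lo) / 2.0  # split the current range into 2 bins
        # One tolerance bin of equal width on each side: 4 bins, 5 edges.
        edges = [lo - width, lo, lo + width, hi, hi + width]
        k = classify(true_depth, edges)
        lo, hi = edges[k], edges[k + 1]  # the selected bin is the next range
    return 0.5 * (lo + hi), hi - lo


# Illustrative values only: a depth inside the range, and one just outside it
# that is still recovered through the upper tolerance bin.
estimate, bin_width = generalized_binary_search(612.3, 425.0, 935.0)
outside, _ = generalized_binary_search(940.0, 425.0, 935.0)
```

Under these assumptions the range width halves at every stage, so the estimate lies within half of the final bin width of the true depth, whereas a dense sampler would need a number of hypotheses proportional to the range divided by that resolution.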
Related papers
- NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth
Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
We present three corresponding solutions, including semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, has exhibited appealing performance in the ScanNetV2 and ARKitScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z)
- IterMVS: Iterative Probability Estimation for Efficient Multi-View
Stereo [71.84742490020611]
IterMVS is a new data-driven method for high-resolution multi-view stereo.
We propose a novel GRU-based estimator that encodes pixel-wise probability distributions of depth in its hidden state.
We verify the efficiency and effectiveness of our method on DTU, Tanks&Temples and ETH3D.
arXiv Detail & Related papers (2021-12-09T18:58:02Z)
- PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility [23.427619869594437]
We propose an end-to-end trainable PatchMatch-based MVS approach that combines advantages of trainable costs and regularizations with pixelwise estimates.
We evaluate our method on widely used MVS benchmarks, ETH3D and Tanks and Temples (TnT).
arXiv Detail & Related papers (2021-08-19T23:14:48Z)
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature
Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
- ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse
Coding [86.40042104698792]
We formulate neural architecture search as a sparse coding problem.
In experiments, our two-stage method on CIFAR-10 requires only 0.05 GPU-day for search.
Our one-stage method produces state-of-the-art performances on both CIFAR-10 and ImageNet at the cost of only evaluation time.
arXiv Detail & Related papers (2020-10-13T04:34:24Z)
- MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT
Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z)
- Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation
and Gauss-Newton Refinement [46.8514966956438]
This paper presents Fast-MVSNet, a novel sparse-to-dense coarse-to-fine framework for fast and accurate depth estimation in MVS.
Specifically, in our Fast-MVSNet, we first construct a sparse cost volume for learning a sparse and high-resolution depth map.
Finally, a simple but efficient Gauss-Newton layer is proposed to further optimize the depth map.
arXiv Detail & Related papers (2020-03-29T13:31:00Z)
- DELTAS: Depth Estimation by Learning Triangulation And densification of
Sparse points [14.254472131009653]
Multi-view stereo (MVS) is the golden mean between the accuracy of active depth sensing and the practicality of monocular depth estimation.
Cost volume based approaches employing 3D convolutional neural networks (CNNs) have considerably improved the accuracy of MVS systems.
We propose an efficient depth estimation approach by first (a) detecting and evaluating descriptors for interest points, then (b) learning to match and triangulate a small set of interest points, and finally (c) densifying this sparse set of 3D points using CNNs.
arXiv Detail & Related papers (2020-03-19T17:56:41Z)
- 3DSSD: Point-based 3D Single Stage Object Detector [61.67928229961813]
We present a point-based 3D single stage object detector, named 3DSSD, achieving a good balance between accuracy and efficiency.
Our method outperforms all state-of-the-art voxel-based single stage methods by a large margin, and has comparable performance to two stage point-based methods as well.
arXiv Detail & Related papers (2020-02-24T12:01:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.