Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking
- URL: http://arxiv.org/abs/2007.10872v1
- Date: Tue, 21 Jul 2020 14:59:59 GMT
- Title: Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking
- Authors: Jianfeng Yan, Zizhuang Wei, Hongwei Yi, Mingyu Ding, Runze Zhang,
Yisong Chen, Guoping Wang, Yu-Wing Tai
- Abstract summary: Our novel hybrid recurrent multi-view stereo net consists of two core modules: 1) a light DRENet (Dense Reception Expanded) module to extract dense feature maps of original size with multi-scale context information, 2) a HU-LSTM (Hybrid U-LSTM) to regularize 3D matching volume into predicted depth map.
Our method achieves performance competitive with the state of the art while dramatically reducing memory consumption, requiring only $19.4\%$ of R-MVSNet's memory.
- Score: 54.58791377183574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose an efficient and effective dense hybrid recurrent
multi-view stereo net with dynamic consistency checking, namely
$D^{2}$HC-RMVSNet, for accurate dense point cloud reconstruction. Our novel
hybrid recurrent multi-view stereo net consists of two core modules: 1) a light
DRENet (Dense Reception Expanded) module to extract dense feature maps of
original size with multi-scale context information, 2) a HU-LSTM (Hybrid
U-LSTM) to regularize 3D matching volume into predicted depth map, which
efficiently aggregates different scale information by coupling LSTM and U-Net
architecture. To further improve the accuracy and completeness of reconstructed
point clouds, we leverage a dynamic consistency checking strategy instead of
prefixed parameters and strategies widely adopted in existing methods for dense
point cloud reconstruction. In doing so, we dynamically aggregate geometric
consistency matching error among all the views. Our method ranks
\textbf{$1^{st}$} on the complex outdoor \textsl{Tanks and Temples} benchmark
among all methods. Extensive experiments on the indoor DTU dataset show that our
method achieves performance competitive with the state of the art while
dramatically reducing memory consumption, requiring only $19.4\%$ of R-MVSNet
memory consumption. The codebase is available at
\url{https://github.com/yhw-yhw/D2HC-RMVSNet}.
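The dynamic consistency checking idea above can be sketched in a few lines. This is a minimal illustrative implementation, not the authors' exact formulation: instead of a prefixed reprojection-error threshold and a fixed minimum number of consistent views, each source view's reprojected depth is weighted by how well it agrees with the reference depth, and the matching error is aggregated dynamically over all views. The function name, the exponential weighting, and `base_tau` are assumptions for illustration.

```python
import numpy as np

def dynamic_consistency_mask(ref_depth, reproj_depths, base_tau=0.01):
    """Keep a pixel when its dynamically aggregated geometric
    consistency error over all source views is small (sketch only)."""
    # Relative depth error between the reference view and each source
    # view's depth map reprojected into the reference frame: (V, H, W).
    errors = np.stack([np.abs(d - ref_depth) / np.maximum(ref_depth, 1e-6)
                       for d in reproj_depths])
    # Per-view weight: views with small error count more. This replaces
    # a hard error threshold / fixed view count with soft voting.
    weights = np.exp(-errors / base_tau) + 1e-12
    # Dynamically aggregated matching error among all the views.
    agg_error = (weights * errors).sum(axis=0) / weights.sum(axis=0)
    return agg_error < base_tau
```

Because poorly matched views receive near-zero weight, a pixel confirmed by a few accurate views can survive even when other views disagree, which is the intuition behind replacing prefixed filtering parameters with dynamic aggregation.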
Related papers
- CalibNet: Dual-branch Cross-modal Calibration for RGB-D Salient Instance Segmentation [88.50067783122559]
CalibNet consists of three simple modules, including a dynamic interactive kernel (DIK) and a weight-sharing fusion (WSF) module.
Experiments show that CalibNet yields a promising result, i.e., 58.0% AP with a 320×480 input size on the COME15K-N test set.
arXiv Detail & Related papers (2023-07-16T16:49:59Z)
- Dynamic Clustering Transformer Network for Point Cloud Segmentation [23.149220817575195]
We propose a novel 3D point cloud representation network, called Dynamic Clustering Transformer Network (DCTNet)
It has an encoder-decoder architecture, allowing for both local and global feature learning.
Our method was evaluated on an object-based dataset (ShapeNet), an urban navigation dataset (Toronto-3D), and a multispectral LiDAR dataset.
arXiv Detail & Related papers (2023-05-30T01:11:05Z)
- Curvature-guided dynamic scale networks for Multi-view Stereo [10.667165962654996]
This paper focuses on learning a robust feature extraction network to enhance the performance of matching costs without heavy computation.
We present a dynamic scale feature extraction network, namely, CDSFNet.
It is composed of multiple novel convolution layers, each of which can select a proper patch scale for each pixel guided by the normal curvature of the image surface.
arXiv Detail & Related papers (2021-12-11T14:41:05Z)
- Non-local Recurrent Regularization Networks for Multi-view Stereo [108.17325696835542]
In deep multi-view stereo networks, cost regularization is crucial to achieve accurate depth estimation.
We propose a novel non-local recurrent regularization network for multi-view stereo, named NR2-Net.
Our method achieves state-of-the-art reconstruction results on both DTU and Tanks and Temples datasets.
arXiv Detail & Related papers (2021-10-13T01:43:54Z)
- AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network [8.127449025802436]
We present a novel recurrent multi-view stereo network based on long short-term memory (LSTM) with adaptive aggregation, namely AA-RMVSNet.
We firstly introduce an intra-view aggregation module to adaptively extract image features by using context-aware convolution and multi-scale aggregation.
We propose an inter-view cost volume aggregation module for adaptive pixel-wise view aggregation, which is able to preserve better-matched pairs among all views.
arXiv Detail & Related papers (2021-08-09T06:10:48Z)
- 3D Point Cloud Registration with Multi-Scale Architecture and Self-supervised Fine-tuning [5.629161809575013]
MS-SVConv is a fast multi-scale deep neural network that outputs features from point clouds for 3D registration between two scenes.
We show significant improvements compared to state-of-the-art methods on the competitive and well-known 3DMatch benchmark.
We present a strategy to fine-tune MS-SVConv on unknown datasets in a self-supervised way, which leads to state-of-the-art results on ETH and TUM datasets.
arXiv Detail & Related papers (2021-03-26T15:38:33Z)
- Volumetric Propagation Network: Stereo-LiDAR Fusion for Long-Range Depth Estimation [81.08111209632501]
We propose a geometry-aware stereo-LiDAR fusion network for long-range depth estimation.
We exploit sparse and accurate point clouds as a cue for guiding correspondences of stereo images in a unified 3D volume space.
Our network achieves state-of-the-art performance on the KITTI and Virtual-KITTI datasets.
arXiv Detail & Related papers (2021-03-24T03:24:46Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement [46.8514966956438]
This paper presents a Fast-MVSNet, a novel sparse-to-dense coarse-to-fine framework, for fast and accurate depth estimation in MVS.
Specifically, in our Fast-MVSNet, we first construct a sparse cost volume for learning a sparse and high-resolution depth map.
At last, a simple but efficient Gauss-Newton layer is proposed to further optimize the depth map.
arXiv Detail & Related papers (2020-03-29T13:31:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.