SRH-Net: Stacked Recurrent Hourglass Network for Stereo Matching
- URL: http://arxiv.org/abs/2105.11587v1
- Date: Tue, 25 May 2021 00:10:56 GMT
- Title: SRH-Net: Stacked Recurrent Hourglass Network for Stereo Matching
- Authors: Hongzhi Du, Yanyan Li, Yanbiao Sun, Jigui Zhu and Federico Tombari
- Abstract summary: We decouple the 4D cubic cost volume used by 3D convolutional filters into sequential cost maps along the direction of disparity.
A novel recurrent module, Stacked Recurrent Hourglass (SRH), is proposed to process each cost map.
The proposed architecture is implemented in an end-to-end pipeline and evaluated on public datasets.
- Score: 33.66537830990198
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The cost aggregation strategy plays a crucial role in learning-based stereo
matching, where 3D convolutional filters achieve state-of-the-art accuracy but
require intensive computational resources, while 2D operations need less GPU
memory but are sensitive to domain shift. In this paper, we decouple the 4D
cubic cost volume used by 3D convolutional filters into sequential cost maps
along the disparity dimension and process them with a recurrent cost
aggregation strategy, instead of handling the whole volume at once. Furthermore,
a novel recurrent module, the Stacked Recurrent Hourglass (SRH), is proposed to
process each cost map. The hourglass network is built from Gated Recurrent
Units (GRUs) and down/upsampling layers, which give the GRUs larger receptive
fields. Two such hourglass networks are then stacked, and multi-scale
information is fused through skip connections to improve performance in
textureless areas. The proposed architecture is implemented in an end-to-end
pipeline and evaluated on public datasets; it reduces GPU memory consumption by
up to 56.1% compared with PSMNet's stacked-hourglass 3D CNNs without degrading
accuracy. We further demonstrate the scalability of the proposed method on
several high-resolution pairs, where previous learning-based approaches often
fail due to memory constraints. The code is released at
https://github.com/hongzhidu/SRHNet.
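To make the recurrent cost-aggregation idea concrete, below is a minimal PyTorch-style sketch of a stacked recurrent hourglass that sweeps over per-disparity cost maps one at a time. All class names, channel sizes, and the ConvGRU cell are illustrative assumptions rather than the authors' exact implementation; see the linked repository for the real code.

```python
# Minimal sketch of recurrent cost aggregation with a stacked recurrent hourglass.
# Names, channel sizes, and the ConvGRU cell are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvGRUCell(nn.Module):
    """A 2D convolutional GRU cell that updates a hidden cost map."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.zr = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)
        self.hc = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.zr(torch.cat([x, h], 1))).chunk(2, dim=1)
        h_new = torch.tanh(self.hc(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_new


class RecurrentHourglass(nn.Module):
    """GRU cells wrapped in down/upsampling to enlarge their receptive fields."""

    def __init__(self, ch):
        super().__init__()
        self.enc, self.mid, self.dec = (ConvGRUCell(ch, ch) for _ in range(3))

    def forward(self, x, states):
        h1, h2, h3 = states
        e = self.enc(F.avg_pool2d(x, 2), h1)            # 1/2 resolution
        m = self.mid(F.avg_pool2d(e, 2), h2)            # 1/4 resolution
        up = F.interpolate(m, scale_factor=2, mode="bilinear", align_corners=False)
        d = self.dec(up + e, h3)                        # skip connection
        out = F.interpolate(d, scale_factor=2, mode="bilinear", align_corners=False)
        return out, (e, m, d)


class SRHBlock(nn.Module):
    """Two recurrent hourglasses stacked; emits one aggregated slice per disparity."""

    def __init__(self, ch):
        super().__init__()
        self.hg1, self.hg2 = RecurrentHourglass(ch), RecurrentHourglass(ch)
        self.head = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, cost_map, states):
        s1, s2 = states
        y1, s1 = self.hg1(cost_map, s1)
        y2, s2 = self.hg2(y1 + cost_map, s2)            # residual between the two stacks
        return self.head(y2), (s1, s2)


def aggregate(cost_maps, block, ch):
    """cost_maps: list of D tensors (B, ch, H, W), one slice per disparity level."""
    B, _, H, W = cost_maps[0].shape
    zeros = lambda h, w: cost_maps[0].new_zeros(B, ch, h, w)
    init = lambda: (zeros(H // 2, W // 2), zeros(H // 4, W // 4), zeros(H // 2, W // 2))
    states, slices = (init(), init()), []
    for cm in cost_maps:                                # sequential sweep over disparity
        out, states = block(cm, states)
        slices.append(out)
    return torch.cat(slices, dim=1)                     # (B, D, H, W) aggregated volume


if __name__ == "__main__":
    block = SRHBlock(ch=32)
    maps = [torch.randn(1, 32, 64, 128) for _ in range(24)]  # 24 disparity levels
    print(aggregate(maps, block, ch=32).shape)                # torch.Size([1, 24, 64, 128])
```

The point of the sketch is the memory behavior: only one cost map and a handful of hidden states are alive at each step, rather than a full 4D cost volume, which is where the memory savings reported in the abstract come from.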
Related papers
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z) - SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z) - CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Our proposed method first generates high-quality 3D proposals by leveraging a class-aware local grouping strategy on object surface voxels.
To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
arXiv Detail & Related papers (2022-10-09T13:38:48Z) - Spatial Pruned Sparse Convolution for Efficient 3D Object Detection [41.62839541489369]
3D scenes are dominated by a large number of background points, which are redundant for a detection task that mainly needs to focus on foreground objects.
In this paper, we analyze major components of existing 3D CNNs and find that they ignore this redundancy and further amplify it during down-sampling, which incurs a large amount of extra and unnecessary computational overhead.
We propose a new convolution operator named spatial pruned sparse convolution (SPS-Conv), which includes two variants: spatial pruned submanifold sparse convolution (SPSS-Conv) and spatial pruned regular sparse convolution (SPRS-Conv).
arXiv Detail & Related papers (2022-09-28T16:19:06Z) - GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z) - Non-local Recurrent Regularization Networks for Multi-view Stereo [108.17325696835542]
In deep multi-view stereo networks, cost regularization is crucial to achieve accurate depth estimation.
We propose a novel non-local recurrent regularization network for multi-view stereo, named NR2-Net.
Our method achieves state-of-the-art reconstruction results on both DTU and Tanks and Temples datasets.
arXiv Detail & Related papers (2021-10-13T01:43:54Z) - Invertible Residual Network with Regularization for Effective Medical Image Segmentation [2.76240219662896]
Invertible neural networks have been applied to significantly reduce activation memory footprint when training neural networks with backpropagation.
We propose two versions of the invertible Residual Network, namely the Partially Invertible Residual Network (Partially-InvRes) and the Fully Invertible Residual Network (Fully-InvRes).
Our results indicate that by using partially/fully invertible networks as the central workhorse in volumetric segmentation, we not only reduce the memory overhead but also achieve segmentation performance comparable to the non-invertible 3D U-Net.
arXiv Detail & Related papers (2021-03-16T13:19:59Z) - Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking [54.58791377183574]
Our novel hybrid recurrent multi-view stereo net consists of two core modules: 1) a light DRENet (Dense Reception Expanded) module to extract dense feature maps of original size with multi-scale context information, and 2) an HU-LSTM (Hybrid U-LSTM) to regularize the 3D matching volume into a predicted depth map.
Our method exhibits performance competitive with the state of the art while dramatically reducing memory consumption, requiring only 19.4% of R-MVSNet's memory consumption.
arXiv Detail & Related papers (2020-07-21T14:59:59Z) - AANet: Adaptive Aggregation Network for Efficient Stereo Matching [33.39794232337985]
Current state-of-the-art stereo models are mostly based on costly 3D convolutions.
We propose a sparse points based intra-scale cost aggregation method to alleviate the edge-fattening issue.
We also approximate the traditional cross-scale cost aggregation algorithm with neural network layers to handle large textureless regions.
arXiv Detail & Related papers (2020-04-20T18:07:55Z) - RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference [24.351577383531616]
We introduce RNNPool, a novel pooling operator based on Recurrent Neural Networks (RNNs).
An RNNPool layer can effectively replace multiple blocks in a variety of architectures such as MobileNets and DenseNet when applied to standard vision tasks like image classification and face detection.
We use RNNPool with the standard S3FD architecture to construct a face detection method that achieves state-of-the-art mAP for tiny ARM Cortex-M4 class microcontrollers with under 256 KB of RAM.
arXiv Detail & Related papers (2020-02-27T05:22:44Z)