AANet: Adaptive Aggregation Network for Efficient Stereo Matching
- URL: http://arxiv.org/abs/2004.09548v1
- Date: Mon, 20 Apr 2020 18:07:55 GMT
- Title: AANet: Adaptive Aggregation Network for Efficient Stereo Matching
- Authors: Haofei Xu, Juyong Zhang
- Abstract summary: Current state-of-the-art stereo models are mostly based on costly 3D convolutions.
We propose a sparse points based intra-scale cost aggregation method to alleviate the edge-fattening issue.
We also approximate traditional cross-scale cost aggregation algorithm with neural network layers to handle large textureless regions.
- Score: 33.39794232337985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the remarkable progress made by learning based stereo matching
algorithms, one key challenge remains unsolved. Current state-of-the-art stereo
models are mostly based on costly 3D convolutions, whose cubic computational
complexity and high memory consumption make them quite expensive to deploy in
real-world applications. In this paper, we aim at completely replacing the
commonly used 3D convolutions to achieve fast inference speed while maintaining
comparable accuracy. To this end, we first propose a sparse points based
intra-scale cost aggregation method to alleviate the well-known edge-fattening
issue at disparity discontinuities. Further, we approximate traditional
cross-scale cost aggregation algorithm with neural network layers to handle
large textureless regions. Both modules are simple, lightweight, and
complementary, leading to an effective and efficient architecture for cost
aggregation. With these two modules, we can not only significantly speed up
existing top-performing models (e.g., $41\times$ faster than GC-Net, $4\times$ faster than
PSMNet and $38\times$ faster than GA-Net), but also improve the performance of fast
stereo models (e.g., StereoNet). We also achieve competitive results on Scene
Flow and KITTI datasets while running at 62ms, demonstrating the versatility
and high efficiency of the proposed method. Our full framework is available at
https://github.com/haofeixu/aanet .
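The two aggregation ideas named in the abstract can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the fixed offsets and weights, the 2x2 average pooling, the nearest-neighbour upsampling, and all function names below are assumptions chosen for illustration, standing in for what the network would learn. Each function works on a single H x W cost slice (one disparity hypothesis), so nothing here needs a 3D convolution.

```python
# Toy sketch only: fixed offsets/weights and simple resampling stand in
# for learned layers. All names here are hypothetical.

def sparse_point_aggregate(cost, offsets, weights):
    """Aggregate one H x W cost slice from a few sampled neighbours per pixel."""
    h, w = len(cost), len(cost[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for (dy, dx), wk in zip(offsets, weights):
                # Clamp sample positions to the slice border.
                yy = min(max(y + dy, 0), h - 1)
                xx = min(max(x + dx, 0), w - 1)
                total += wk * cost[yy][xx]
            out[y][x] = total
    return out


def downsample2(cost):
    """Halve resolution by 2x2 average pooling (assumes even height and width)."""
    h, w = len(cost) // 2, len(cost[0]) // 2
    return [[(cost[2 * y][2 * x] + cost[2 * y][2 * x + 1]
              + cost[2 * y + 1][2 * x] + cost[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def upsample2(cost):
    """Double resolution by nearest-neighbour repetition."""
    out = []
    for row in cost:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def resample(cost, src, dst):
    """Move a slice from pyramid level src to level dst (level 0 is the finest)."""
    while src < dst:
        cost, src = downsample2(cost), src + 1
    while src > dst:
        cost, src = upsample2(cost), src - 1
    return cost


def cross_scale_aggregate(pyramid):
    """For every level, sum the slices of all levels resampled to that level."""
    fused = []
    for s in range(len(pyramid)):
        resampled = [resample(c, k, s) for k, c in enumerate(pyramid)]
        h, w = len(pyramid[s]), len(pyramid[s][0])
        fused.append([[sum(r[y][x] for r in resampled) for x in range(w)]
                      for y in range(h)])
    return fused


# A 4x4 slice with a vertical cost edge, plus a coarser 2x2 level.
fine = [[1.0, 1.0, 5.0, 5.0] for _ in range(4)]
coarse = downsample2(fine)

# Three sparse sample points along the row: centre, left, right.
smoothed = sparse_point_aggregate(fine, [(0, 0), (0, -1), (0, 1)], [0.5, 0.25, 0.25])
fused = cross_scale_aggregate([fine, coarse])

print(smoothed[0])  # [1.0, 2.0, 4.0, 5.0]
print(fused[1])     # [[2.0, 10.0], [2.0, 10.0]]
```

In this toy, the fixed horizontal offsets blur the cost edge. The abstract's point is that sparse, adaptively placed sample points can avoid crossing such discontinuities, while the cross-scale sum lets coarse levels supply context for large textureless regions.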
Related papers
- LightStereo: Channel Boost Is All Your Need for Efficient 2D Cost Aggregation [27.00836175513738]
LightStereo is a cutting-edge stereo-matching network crafted to accelerate the matching process.
Our breakthrough lies in enhancing performance through a dedicated focus on the channel dimension of the 3D cost volume.
LightStereo achieves a competitive EPE metric in the SceneFlow datasets while demanding a minimum of only 22 GFLOPs and 17 ms of runtime.
arXiv Detail & Related papers (2024-06-28T11:11:24Z)
- Fully $1\times1$ Convolutional Network for Lightweight Image Super-Resolution [79.04007257606862]
Deep models have made significant progress on single image super-resolution (SISR) tasks, in particular large models with large kernels ($3\times3$ or more).
$1\times1$ convolutions bring substantial computational efficiency, but struggle with aggregating local spatial representations.
We propose a simple yet effective fully $1\times1$ convolutional network, named Shift-Conv-based Network (SCNet).
arXiv Detail & Related papers (2023-07-30T06:24:03Z)
- SqueezeLLM: Dense-and-Sparse Quantization [80.32162537942138]
The main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, for single batch inference.
We introduce SqueezeLLM, a post-training quantization framework that enables lossless compression to ultra-low precisions of up to 3-bit.
Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format.
arXiv Detail & Related papers (2023-06-13T08:57:54Z)
- Multi-scale Iterative Residuals for Fast and Scalable Stereo Matching [13.76996108304056]
This paper presents an iterative multi-scale coarse-to-fine refinement (iCFR) framework to bridge this gap.
We use multi-scale warped features to estimate disparity residuals and push the disparity search range in the cost volume to a minimum limit.
Finally, we apply a refinement network to recover the loss of precision which is inherent in multi-scale approaches.
arXiv Detail & Related papers (2021-10-25T09:54:17Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
- Correlate-and-Excite: Real-Time Stereo Matching via Guided Cost Volume Excitation [65.83008812026635]
We construct Guided Cost volume Excitation (GCE) and show that simple channel excitation of cost volume guided by image can improve performance considerably.
We present an end-to-end network that we call Correlate-and-Excite (CoEx).
arXiv Detail & Related papers (2021-08-12T14:32:26Z)
- ES-Net: An Efficient Stereo Matching Network [4.8986598953553555]
Existing stereo matching networks typically use slow and computationally expensive 3D convolutions to improve the performance.
We propose the Efficient Stereo Network (ESNet), which achieves high performance and efficient inference at the same time.
arXiv Detail & Related papers (2021-03-05T20:11:39Z)
- Multi-Scale Cost Volumes Cascade Network for Stereo Matching [9.440848600106797]
We propose MSCVNet, which combines traditional methods and CNN to improve the quality of cost volume.
Our network achieves a big improvement in accuracy, demonstrating the effectiveness of our proposed method.
arXiv Detail & Related papers (2021-02-03T08:40:17Z)
- Bilateral Grid Learning for Stereo Matching Networks [22.92443311789097]
We present a novel edge-preserving cost volume upsampling module based on the slicing operation in the learned bilateral grid.
The slicing layer is parameter-free, which allows us to obtain a high quality cost volume of high resolution.
We design a real-time network based on this module, which outperforms existing published real-time deep stereo matching networks.
arXiv Detail & Related papers (2021-01-01T09:08:01Z)
- Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
arXiv Detail & Related papers (2020-12-01T23:58:16Z)
- Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking [54.58791377183574]
Our novel hybrid recurrent multi-view stereo net consists of two core modules: 1) a light DRENet (Dense Reception Expanded) module to extract dense feature maps of original size with multi-scale context information, 2) a HU-LSTM (Hybrid U-LSTM) to regularize 3D matching volume into predicted depth map.
Our method exhibits competitive performance to the state-of-the-art method while dramatically reducing memory consumption, using only $19.4\%$ of the memory that R-MVSNet consumes.
arXiv Detail & Related papers (2020-07-21T14:59:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.