Correlate-and-Excite: Real-Time Stereo Matching via Guided Cost Volume
Excitation
- URL: http://arxiv.org/abs/2108.05773v1
- Date: Thu, 12 Aug 2021 14:32:26 GMT
- Title: Correlate-and-Excite: Real-Time Stereo Matching via Guided Cost Volume
Excitation
- Authors: Antyanta Bangunharcana, Jae Won Cho, Seokju Lee, In So Kweon,
Kyung-Soo Kim, Soohyun Kim
- Abstract summary: We construct Guided Cost volume Excitation (GCE) and show that simple channel excitation of the cost volume, guided by the image, can improve performance considerably.
We present an end-to-end network that we call Correlate-and-Excite (CoEx).
- Score: 65.83008812026635
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Volumetric deep learning approaches to stereo matching aggregate a cost
volume, computed from the input left and right images, using 3D convolutions. Recent
work has shown that utilizing extracted image features and spatially
varying cost volume aggregation complements 3D convolutions. However, existing
methods with spatially varying operations are complex, incur considerable
computation time, and increase memory consumption. In this work, we
construct Guided Cost volume Excitation (GCE) and show that simple channel
excitation of the cost volume, guided by the image, can improve performance considerably.
Moreover, we propose a novel method of using top-k selection prior to
soft-argmin disparity regression for computing the final disparity estimate.
Combining our novel contributions, we present an end-to-end network that we
call Correlate-and-Excite (CoEx). Extensive experiments of our model on the
SceneFlow, KITTI 2012, and KITTI 2015 datasets demonstrate the effectiveness
and efficiency of our model and show that it outperforms other
speed-based algorithms while remaining competitive with state-of-the-art
algorithms. Code will be made available at https://github.com/antabangun/coex.
Related papers
- LightStereo: Channel Boost Is All Your Need for Efficient 2D Cost Aggregation [27.00836175513738]
LightStereo is a cutting-edge stereo-matching network crafted to accelerate the matching process.
Our breakthrough lies in enhancing performance through a dedicated focus on the channel dimension of the 3D cost volume.
LightStereo achieves a competitive EPE metric on the SceneFlow dataset while requiring only 22 GFLOPs and 17 ms of runtime.
arXiv Detail & Related papers (2024-06-28T11:11:24Z)
- Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching [77.133400999703]
Correlation-based stereo matching has achieved outstanding performance.
Current methods with a fixed model do not work uniformly well across various datasets.
This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z)
- Unifying Flow, Stereo and Depth Estimation [121.54066319299261]
We present a unified formulation and model for three motion and 3D perception tasks.
We formulate all three tasks as a unified dense correspondence matching problem.
Our model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks.
arXiv Detail & Related papers (2022-11-10T18:59:54Z)
- Curvature-guided dynamic scale networks for Multi-view Stereo [10.667165962654996]
This paper focuses on learning a robust feature extraction network to enhance the performance of matching costs without heavy computation.
We present a dynamic scale feature extraction network, namely, CDSFNet.
It is composed of multiple novel convolution layers, each of which can select a proper patch scale for each pixel guided by the normal curvature of the image surface.
arXiv Detail & Related papers (2021-12-11T14:41:05Z)
- Sample and Computation Redistribution for Efficient Face Detection [137.19388513633484]
Training data sampling and computation distribution strategies are the keys to efficient and accurate face detection.
SCRFD-34GF outperforms the best competitor, TinaFace, by 3.86% (AP at hard set) while being more than 3x faster on GPUs with VGA-resolution images.
arXiv Detail & Related papers (2021-05-10T23:51:14Z)
- CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching [27.313740022587442]
We propose CFNet, a Cascade and Fused cost volume based network to improve the robustness of the stereo matching network.
We employ a variance-based uncertainty estimation to adaptively adjust the next stage disparity search space.
Our proposed method achieves the state-of-the-art overall performance and obtains the 1st place on the stereo task of Robust Vision Challenge 2020.
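CFNet's variance-based adjustment of the next stage's disparity search space can be sketched as follows. This is a minimal NumPy illustration under assumed shapes, with a hypothetical scale factor `s`; it is not CFNet's actual implementation:

```python
import numpy as np

def adjust_search_range(prob, disparities, s=1.0):
    """Variance-based search-range adjustment sketch: from the per-pixel
    disparity probability distribution of the current stage, compute mean
    and variance, and set the next stage's search window to mean +/- s*std.

    prob:        (D, H, W) per-pixel probabilities over disparity candidates
    disparities: (D,) candidate disparity values
    s:           hypothetical scale factor on the standard deviation
    """
    # Per-pixel expected disparity and variance of the distribution.
    mean = np.einsum("d,dhw->hw", disparities, prob)
    var = np.einsum("d,dhw->hw", disparities ** 2, prob) - mean ** 2
    std = np.sqrt(np.maximum(var, 0.0))
    # Confident (low-variance) pixels get a narrow window, uncertain
    # pixels a wide one.
    return mean - s * std, mean + s * std
```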
arXiv Detail & Related papers (2021-04-09T11:38:59Z)
- Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
arXiv Detail & Related papers (2020-12-01T23:58:16Z)
- Content-Aware Inter-Scale Cost Aggregation for Stereo Matching [42.02981855948903]
Our method achieves reliable detail recovery when upsampling through the aggregation of information across different scales.
A novel decomposition strategy is proposed to efficiently construct the 3D filter weights and aggregate the 3D cost volume.
Experiment results on Scene Flow dataset, KITTI2015 and Middlebury demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2020-06-05T02:38:34Z)
- AANet: Adaptive Aggregation Network for Efficient Stereo Matching [33.39794232337985]
Current state-of-the-art stereo models are mostly based on costly 3D convolutions.
We propose a sparse points based intra-scale cost aggregation method to alleviate the edge-fattening issue.
We also approximate the traditional cross-scale cost aggregation algorithm with neural network layers to handle large textureless regions.
arXiv Detail & Related papers (2020-04-20T18:07:55Z)
- CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Networks [87.02416370081123]
3D Convolutional Neural Networks (CNNs) have been widely applied to 3D scene understanding, such as video analysis and volumetric image recognition.
We propose Channel-wise Automatic KErnel Shrinking (CAKES), to enable efficient 3D learning by shrinking standard 3D convolutions into a set of economic operations.
arXiv Detail & Related papers (2020-03-28T14:21:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.