LeanStereo: A Leaner Backbone based Stereo Network
- URL: http://arxiv.org/abs/2503.18557v1
- Date: Mon, 24 Mar 2025 11:10:52 GMT
- Title: LeanStereo: A Leaner Backbone based Stereo Network
- Authors: Rafia Rahim, Samuel Woerz, Andreas Zell
- Abstract summary: We propose a fast end-to-end stereo matching method using a learned attention weights based cost volume combined with LogL1 loss. We show that our method requires 4x fewer operations and is about 9 to 14x faster compared to state-of-the-art methods.
- Score: 10.824879437909306
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently, end-to-end deep network based stereo matching methods have gained popularity, mainly because of their performance. However, this improvement in performance comes at the cost of increased computational and memory bandwidth requirements, thus necessitating specialized hardware (GPUs); even then, these methods have large inference times compared to classical methods. This limits their applicability in real-world applications, where we desire high-accuracy stereo methods with reasonable inference time. To this end, we propose a fast end-to-end stereo matching method. The majority of this speedup comes from integrating a leaner backbone. To recover the performance lost to the leaner backbone, we propose a learned attention weights based cost volume combined with a LogL1 loss for stereo matching. Using the LogL1 loss not only improves the overall performance of the proposed network but also leads to faster convergence. We perform a detailed empirical evaluation of different design choices and show that our method requires 4x fewer operations and is about 9 to 14x faster than state-of-the-art methods like ACVNet [1], LEAStereo [2] and CFNet [3] while giving comparable performance.
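The LogL1 loss mentioned in the abstract is commonly defined as log(1 + |error|) over the disparity residual; the exact formulation used in LeanStereo may differ, so the following is a minimal sketch under that assumption, with a hypothetical function name:

```python
import math

def log_l1_loss(pred_disp, gt_disp):
    """Sketch of a LogL1 loss over disparity values: mean of log(1 + |pred - gt|).

    Compared to plain L1, large residuals are compressed by the log,
    which damps the influence of outliers while keeping an L1-like
    slope near zero error.
    """
    assert len(pred_disp) == len(gt_disp) and len(pred_disp) > 0
    return sum(math.log1p(abs(p - g)) for p, g in zip(pred_disp, gt_disp)) / len(pred_disp)
```

With a perfect prediction the loss is 0, and a constant absolute error of e - 1 gives a loss of exactly 1, illustrating how the log compresses large residuals.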
Related papers
- Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching [16.927491376135134]
We present Fast-FoundationStereo, a family of architectures that achieve, for the first time, strong zero-shot generalization at real-time frame rates.
We employ a divide-and-conquer acceleration strategy with three components: knowledge distillation, blockwise neural architecture search and structured pruning.
The resulting model can run over 10x faster than FoundationStereo while closely matching its zero-shot accuracy.
arXiv Detail & Related papers (2025-12-11T21:36:29Z) - LightStereo: Channel Boost Is All You Need for Efficient 2D Cost Aggregation [27.00836175513738]
LightStereo is a cutting-edge stereo-matching network crafted to accelerate the matching process.
Our breakthrough lies in enhancing performance through a dedicated focus on the channel dimension of the 3D cost volume.
LightStereo achieves a competitive EPE metric in the SceneFlow datasets while demanding a minimum of only 22 GFLOPs and 17 ms of runtime.
arXiv Detail & Related papers (2024-06-28T11:11:24Z) - FFCA-Net: Stereo Image Compression via Fast Cascade Alignment of Side Information [44.88123177525665]
Multi-view compression technology, especially Stereo Image Compression (SIC), plays a crucial role in car-mounted cameras and 3D-related applications.
We propose a Feature-based Fast Cascade Alignment network (FFCA-Net) to fully leverage the side information on the decoder.
Our approach achieves significant gains in terms of 3 to 10-fold faster decoding speed than other methods.
arXiv Detail & Related papers (2023-12-28T11:12:03Z) - ReBotNet: Fast Real-time Video Enhancement [59.08038313427057]
Most restoration networks are slow, suffer from high computational bottlenecks, and can't be used for real-time video enhancement.
In this work, we design an efficient and fast framework to perform real-time enhancement for practical use-cases like live video calls and video streams.
To evaluate our method, we create two new datasets that emulate real-world video call and streaming scenarios, and show extensive results on multiple datasets where ReBotNet outperforms existing approaches with lower computation, reduced memory requirements, and faster inference time.
arXiv Detail & Related papers (2023-03-23T17:58:05Z) - Efficient Diffusion Training via Min-SNR Weighting Strategy [78.5801305960993]
We treat diffusion training as a multi-task learning problem and introduce a simple yet effective approach referred to as Min-SNR-$\gamma$.
Our results demonstrate a significant improvement in convergence speed, 3.4$\times$ faster than previous weighting strategies.
It is also more effective, achieving a new record FID score of 2.06 on the ImageNet $256\times256$ benchmark using smaller architectures than those employed in previous state-of-the-art methods.
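For the epsilon-prediction parameterization, the Min-SNR-$\gamma$ weighting is usually given as min(SNR_t, $\gamma$)/SNR_t per timestep; a minimal sketch under that assumption (the paper's exact setup may differ):

```python
def min_snr_weight(snr_t, gamma=5.0):
    """Sketch of the Min-SNR-gamma loss weight for epsilon-prediction diffusion training.

    High-SNR timesteps (easy, near-clean inputs) are down-weighted to
    gamma/SNR, while low-SNR timesteps keep weight 1, so no single noise
    level dominates the multi-task training objective.
    """
    assert snr_t > 0.0
    return min(snr_t, gamma) / snr_t
```

For example, with the default gamma of 5, a timestep with SNR 10 gets weight 0.5, while any timestep with SNR at or below 5 keeps weight 1.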
arXiv Detail & Related papers (2023-03-16T17:59:56Z) - Audio-Visual Efficient Conformer for Robust Speech Recognition [91.3755431537592]
We propose to improve the noise robustness of the recently proposed Efficient Conformer Connectionist Temporal Classification architecture by processing both audio and visual modalities.
Our experiments show that using audio and visual modalities allows the model to better recognize speech in the presence of environmental noise and significantly accelerates training, reaching a lower WER with 4 times fewer training steps.
arXiv Detail & Related papers (2023-01-04T05:36:56Z) - Multi-scale Iterative Residuals for Fast and Scalable Stereo Matching [13.76996108304056]
This paper presents an iterative multi-scale coarse-to-fine refinement (iCFR) framework to bridge this gap.
We use multi-scale warped features to estimate disparity residuals and push the disparity search range in the cost volume to a minimum limit.
Finally, we apply a refinement network to recover the loss of precision which is inherent in multi-scale approaches.
arXiv Detail & Related papers (2021-10-25T09:54:17Z) - Faster Meta Update Strategy for Noise-Robust Deep Learning [62.08964100618873]
We introduce a novel Faster Meta Update Strategy (FaMUS) to replace the most expensive step in the meta gradient with a faster layer-wise approximation.
We show our method is able to save two-thirds of the training time while still maintaining the comparable or achieving even better generalization performance.
arXiv Detail & Related papers (2021-04-30T16:19:07Z) - ES-Net: An Efficient Stereo Matching Network [4.8986598953553555]
Existing stereo matching networks typically use slow and computationally expensive 3D convolutions to improve the performance.
We propose the Efficient Stereo Network (ESNet), which achieves high performance and efficient inference at the same time.
arXiv Detail & Related papers (2021-03-05T20:11:39Z) - PatchmatchNet: Learned Multi-View Patchmatch Stereo [70.14789588576438]
We present PatchmatchNet, a novel and learnable cascade formulation of Patchmatch for high-resolution multi-view stereo.
With high speed and low memory requirement, PatchmatchNet can process higher resolution imagery and is more suited to run on resource limited devices than competitors that employ 3D cost volume regularization.
arXiv Detail & Related papers (2020-12-02T18:59:02Z) - Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
arXiv Detail & Related papers (2020-12-01T23:58:16Z) - AANet: Adaptive Aggregation Network for Efficient Stereo Matching [33.39794232337985]
Current state-of-the-art stereo models are mostly based on costly 3D convolutions.
We propose a sparse points based intra-scale cost aggregation method to alleviate the edge-fattening issue.
We also approximate traditional cross-scale cost aggregation algorithm with neural network layers to handle large textureless regions.
arXiv Detail & Related papers (2020-04-20T18:07:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information (including all content) and is not responsible for any consequences of its use.