Lite Any Stereo: Efficient Zero-Shot Stereo Matching
- URL: http://arxiv.org/abs/2511.16555v1
- Date: Thu, 20 Nov 2025 17:07:06 GMT
- Title: Lite Any Stereo: Efficient Zero-Shot Stereo Matching
- Authors: Junpeng Jing, Weixun Luo, Ye Mao, Krystian Mikolajczyk
- Abstract summary: Lite Any Stereo is a framework that achieves strong zero-shot generalization while remaining highly efficient. Our model attains accuracy comparable to or exceeding state-of-the-art non-prior-based accurate methods.
- Score: 21.89511226115265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in stereo matching have focused on accuracy, often at the cost of significantly increased model size. Traditionally, the community has regarded efficient models as incapable of zero-shot ability due to their limited capacity. In this paper, we introduce Lite Any Stereo, a stereo depth estimation framework that achieves strong zero-shot generalization while remaining highly efficient. To this end, we design a compact yet expressive backbone to ensure scalability, along with a carefully crafted hybrid cost aggregation module. We further propose a three-stage training strategy on million-scale data to effectively bridge the sim-to-real gap. Together, these components demonstrate that an ultra-light model can deliver strong generalization, ranking 1st across four widely used real-world benchmarks. Remarkably, our model attains accuracy comparable to or exceeding state-of-the-art non-prior-based accurate methods while requiring less than 1% computational cost, setting a new standard for efficient stereo matching.
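To make the underlying computation concrete, the generic building block that stereo matching networks (including compact aggregation designs like the one described above) operate on is a correlation cost volume followed by a winner-takes-all disparity read-out. The following NumPy sketch is illustrative only; all names are hypothetical and none of it comes from the paper:

```python
import numpy as np

def correlation_cost_volume(feat_left, feat_right, max_disp):
    """Build a (max_disp, H, W) correlation cost volume from
    (C, H, W) left/right feature maps: entry d correlates each
    left pixel (y, x) with the right pixel (y, x - d)."""
    C, H, W = feat_left.shape
    volume = np.zeros((max_disp, H, W), dtype=feat_left.dtype)
    for d in range(max_disp):
        # Right-image content at column x - d is the candidate match
        # for left column x under disparity d.
        volume[d, :, d:] = (feat_left[:, :, d:]
                            * feat_right[:, :, :W - d]).mean(axis=0)
    return volume

def winner_takes_all(volume):
    """Pick, per pixel, the disparity with the highest correlation."""
    return volume.argmax(axis=0)
```

Real models replace the raw correlation with learned features and refine the volume with aggregation modules; the sketch only shows the shape of the problem.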
Related papers
- Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching [16.927491376135134]
We present Fast-FoundationStereo, a family of architectures that achieve, for the first time, strong zero-shot generalization at real-time frame rates. We employ a divide-and-conquer acceleration strategy with three components: knowledge distillation, blockwise neural architecture search, and structured pruning. The resulting model can run over 10x faster than FoundationStereo while closely matching its zero-shot accuracy.
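The knowledge-distillation component mentioned above amounts to training the compact student against the large teacher's predictions. A minimal sketch of such an objective on disparity maps is below; this is a generic distillation loss, not Fast-FoundationStereo's actual one, and the function name is invented:

```python
import numpy as np

def disparity_distillation_loss(student_disp, teacher_disp, valid_mask=None):
    """Mean L1 distance between student and teacher disparity maps,
    optionally restricted to a boolean validity mask (e.g. pixels
    where the teacher is confident). A generic distillation
    objective, not the paper's exact loss."""
    diff = np.abs(student_disp - teacher_disp)
    if valid_mask is not None:
        diff = diff[valid_mask]
    return float(diff.mean())
```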
arXiv Detail & Related papers (2025-12-11T21:36:29Z)
- Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model [70.67610495024459]
Camera-based setups offer a cost-effective option by using stereo depth estimation to generate dense, high-resolution depth maps. Existing omnidirectional stereo matching approaches achieve only limited depth accuracy across diverse environments. We present DFI-OmniStereo, a novel omnidirectional stereo matching method that leverages a large-scale pre-trained foundation model for relative monocular depth estimation.
arXiv Detail & Related papers (2025-03-30T16:24:22Z)
- FoundationStereo: Zero-Shot Stereo Matching [50.79202911274819]
FoundationStereo is a foundation model for stereo depth estimation. We first construct a large-scale (1M stereo pairs) synthetic training dataset. We then design a number of network architecture components to enhance scalability.
arXiv Detail & Related papers (2025-01-17T01:01:44Z)
- Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail [37.90622613373521]
We introduce Stereo Anywhere, a novel stereo-matching framework that combines geometric constraints with robust priors from monocular depth Vision Foundation Models (VFMs). We show that our synthetic-only trained model achieves state-of-the-art results in zero-shot generalization, significantly outperforming existing solutions.
arXiv Detail & Related papers (2024-12-05T18:59:58Z)
- Stereo Anything: Unifying Zero-shot Stereo Matching with Large-Scale Mixed Data [77.27700893908012]
Stereo matching serves as a cornerstone in 3D vision, aiming to establish pixel-wise correspondences between stereo image pairs for depth recovery. Current models often exhibit severe performance degradation when deployed in unseen domains. We introduce StereoAnything, a data-centric framework that substantially enhances the zero-shot generalization capability of existing stereo models.
arXiv Detail & Related papers (2024-11-21T11:59:04Z)
- Distill-then-prune: An Efficient Compression Framework for Real-time Stereo Matching Network on Edge Devices [5.696239274365031]
We propose a novel strategy by incorporating knowledge distillation and model pruning to overcome the inherent trade-off between speed and accuracy.
We obtained a model that maintains real-time performance while delivering high accuracy on edge devices.
arXiv Detail & Related papers (2024-05-20T06:03:55Z)
- Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching [77.133400999703]
Correlation-based stereo matching has achieved outstanding performance. However, current methods with a fixed model do not work uniformly well across various datasets. This paper proposes a new perspective that dynamically calculates correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z)
- Unifying Flow, Stereo and Depth Estimation [121.54066319299261]
We present a unified formulation and model for three motion and 3D perception tasks.
We formulate all three tasks as a unified dense correspondence matching problem.
Our model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks.
arXiv Detail & Related papers (2022-11-10T18:59:54Z)
- AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach [50.855679274530615]
We present a novel domain-adaptive approach called AdaStereo to align multi-level representations for deep stereo matching networks.
Our models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo.
Our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
arXiv Detail & Related papers (2021-12-09T15:10:47Z)
- Expanding Sparse Guidance for Stereo Matching [24.74333370941674]
We propose a novel sparsity expansion technique that expands sparse cues with respect to RGB images for local feature enhancement.
Our approach significantly boosts the existing state-of-the-art stereo algorithms with extremely sparse cues.
arXiv Detail & Related papers (2020-04-24T06:41:11Z)
- AANet: Adaptive Aggregation Network for Efficient Stereo Matching [33.39794232337985]
Current state-of-the-art stereo models are mostly based on costly 3D convolutions.
We propose a sparse-points-based intra-scale cost aggregation method to alleviate the edge-fattening issue.
We also approximate the traditional cross-scale cost aggregation algorithm with neural network layers to handle large textureless regions.
arXiv Detail & Related papers (2020-04-20T18:07:55Z)
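The cross-scale cost aggregation that AANet approximates with network layers traditionally fuses cost volumes computed at several resolutions. A deliberately simplified, hand-crafted version is sketched below with nearest-neighbor upsampling and fixed fusion weights; it ignores details such as rescaling the disparity dimension, and the function names are invented for illustration:

```python
import numpy as np

def upsample_nn(volume, factor):
    """Nearest-neighbor upsampling of a (D, H, W) cost volume."""
    return volume.repeat(factor, axis=1).repeat(factor, axis=2)

def cross_scale_aggregate(volumes, weights):
    """Fuse cost volumes from multiple scales (finest first, each
    subsequent volume at half the previous spatial resolution) into
    a single full-resolution volume via a fixed weighted sum. A
    hand-crafted stand-in for the learned fusion AANet describes."""
    D, H, W = volumes[0].shape
    fused = np.zeros((D, H, W))
    for level, (vol, weight) in enumerate(zip(volumes, weights)):
        fused += weight * upsample_nn(vol, 2 ** level)
    return fused
```

Coarse scales help in large textureless regions because their costs are pooled over wider support; learned fusion lets the weights adapt per pixel instead of being global constants.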
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.