Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
- URL: http://arxiv.org/abs/2512.11130v1
- Date: Thu, 11 Dec 2025 21:36:29 GMT
- Title: Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
- Authors: Bowen Wen, Shaurya Dewan, Stan Birchfield
- Abstract summary: We present Fast-FoundationStereo, a family of architectures that achieve, for the first time, strong zero-shot generalization at real-time frame rates. We employ a divide-and-conquer acceleration strategy with three components: knowledge distillation, blockwise neural architecture search, and structured pruning. The resulting model runs over 10x faster than FoundationStereo while closely matching its zero-shot accuracy.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stereo foundation models achieve strong zero-shot generalization but remain computationally prohibitive for real-time applications. Efficient stereo architectures, on the other hand, sacrifice robustness for speed and require costly per-domain fine-tuning. To bridge this gap, we present Fast-FoundationStereo, a family of architectures that achieve, for the first time, strong zero-shot generalization at real-time frame rates. We employ a divide-and-conquer acceleration strategy with three components: (1) knowledge distillation to compress the hybrid backbone into a single efficient student; (2) blockwise neural architecture search for automatically discovering optimal cost filtering designs under latency budgets, reducing search complexity exponentially; and (3) structured pruning for eliminating redundancy in the iterative refinement module. Furthermore, we introduce an automatic pseudo-labeling pipeline used to curate 1.4M in-the-wild stereo pairs to supplement synthetic training data and facilitate knowledge distillation. The resulting model can run over 10x faster than FoundationStereo while closely matching its zero-shot accuracy, thus establishing a new state-of-the-art among real-time methods. Project page: https://nvlabs.github.io/Fast-FoundationStereo/
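The distillation component described in the abstract can be illustrated with a minimal sketch. The function below is a hypothetical blended loss, not the paper's actual formulation: the student's disparity map is pulled toward the teacher's dense pseudo-labels everywhere, and toward sparse ground truth wherever a validity mask marks a measured pixel. The function name, the L1 choice, and the `alpha` weighting are all assumptions for illustration.

```python
import numpy as np

def distillation_loss(student_disp, teacher_disp, gt_disp, valid_mask, alpha=0.7):
    """Hypothetical blended objective (names and alpha are assumptions):
    L1 to the teacher's dense pseudo-labels everywhere, plus L1 to the
    sparse ground truth wherever valid_mask marks a measured pixel."""
    kd_term = np.abs(student_disp - teacher_disp).mean()
    sup = np.abs(student_disp - gt_disp) * valid_mask
    sup_term = sup.sum() / max(valid_mask.sum(), 1.0)
    return alpha * kd_term + (1.0 - alpha) * sup_term

# Toy 1x2 disparity maps: ground truth is valid only at the first pixel
student = np.array([[1.0, 2.0]])
teacher = np.array([[1.5, 2.0]])
gt      = np.array([[1.0, 0.0]])
mask    = np.array([[1.0, 0.0]])
loss = distillation_loss(student, teacher, gt, mask)
```

In this toy case the teacher term is 0.25 and the supervised term is 0, so the blend evaluates to 0.175. The masking matters in practice because in-the-wild pseudo-labels are dense while sensor ground truth is typically sparse.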
Related papers
- Transform-Invariant Generative Ray Path Sampling for Efficient Radio Propagation Modeling [5.567300300236187]
Ray tracing has become a standard for accurate radio propagation modeling, but suffers from high computational complexity. We propose a comprehensive machine-learning-assisted framework that replaces exhaustive path searching with intelligent sampling via Generative Flow Networks.
arXiv Detail & Related papers (2026-03-02T09:37:34Z) - Le-DETR: Revisiting Real-Time Detection Transformer with Efficient Encoder Design [72.55935017828891]
We present Le-DETR (Low-cost and Efficient DEtection TRansformer). It achieves a new SOTA in real-time detection using only the ImageNet-1K and COCO 2017 training datasets. It surpasses YOLOv12-L/X by +0.6/-0.1 mAP while achieving similar speed and a +20% speedup.
arXiv Detail & Related papers (2026-02-24T15:29:55Z) - Fast-SAM3D: 3Dfy Anything in Images but Faster [65.17322167628367]
SAM3D enables scalable, open-world 3D reconstruction from complex scenes, yet its deployment is hindered by prohibitive inference latency. We present Fast-SAM3D, a training-free framework that aligns computation with instantaneous generation complexity.
arXiv Detail & Related papers (2026-02-05T04:27:59Z) - Fast-ARDiff: An Entropy-informed Acceleration Framework for Continuous Space Autoregressive Generation [12.384836052394272]
Autoregressive (AR)-diffusion hybrid paradigms combine AR's structured modeling with diffusion-based synthesis. We propose Fast-ARDiff, a unified AR-diffusion framework that jointly optimizes both components. Fast-ARDiff achieves state-of-the-art acceleration across diverse models.
arXiv Detail & Related papers (2025-12-09T12:35:18Z) - Lite Any Stereo: Efficient Zero-Shot Stereo Matching [21.89511226115265]
Lite Any Stereo is a framework that achieves strong zero-shot generalization while remaining highly efficient. Our model attains accuracy comparable to or exceeding that of state-of-the-art non-prior-based accurate methods.
arXiv Detail & Related papers (2025-11-20T17:07:06Z) - ResidualViT for Efficient Temporally Dense Video Encoding [66.57779133786131]
We make three contributions to reduce the cost of computing features for temporally dense tasks. First, we introduce a vision transformer (ViT) architecture, dubbed ResidualViT, that leverages the large temporal redundancy in videos. Second, we propose a lightweight distillation strategy to approximate the frame-level features of the original foundation model.
arXiv Detail & Related papers (2025-09-16T17:12:23Z) - AutoHFormer: Efficient Hierarchical Autoregressive Transformer for Time Series Prediction [36.239648954658534]
Time series forecasting requires architectures that simultaneously achieve three competing objectives. We introduce AutoHFormer, a hierarchical autoregressive transformer that addresses these challenges. Comprehensive experiments demonstrate that AutoHFormer achieves 10.76x faster training and a 6.06x memory reduction compared to PatchTST on P08.
arXiv Detail & Related papers (2025-06-19T03:47:04Z) - ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity. This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics. Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and a Kendall's tau of 0.53 on the FlexiBERT benchmark.
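The rank-correlation metric quoted above (Spearman's rho between proxy scores and measured accuracies) can be computed in a few lines. This is a generic sketch, not ZeroLM's evaluation code; the data values are made up for illustration, and the double-argsort ranking assumes there are no tied scores.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Assumes no ties (double argsort assigns distinct integer ranks)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

# Illustrative (made-up) proxy scores vs. measured accuracies
proxy_scores = np.array([0.12, 0.80, 0.35, 0.41, 0.66])
accuracies   = np.array([60.1, 74.8, 69.5, 68.0, 73.2])
rho = spearman_rho(proxy_scores, accuracies)
```

A rho near 1 means the zero-cost proxy ranks architectures almost identically to full training, which is what makes such proxies usable for architecture search without per-candidate training.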
arXiv Detail & Related papers (2025-03-24T13:11:22Z) - LeanStereo: A Leaner Backbone based Stereo Network [10.824879437909306]
We propose a fast end-to-end stereo matching method that uses a learned-attention-weight-based cost volume combined with a LogL1 loss. We show that our method requires 4x fewer operations and is about 9 to 14x faster than state-of-the-art methods.
arXiv Detail & Related papers (2025-03-24T11:10:52Z) - FNAS: Uncertainty-Aware Fast Neural Architecture Search [54.49650267859032]
Reinforcement learning (RL)-based neural architecture search (NAS) generally guarantees better convergence yet suffers from the requirement of huge computational resources.
We propose a general pipeline to accelerate the convergence of the rollout process as well as the RL process in NAS.
Experiments on the Mobile Neural Architecture Search (MNAS) search space show the proposed Fast Neural Architecture Search (FNAS) accelerates standard RL-based NAS process by 10x.
arXiv Detail & Related papers (2021-05-25T06:32:52Z) - FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often occupy large number of parameters and require heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z) - AANet: Adaptive Aggregation Network for Efficient Stereo Matching [33.39794232337985]
Current state-of-the-art stereo models are mostly based on costly 3D convolutions.
We propose a sparse points based intra-scale cost aggregation method to alleviate the edge-fattening issue.
We also approximate traditional cross-scale cost aggregation algorithm with neural network layers to handle large textureless regions.
arXiv Detail & Related papers (2020-04-20T18:07:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.