Related papers: $S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation

URL: http://arxiv.org/abs/2507.13229v3
Date: Wed, 30 Jul 2025 16:27:21 GMT
Title: $S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation
Authors: Junhong Min, Youngpil Jeon, Jimin Kim, Minyong Choi
Abstract summary: A generalizable stereo matching model is capable of performing well across varying resolutions and disparity ranges without dataset-specific fine-tuning. Iterative local search methods achieve high scores on constrained benchmarks, but their core mechanism limits the global consistency required for true generalization. We develop a global matching architecture that achieves state-of-the-art accuracy and high efficiency without relying on cost volume filtering or deep refinement stacks.
Score: 0.47676805869864924
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The pursuit of a generalizable stereo matching model, capable of performing well across varying resolutions and disparity ranges without dataset-specific fine-tuning, has revealed a fundamental trade-off. Iterative local search methods achieve high scores on constrained benchmarks, but their core mechanism inherently limits the global consistency required for true generalization. However, global matching architectures, while theoretically more robust, have historically been rendered infeasible by prohibitive computational and memory costs. We resolve this dilemma with $S^2M^2$: a global matching architecture that achieves state-of-the-art accuracy and high efficiency without relying on cost volume filtering or deep refinement stacks. Our design integrates a multi-resolution transformer for robust long-range correspondence, trained with a novel loss function that concentrates probability on feasible matches. This approach enables a more robust joint estimation of disparity, occlusion, and confidence. $S^2M^2$ establishes a new state of the art on Middlebury v3 and ETH3D benchmarks, significantly outperforming prior methods in most metrics while reconstructing high-quality details with competitive efficiency.
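As a rough illustration of what "global matching" with probability restricted to feasible matches can mean, the sketch below scores one left-image pixel against every candidate on the same scanline of the right image, keeps only candidates with non-negative disparity, and reads a disparity and a confidence off the resulting softmax distribution. This is a minimal sketch under stated assumptions, not the $S^2M^2$ architecture or its loss: the function name `match_scanline`, the dot-product similarity, and the use of the peak probability as confidence are all hypothetical choices made for illustration.

```python
import math


def match_scanline(left_feats, right_feats):
    """Toy global matching along one rectified scanline (illustrative only).

    left_feats / right_feats: lists of per-pixel feature vectors (lists of floats).
    Returns one (expected_disparity, confidence) pair per left pixel.
    """
    results = []
    for x, f_left in enumerate(left_feats):
        # Feasible candidates: a left pixel at x can only match a right pixel
        # at xr <= x, i.e. disparity d = x - xr >= 0 for a rectified pair.
        scores = [
            sum(a * b for a, b in zip(f_left, right_feats[xr]))
            for xr in range(x + 1)
        ]
        # Numerically stable softmax over the feasible candidates only.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Expected disparity under the matching distribution.
        disparity = sum(p * (x - xr) for xr, p in enumerate(probs))
        # Assumed confidence proxy: mass on the single most likely match.
        confidence = max(probs)
        results.append((disparity, confidence))
    return results
```

Because every feasible position on the scanline is scored at once, the per-pixel cost grows with image width, which is the computational burden the abstract attributes to global matching approaches.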

Related papers

NDCG-Consistent Softmax Approximation with Accelerated Convergence [67.10365329542365]
We propose novel loss formulations that align directly with ranking metrics. We integrate the proposed RG losses with the highly efficient Alternating Least Squares (ALS) optimization method. Empirical evaluations on real-world datasets demonstrate that our approach achieves comparable or superior ranking performance.
arXiv Detail & Related papers (2025-06-11T06:59:17Z)
Factorized Implicit Global Convolution for Automotive Computational Fluid Dynamics Prediction [52.32698071488864]
We propose Factorized Implicit Global Convolution (FIGConv), a novel architecture that efficiently solves CFD problems for very large 3D meshes. FIGConv achieves quadratic complexity $O(N^2)$, a significant improvement over existing 3D neural CFD models. We validate our approach on the industry-standard Ahmed body dataset and the large-scale DrivAerNet dataset.
arXiv Detail & Related papers (2025-02-06T18:57:57Z)
Tiny Deep Ensemble: Uncertainty Estimation in Edge AI Accelerators via Ensembling Normalization Layers with Shared Weights [0.8233872344445676]
In AI-driven systems, uncertainty estimation allows the user to avoid overconfident predictions and achieve functional safety. We propose the Tiny-Deep Ensemble approach, a low-cost approach for uncertainty estimation on edge devices. Our method does not compromise accuracy, with an increase in inference accuracy of up to $\sim 1\%$ and a reduction in RMSE of $17.17\%$ in various benchmark datasets.
arXiv Detail & Related papers (2024-05-07T22:54:17Z)
Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank. Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z)
DeepHAM: A Global Solution Method for Heterogeneous Agent Models with Aggregate Shocks [9.088303226909277]
We propose an efficient, reliable, and interpretable global solution method, the Deep learning-based algorithm for Heterogeneous Agent Models (DeepHAM).
arXiv Detail & Related papers (2021-12-29T03:09:19Z)
Generalizable Mixed-Precision Quantization via Attribution Rank Preservation [90.26603048354575]
We propose a generalizable mixed-precision quantization (GMPQ) method for efficient inference. Our method obtains a competitive accuracy-complexity trade-off compared with state-of-the-art mixed-precision networks.
arXiv Detail & Related papers (2021-08-05T16:41:57Z)
CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching [27.313740022587442]
We propose CFNet, a Cascade and Fused cost volume based network to improve the robustness of the stereo matching network. We employ a variance-based uncertainty estimation to adaptively adjust the next stage disparity search space. Our proposed method achieves the state-of-the-art overall performance and obtains the 1st place on the stereo task of Robust Vision Challenge 2020.
arXiv Detail & Related papers (2021-04-09T11:38:59Z)
SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures. Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities. We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
arXiv Detail & Related papers (2021-04-08T16:15:46Z)
AANet: Adaptive Aggregation Network for Efficient Stereo Matching [33.39794232337985]
Current state-of-the-art stereo models are mostly based on costly 3D convolutions. We propose a sparse-points-based intra-scale cost aggregation method to alleviate the edge-fattening issue. We also approximate the traditional cross-scale cost aggregation algorithm with neural network layers to handle large textureless regions.
arXiv Detail & Related papers (2020-04-20T18:07:55Z)
Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference [119.19779637025444]
Deep networks were recently suggested to face a trade-off between accuracy (on clean natural images) and robustness (on adversarially perturbed images). This paper studies multi-exit networks associated with input-adaptive inference, showing their strong promise in achieving a "sweet point" in co-optimizing model accuracy, robustness, and efficiency.
arXiv Detail & Related papers (2020-02-24T00:40:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.