$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation
- URL: http://arxiv.org/abs/2507.13229v3
- Date: Wed, 30 Jul 2025 16:27:21 GMT
- Title: $S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation
- Authors: Junhong Min, Youngpil Jeon, Jimin Kim, Minyong Choi
- Abstract summary: A generalizable stereo matching model is one that performs well across varying resolutions and disparity ranges without dataset-specific fine-tuning. Iterative local search methods achieve high scores on constrained benchmarks, but their core mechanism limits the global consistency required for true generalization. We develop a global matching architecture that achieves state-of-the-art accuracy and high efficiency without relying on cost volume filtering or deep refinement stacks.
- Score: 0.47676805869864924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The pursuit of a generalizable stereo matching model, capable of performing well across varying resolutions and disparity ranges without dataset-specific fine-tuning, has revealed a fundamental trade-off. Iterative local search methods achieve high scores on constrained benchmarks, but their core mechanism inherently limits the global consistency required for true generalization. However, global matching architectures, while theoretically more robust, have historically been rendered infeasible by prohibitive computational and memory costs. We resolve this dilemma with $S^2M^2$: a global matching architecture that achieves state-of-the-art accuracy and high efficiency without relying on cost volume filtering or deep refinement stacks. Our design integrates a multi-resolution transformer for robust long-range correspondence, trained with a novel loss function that concentrates probability on feasible matches. This approach enables a more robust joint estimation of disparity, occlusion, and confidence. $S^2M^2$ establishes a new state of the art on Middlebury v3 and ETH3D benchmarks, significantly outperforming prior methods in most metrics while reconstructing high-quality details with competitive efficiency.
Related papers
- Generative Data Transformation: From Mixed to Unified Data [57.84692191369066]
Taesar is a data-centric framework for target-al regeneration. It encodes cross-domain context into target sequences, enabling standard models to learn intricate dependencies without complex fusion architectures.
arXiv Detail & Related papers (2026-02-26T08:30:09Z) - CC-OR-Net: A Unified Framework for LTV Prediction through Structural Decoupling [15.714075484024177]
CC-OR-Net is a novel unified framework that achieves a more robust decoupling through structural decomposition. CC-OR-Net integrates three specialized components: a structural ordinal decomposition module for robust ranking, an intra-bucket residual module for fine-grained regression, and a targeted high-value augmentation module for precision on top-tier users.
arXiv Detail & Related papers (2026-01-15T08:35:17Z) - RoMa v2: Harder Better Faster Denser Feature Matching [56.71494120301684]
Dense feature matching aims to estimate all correspondences between two images of a 3D scene. Existing dense matchers fail or perform poorly for many hard real-world scenarios. In this paper, we attack these weaknesses on a wide front through a series of systematic improvements.
arXiv Detail & Related papers (2025-11-19T18:59:38Z) - DE3S: Dual-Enhanced Soft-Sparse-Shape Learning for Medical Early Time-Series Classification [11.539700200482853]
ETSC is critical in time-sensitive medical applications such as sepsis. It presents an inherent trade-off between accuracy and earliness. We propose DE3S, a framework to overcome these underlying challenges.
arXiv Detail & Related papers (2025-10-14T07:10:05Z) - H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction [39.22287224290769]
H3R is a hybrid framework that integrates latent fusion with attention-based feature aggregation. By integrating both paradigms, our approach enhances generalization while converging 2$\times$ faster than existing methods. Our method supports variable-number and high-resolution input views while demonstrating robust cross-dataset generalization.
arXiv Detail & Related papers (2025-08-05T05:56:30Z) - NDCG-Consistent Softmax Approximation with Accelerated Convergence [67.10365329542365]
We propose novel loss formulations that align directly with ranking metrics. We integrate the proposed RG losses with the highly efficient Alternating Least Squares (ALS) optimization method. Empirical evaluations on real-world datasets demonstrate that our approach achieves comparable or superior ranking performance.
arXiv Detail & Related papers (2025-06-11T06:59:17Z) - Factorized Implicit Global Convolution for Automotive Computational Fluid Dynamics Prediction [52.32698071488864]
We propose Factorized Implicit Global Convolution (FIGConv), a novel architecture that efficiently solves CFD problems for very large 3D meshes. FIGConv achieves quadratic complexity $O(N^2)$, a significant improvement over existing 3D neural CFD models. We validate our approach on the industry-standard Ahmed body dataset and the large-scale DrivAerNet dataset.
arXiv Detail & Related papers (2025-02-06T18:57:57Z) - TD3: Tucker Decomposition Based Dataset Distillation Method for Sequential Recommendation [50.23504065567638]
This paper introduces TD3, a novel Dataset Distillation method within a meta-learning framework. TD3 distills a fully expressive synthetic sequence summary from original data. An augmentation technique allows the learner to closely fit the synthetic summary, ensuring an accurate update of it in the outer-loop.
arXiv Detail & Related papers (2025-02-05T03:13:25Z) - Tiny Deep Ensemble: Uncertainty Estimation in Edge AI Accelerators via Ensembling Normalization Layers with Shared Weights [0.8233872344445676]
In AI-driven systems, uncertainty estimation allows the user to avoid overconfident predictions and achieve functional safety.
We propose the Tiny-Deep Ensemble approach, a low-cost approach for uncertainty estimation on edge devices.
Our method does not compromise accuracy, with an increase in inference accuracy of up to $\sim 1\%$ and a reduction in RMSE of $17.17\%$ in various benchmark datasets.
arXiv Detail & Related papers (2024-05-07T22:54:17Z) - PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation [47.53810786827547]
Single image depth estimation is a foundational task in computer vision and generative modeling.
We present PatchFusion, a novel tile-based framework with three key components to improve the current state of the art.
Experiments on UnrealStereo4K, MVS-Synth, and Middlebury 2014 demonstrate that our framework can generate high-resolution depth maps with intricate details.
arXiv Detail & Related papers (2023-12-04T19:03:12Z) - Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank.
Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z) - DeepHAM: A Global Solution Method for Heterogeneous Agent Models with Aggregate Shocks [9.088303226909277]
We propose an efficient, reliable, and interpretable global solution method, the Deep learning-based algorithm for Heterogeneous Agent Models (DeepHAM).
arXiv Detail & Related papers (2021-12-29T03:09:19Z) - Generalizable Mixed-Precision Quantization via Attribution Rank Preservation [90.26603048354575]
We propose a generalizable mixed-precision quantization (GMPQ) method for efficient inference.
Our method obtains a competitive accuracy-complexity trade-off compared with state-of-the-art mixed-precision networks.
arXiv Detail & Related papers (2021-08-05T16:41:57Z) - CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching [27.313740022587442]
We propose CFNet, a Cascade and Fused cost volume based network to improve the robustness of the stereo matching network.
We employ a variance-based uncertainty estimation to adaptively adjust the next stage disparity search space.
Our proposed method achieves the state-of-the-art overall performance and obtains the 1st place on the stereo task of Robust Vision Challenge 2020.
arXiv Detail & Related papers (2021-04-09T11:38:59Z) - SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
arXiv Detail & Related papers (2021-04-08T16:15:46Z) - Discretization-Aware Architecture Search [81.35557425784026]
This paper presents discretization-aware architecture search (DA$^2$S).
The core idea is to push the super-network towards the configuration of desired topology, so that the accuracy loss brought by discretization is largely alleviated.
Experiments on standard image classification benchmarks demonstrate the superiority of our approach.
arXiv Detail & Related papers (2020-07-07T01:18:58Z) - AANet: Adaptive Aggregation Network for Efficient Stereo Matching [33.39794232337985]
Current state-of-the-art stereo models are mostly based on costly 3D convolutions.
We propose a sparse points based intra-scale cost aggregation method to alleviate the edge-fattening issue.
We also approximate traditional cross-scale cost aggregation algorithm with neural network layers to handle large textureless regions.
arXiv Detail & Related papers (2020-04-20T18:07:55Z) - Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference [119.19779637025444]
Deep networks were recently suggested to face the odds between accuracy (on clean natural images) and robustness (on adversarially perturbed images).
This paper studies multi-exit networks associated with input-adaptive inference, showing their strong promise in achieving a "sweet point" in co-optimizing model accuracy, robustness and efficiency.
arXiv Detail & Related papers (2020-02-24T00:40:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.