Generalized Geometry Encoding Volume for Real-time Stereo Matching
- URL: http://arxiv.org/abs/2512.06793v1
- Date: Sun, 07 Dec 2025 11:12:50 GMT
- Title: Generalized Geometry Encoding Volume for Real-time Stereo Matching
- Authors: Jiaxin Liu, Gangwei Xu, Xianqi Wang, Chengliang Zhang, Xin Yang,
- Abstract summary: Generalized Geometry Encoding Volume (GGEV) is a novel real-time stereo matching network that achieves strong generalization. We show that GGEV surpasses all existing real-time methods in zero-shot generalization capability.
- Score: 18.857989746328155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time stereo matching methods primarily focus on enhancing in-domain performance but often overlook the critical importance of generalization in real-world applications. In contrast, recent stereo foundation models leverage monocular foundation models (MFMs) to improve generalization, but typically suffer from substantial inference latency. To address this trade-off, we propose Generalized Geometry Encoding Volume (GGEV), a novel real-time stereo matching network that achieves strong generalization. We first extract depth-aware features that encode domain-invariant structural priors as guidance for cost aggregation. Subsequently, we introduce a Depth-aware Dynamic Cost Aggregation (DDCA) module that adaptively incorporates these priors into each disparity hypothesis, effectively enhancing fragile matching relationships in unseen scenes. Both steps are lightweight and complementary, leading to the construction of a generalized geometry encoding volume with strong generalization capability. Experimental results demonstrate that our GGEV surpasses all existing real-time methods in zero-shot generalization capability, and achieves state-of-the-art performance on the KITTI 2012, KITTI 2015, and ETH3D benchmarks.
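For readers new to the vocabulary in the abstract, the terms "disparity hypothesis" and matching cost can be illustrated with a generic scanline cost volume followed by soft-argmin disparity regression. This is a minimal sketch of the standard technique, not the GGEV network or its DDCA module; the function names, the absolute-difference cost, and the `temperature` parameter are all assumptions made for illustration.

```python
import math


def cost_volume(left, right, max_disp):
    """Matching cost for each pixel x and disparity hypothesis d.

    cost[x][d] = |left[x] - right[x - d]|. Hypotheses that would read
    outside the right scanline get an infinite cost.
    (Hypothetical helper: real networks compare learned feature vectors.)
    """
    volume = []
    for x in range(len(left)):
        row = []
        for d in range(max_disp + 1):
            if x - d >= 0:
                row.append(abs(left[x] - right[x - d]))
            else:
                row.append(math.inf)
        volume.append(row)
    return volume


def soft_argmin(costs, temperature=1.0):
    """Expected disparity under softmax(-cost / temperature)."""
    # d = 0 is always in range, so at least one cost is finite.
    c_min = min(c for c in costs if c != math.inf)
    weights = [
        0.0 if c == math.inf else math.exp(-(c - c_min) / temperature)
        for c in costs
    ]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


def estimate_disparity(left, right, max_disp, temperature=0.05):
    """Per-pixel disparity for one scanline, with no aggregation step."""
    volume = cost_volume(left, right, max_disp)
    return [soft_argmin(row, temperature) for row in volume]
```

In this toy setting each pixel is matched independently. Cost aggregation, which the abstract describes as being guided by depth-aware features, would refine the volume between `cost_volume` and `soft_argmin`; that step is deliberately left out here because the listing does not specify it.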
Related papers
- StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models [98.72926158261937]
We propose a training-free token pruning framework for Visual AutoRegressive models. We employ a lightweight high-pass filter to capture local texture details, while leveraging Principal Component Analysis (PCA) to preserve global structural information. To maintain valid next-scale prediction under sparse tokens, we introduce a nearest neighbor feature propagation strategy.
arXiv Detail & Related papers (2026-03-02T11:35:05Z)
- Toward Generalizable Deblurring: Leveraging Massive Blur Priors with Linear Attention for Real-World Scenarios [9.82847623835017]
GLOWDeblur is a Generalizable reaL-wOrld lightWeight Deblur model that combines a convolution-based pre-reconstruction & domain alignment module with a lightweight diffusion backbone. We propose Blur Pattern Pretraining (BPP), which acquires blur priors from simulation datasets and transfers them through joint fine-tuning on real data. We further introduce Motion and Semantic Guidance (MoSeG) to strengthen blur priors under severe degradation, and integrate it into GLOWDeblur.
arXiv Detail & Related papers (2026-01-10T11:01:31Z)
- Enhancing Generalization of Depth Estimation Foundation Model via Weakly-Supervised Adaptation with Regularization [21.788680301776207]
We propose WeSTAR, a parameter-efficient framework that performs Weakly supervised Self-Training Adaptation with Regularization. We first adopt a dense self-training objective as the primary source of structural self-supervision. To further improve robustness, we introduce semantically-aware hierarchical normalization.
arXiv Detail & Related papers (2025-11-18T08:16:16Z)
- Deepfake Detection that Generalizes Across Benchmarks [48.85953407706351]
The generalization of deepfake detectors to unseen manipulation techniques remains a challenge for practical deployment. This work demonstrates that robust generalization is achievable through a parameter-efficient adaptation of one of the foundational pre-trained vision encoders. The proposed method achieves state-of-the-art performance, outperforming more complex, recent approaches in average cross-dataset AUROC.
arXiv Detail & Related papers (2025-08-08T12:03:56Z)
- FoundationStereo: Zero-Shot Stereo Matching [50.79202911274819]
FoundationStereo is a foundation model for stereo depth estimation. We first construct a large-scale (1M stereo pairs) synthetic training dataset. We then design a number of network architecture components to enhance scalability.
arXiv Detail & Related papers (2025-01-17T01:01:44Z)
- Explaining the role of Intrinsic Dimensionality in Adversarial Training [31.495803865226158]
We show that off-manifold adversarial examples (AEs) enhance robustness, while on-manifold AEs improve generalization. We introduce SMAAT, which improves the scalability of AT for encoder-based models by perturbing the layer with the lowest intrinsic dimensionality. We validate SMAAT across multiple tasks, including text generation, sentiment classification, safety filtering, and retrieval augmented generation setups.
arXiv Detail & Related papers (2024-05-27T12:48:30Z)
- RobustMVS: Single Domain Generalized Deep Multi-view Stereo [27.92012008096311]
This work focuses on the domain generalization problem in Multi-view Stereo (MVS).
We build a novel MVS domain generalization benchmark including synthetic and real-world datasets.
In contrast to conventional domain generalization benchmarks, we consider a more realistic but challenging scenario, where only one source domain is available for training.
arXiv Detail & Related papers (2024-05-15T06:56:05Z)
- Strong but simple: A Baseline for Domain Generalized Dense Perception by CLIP-based Transfer Learning [6.532114018212791]
Fine-tuning vision-language pre-trained models yields competitive or even stronger generalization results. This challenges the standard of using ImageNet-based transfer learning for domain generalization. We also find improved in-domain generalization, leading to an improved SOTA of 86.4% mIoU on the Cityscapes test set.
arXiv Detail & Related papers (2023-12-04T16:46:38Z)
- Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Depth Field Networks for Generalizable Multi-view Scene Representation [31.090289865520475]
We learn an implicit, multi-view consistent scene representation, introducing a series of 3D data augmentation techniques as a geometric inductive prior to increase view diversity.
Our Depth Field Networks (DeFiNe) achieve state-of-the-art results in stereo and video depth estimation without explicit geometric constraints, and improve on zero-shot domain generalization by a wide margin.
arXiv Detail & Related papers (2022-07-28T17:59:31Z)
- Evaluating the Generalization Ability of Super-Resolution Networks [45.867729539843]
We propose a Generalization Assessment Index for SR networks, namely SRGA.
SRGA exploits the statistical characteristics of the internal features of deep networks to measure the generalization ability.
We benchmark existing SR models on the generalization ability.
arXiv Detail & Related papers (2022-05-14T09:33:20Z)
- AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach [50.855679274530615]
We present a novel domain-adaptive approach called AdaStereo to align multi-level representations for deep stereo matching networks.
Our models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo.
Our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
arXiv Detail & Related papers (2021-12-09T15:10:47Z)
- Towards Principled Disentanglement for Domain Generalization [90.9891372499545]
A fundamental challenge for machine learning models is generalizing to out-of-distribution (OOD) data.
We first formalize the OOD generalization problem as constrained optimization, called Disentanglement-constrained Domain Generalization (DDG).
Based on the transformation, we propose a primal-dual algorithm for joint representation disentanglement and domain generalization.
arXiv Detail & Related papers (2021-11-27T07:36:32Z)
- Target-Embedding Autoencoders for Supervised Representation Learning [111.07204912245841]
This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional.
We motivate and formalize the general framework of target-embedding autoencoders (TEA) for supervised prediction, learning intermediate latent representations jointly optimized to be both predictable from features as well as predictive of targets.
arXiv Detail & Related papers (2020-01-23T02:37:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.