Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail
- URL: http://arxiv.org/abs/2412.04472v1
- Date: Thu, 05 Dec 2024 18:59:58 GMT
- Title: Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail
- Authors: Luca Bartolomei, Fabio Tosi, Matteo Poggi, Stefano Mattoccia
- Abstract summary: We introduce Stereo Anywhere, a novel stereo-matching framework that combines geometric constraints with robust priors from monocular depth Vision Foundation Models (VFMs).
We show that our synthetic-only trained model achieves state-of-the-art results in zero-shot generalization, significantly outperforming existing solutions.
- Score: 37.90622613373521
- Abstract: We introduce Stereo Anywhere, a novel stereo-matching framework that combines geometric constraints with robust priors from monocular depth Vision Foundation Models (VFMs). By elegantly coupling these complementary worlds through a dual-branch architecture, we seamlessly integrate stereo matching with learned contextual cues. Following this design, our framework introduces novel cost volume fusion mechanisms that effectively handle critical challenges such as textureless regions, occlusions, and non-Lambertian surfaces. Through our novel optical illusion dataset, MonoTrap, and extensive evaluation across multiple benchmarks, we demonstrate that our synthetic-only trained model achieves state-of-the-art results in zero-shot generalization, significantly outperforming existing solutions while showing remarkable robustness to challenging cases such as mirrors and transparencies.
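As a rough intuition for the dual-branch design, picture two cost volumes, one geometric and one built from the monocular prior, blended by a confidence map. The PyTorch sketch below is a deliberately simplified illustration under assumed shapes; the Gaussian prior volume and the convex fusion are stand-ins, not the paper's actual fusion mechanisms:

```python
import torch

def mono_prior_volume(mono_disp, max_disp, sigma=1.0):
    """Turn a (B, 1, H, W) monocular disparity prior into a soft
    unimodal (B, D, H, W) volume peaked at the predicted value."""
    d = torch.arange(max_disp, device=mono_disp.device).view(1, -1, 1, 1)
    return torch.exp(-(d - mono_disp) ** 2 / (2 * sigma ** 2))

def fuse_volumes(stereo_vol, mono_vol, conf):
    """Confidence-weighted fusion of two (B, D, H, W) cost volumes.

    `conf` (B, 1, H, W) in [0, 1] should be high where geometry is
    trustworthy and low in textureless, occluded, or non-Lambertian
    regions, letting the monocular prior take over where stereo fails.
    """
    return conf * stereo_vol + (1.0 - conf) * mono_vol
```

Where `conf` collapses (mirrors, glass, textureless walls), a fusion of this kind degrades gracefully toward the monocular prior rather than the corrupted geometric evidence.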
Related papers
- FoundationStereo: Zero-Shot Stereo Matching [50.79202911274819]
FoundationStereo is a foundation model for stereo depth estimation.
We first construct a large-scale (1M stereo pairs) synthetic training dataset.
We then design a number of network architecture components to enhance scalability.
arXiv Detail & Related papers (2025-01-17T01:01:44Z)
- DEFOM-Stereo: Depth Foundation Model Based Stereo Matching [12.22373236061929]
DEFOM-Stereo is built to facilitate robust stereo matching with monocular depth cues.
DEFOM-Stereo achieves performance on the Scene Flow dataset comparable to state-of-the-art (SOTA) methods.
It simultaneously outperforms previous models on the individual benchmarks.
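Monocular depth cues are usually injected by fitting a global scale and shift that aligns the relative monocular prediction with sparse, trusted disparities. The sketch below shows that generic alignment step; it is an assumption about the mechanism, not DEFOM-Stereo's exact procedure:

```python
import torch

def align_mono_to_disparity(mono_inv_depth, sparse_disp, valid):
    """Fit disparity ~ s * mono_inv_depth + t on pixels where a sparse
    stereo estimate exists, then apply the fit everywhere.

    mono_inv_depth, sparse_disp, valid: (H, W) tensors; `valid` is bool.
    """
    x = mono_inv_depth[valid]                 # (N,) mono predictions
    y = sparse_disp[valid]                    # (N,) trusted disparities
    A = torch.stack([x, torch.ones_like(x)], dim=1)   # (N, 2) design matrix
    # Least-squares solve for scale s and shift t.
    s, t = torch.linalg.lstsq(A, y.unsqueeze(1)).solution.squeeze(1)
    return s * mono_inv_depth + t
```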
arXiv Detail & Related papers (2025-01-16T10:59:29Z)
- Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data [26.029499450825092]
We introduce StereoAnything, a solution for robust stereo matching.
We scale up the dataset by collecting labeled stereo images and generating synthetic stereo pairs from unlabeled monocular images.
We extensively evaluate the zero-shot capabilities of our model on five public datasets.
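Generating stereo pairs from unlabeled monocular images typically means predicting depth, converting it to disparity, and forward-warping to fabricate the second view. A minimal, occlusion-naive warp is sketched below; the interface is illustrative, and real pipelines also handle depth ordering, holes, and inpainting:

```python
import torch

def synthesize_right_view(left, disp):
    """Naive forward warp of a left image into a right view.

    left: (C, H, W) image; disp: (H, W) positive disparity in pixels.
    A rectified right view sees each scene point shifted left by its
    disparity; dis-occluded holes are simply left black here.
    """
    C, H, W = left.shape
    right = torch.zeros_like(left)
    xs = torch.arange(W).expand(H, W)
    target = xs - disp.round().long()      # destination x in the right view
    for y in range(H):
        ok = target[y] >= 0                # drop pixels shifted out of frame
        right[:, y, target[y][ok]] = left[:, y][:, ok]
    return right
```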
arXiv Detail & Related papers (2024-11-21T11:59:04Z)
- Generalizable Novel-View Synthesis using a Stereo Camera [21.548844864282994]
We propose the first generalizable view synthesis approach that specifically targets multi-view stereo-camera images.
We introduce stereo matching into novel-view synthesis for high-quality geometry reconstruction.
Our experimental results demonstrate that StereoNeRF surpasses previous approaches in generalizable view synthesis.
arXiv Detail & Related papers (2024-04-21T05:39:44Z)
- Multi-scale Alternated Attention Transformer for Generalized Stereo Matching [7.493797166406228]
We present a simple but highly effective network, the Alternated Attention U-shaped Transformer (AAUformer), to balance the influence of the epipolar line between the dual-view and single-view settings.
Compared to other models, our model has several main designs.
We performed a series of both comparative studies and ablation studies on several mainstream stereo matching datasets.
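The alternation of single-view and cross-view (epipolar) attention that the name suggests can be sketched generically as follows; this is a guess at the general pattern used by transformer stereo matchers, not AAUformer's actual blocks:

```python
import torch.nn as nn

class AlternatedAttentionBlock(nn.Module):
    """One stage alternating within-view self-attention with
    cross-view attention between the two rectified images."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, left_tokens, right_tokens):
        # Single view: each image attends to itself (shared weights).
        left_tokens = left_tokens + self.self_attn(
            left_tokens, left_tokens, left_tokens)[0]
        right_tokens = right_tokens + self.self_attn(
            right_tokens, right_tokens, right_tokens)[0]
        # Dual view: left queries right, implicitly along epipolar lines.
        left_tokens = left_tokens + self.cross_attn(
            left_tokens, right_tokens, right_tokens)[0]
        return left_tokens, right_tokens
```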
arXiv Detail & Related papers (2023-08-06T08:22:39Z)
- Single-View View Synthesis with Self-Rectified Pseudo-Stereo [49.946151180828465]
We leverage a reliable and explicit stereo prior in order to generate a pseudo-stereo viewpoint.
We propose a self-rectified stereo synthesis to amend erroneous regions in an identify-rectify manner.
Our method outperforms state-of-the-art single-view view synthesis methods and stereo synthesis methods.
arXiv Detail & Related papers (2023-04-19T09:36:13Z)
- SGM3D: Stereo Guided Monocular 3D Object Detection [62.11858392862551]
We propose a stereo-guided monocular 3D object detection network, termed SGM3D.
We exploit robust 3D features extracted from stereo images to enhance the features learned from the monocular image.
Our method can be integrated into many other monocular approaches to boost performance without introducing any extra computational cost.
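Such stereo-to-monocular guidance is commonly implemented as training-time feature imitation, which is why no extra inference cost appears. A generic sketch, where the foreground masking and the frozen stereo teacher are assumptions rather than SGM3D's exact formulation:

```python
import torch

def feature_imitation_loss(mono_feat, stereo_feat, fg_mask):
    """L2 imitation loss between (B, C, H, W) feature maps, restricted
    to foreground regions so background noise is not imitated.

    stereo_feat comes from a frozen teacher; gradients flow only into
    the monocular student via mono_feat.
    """
    diff = (mono_feat - stereo_feat.detach()) ** 2
    diff = diff.mean(dim=1, keepdim=True)              # (B, 1, H, W)
    return (diff * fg_mask).sum() / fg_mask.sum().clamp(min=1)
```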
arXiv Detail & Related papers (2021-12-03T13:57:14Z)
- SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
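Concretely, a bimodal head predicts five numbers per pixel: a mixture weight plus a mean and scale for each of two Laplacian components, trained with the mixture negative log-likelihood and read out winner-take-all so the estimate never averages across a depth edge. A sketch under those assumptions (the activations and constants here are illustrative):

```python
import torch
import torch.nn.functional as F

def bimodal_nll(params, gt_disp):
    """Negative log-likelihood of a two-component Laplacian mixture.

    params: (B, 5, H, W) raw outputs -> weight, two means, two scales.
    gt_disp: (B, H, W) ground-truth disparity.
    """
    pi = torch.sigmoid(params[:, 0])
    mu1, mu2 = params[:, 1], params[:, 2]
    b1 = F.softplus(params[:, 3]) + 1e-3
    b2 = F.softplus(params[:, 4]) + 1e-3
    lap1 = torch.exp(-(gt_disp - mu1).abs() / b1) / (2 * b1)
    lap2 = torch.exp(-(gt_disp - mu2).abs() / b2) / (2 * b2)
    return -torch.log(pi * lap1 + (1 - pi) * lap2 + 1e-12).mean()

def winner_take_all(params):
    """Point estimate: the mean of the dominant mode. Unlike a soft
    average, this never lands between foreground and background."""
    pi = torch.sigmoid(params[:, 0])
    return torch.where(pi >= 0.5, params[:, 1], params[:, 2])
```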
arXiv Detail & Related papers (2021-04-08T16:15:46Z)
- Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation [51.714092199995044]
In many fields, self-supervised learning solutions are rapidly evolving and closing the gap with supervised approaches.
We propose a novel self-supervised paradigm that reverses the usual link between monocular and stereo networks.
To train deep stereo networks, we distill knowledge through a monocular completion network.
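In code, the reversed direction amounts to using filtered monocular predictions as proxy labels for the stereo network. The sketch below shows such a proxy-supervision loss; the confidence threshold is a stand-in for the paper's actual reliability filtering:

```python
import torch

def proxy_distillation_loss(stereo_disp, proxy_disp, proxy_conf, tau=0.5):
    """Supervise the stereo network only where the monocular proxy
    labels are deemed reliable.

    stereo_disp, proxy_disp: (B, H, W) disparities; proxy_conf in [0, 1].
    """
    mask = proxy_conf > tau                    # keep confident proxies only
    if mask.sum() == 0:
        return stereo_disp.sum() * 0.0         # keep the graph alive
    return (stereo_disp - proxy_disp.detach()).abs()[mask].mean()
```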
arXiv Detail & Related papers (2020-08-17T07:40:22Z)