Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data
- URL: http://arxiv.org/abs/2411.14053v1
- Date: Thu, 21 Nov 2024 11:59:04 GMT
- Title: Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data
- Authors: Xianda Guo, Chenming Zhang, Youmin Zhang, Dujun Nie, Ruilin Wang, Wenzhao Zheng, Matteo Poggi, Long Chen
- Abstract summary: We introduce StereoAnything, a solution for robust stereo matching.
We scale up the dataset by collecting labeled stereo images and generating synthetic stereo pairs from unlabeled monocular images.
We extensively evaluate the zero-shot capabilities of our model on five public datasets.
- Score: 26.029499450825092
- Abstract: Stereo matching has been a pivotal component in 3D vision, aiming to find corresponding points between pairs of stereo images to recover depth information. In this work, we introduce StereoAnything, a highly practical solution for robust stereo matching. Rather than focusing on a specialized model, our goal is to develop a versatile foundational model capable of handling stereo images across diverse environments. To this end, we scale up the dataset by collecting labeled stereo images and generating synthetic stereo pairs from unlabeled monocular images. To further enrich the model's ability to generalize across different conditions, we introduce a novel synthetic dataset that complements existing data by adding variability in baselines, camera angles, and scene types. We extensively evaluate the zero-shot capabilities of our model on five public datasets, showcasing its impressive ability to generalize to new, unseen data. Code will be available at \url{https://github.com/XiandaGuo/OpenStereo}.
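The abstract's key data-scaling step, generating synthetic stereo pairs from unlabeled monocular images, generally works by predicting a depth map for the single image and forward-warping it into a second viewpoint. A minimal numpy sketch of that warping step, assuming a standard pinhole model; the function name, focal length, and baseline are illustrative, not values from the paper:

```python
import numpy as np

def synthesize_right_view(left, depth, focal_px=100.0, baseline_m=0.1):
    """Forward-warp a left image into a hypothetical right view.

    Disparity (in pixels) follows the standard stereo relation
    d = focal * baseline / depth; each left pixel shifts left by its
    disparity. Holes from occlusion are left as zeros here; a full
    pipeline would additionally fill them (e.g. by inpainting).
    """
    h, w = depth.shape
    disparity = focal_px * baseline_m / np.maximum(depth, 1e-6)
    right = np.zeros_like(left)
    for y in range(h):
        # write farthest pixels first so nearer pixels win conflicts
        order = np.argsort(-depth[y])
        xt = np.round(order - disparity[y, order]).astype(int)
        valid = (xt >= 0) & (xt < w)
        right[y, xt[valid]] = left[y, order[valid]]
    return right, disparity
```

With a constant-depth scene the warp reduces to a uniform horizontal shift, which makes the behavior easy to check by hand.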
Related papers
- Generalizable Novel-View Synthesis using a Stereo Camera [21.548844864282994]
We propose the first generalizable view synthesis approach that specifically targets multi-view stereo-camera images.
We introduce stereo matching into novel-view synthesis for high-quality geometry reconstruction.
Our experimental results demonstrate that StereoNeRF surpasses previous approaches in generalizable view synthesis.
arXiv Detail & Related papers (2024-04-21T05:39:44Z)
- Playing to Vision Foundation Model's Strengths in Stereo Matching [13.887661472501618]
This study serves as the first exploration of a viable approach for adapting vision foundation models (VFMs) to stereo matching.
Our ViT adapter, referred to as ViTAS, is constructed upon three types of modules: spatial differentiation, patch attention fusion, and cross-attention.
ViTAStereo outperforms the second-best network, StereoBase, by approximately 7.9% in the percentage of error pixels with a 3-pixel tolerance.
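The quoted comparison uses the standard "bad-τ" stereo metric: the percentage of pixels whose disparity error exceeds a tolerance τ, here τ = 3 pixels. A minimal sketch of that metric, assuming the common convention that zero marks pixels without ground truth:

```python
import numpy as np

def bad_pixel_rate(pred, gt, tol=3.0, valid=None):
    """Percentage of valid pixels whose absolute disparity error > tol."""
    if valid is None:
        valid = gt > 0  # zero disparity conventionally means "no ground truth"
    err = np.abs(pred - gt)
    bad = (err > tol) & valid
    return 100.0 * bad.sum() / max(valid.sum(), 1)
```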
arXiv Detail & Related papers (2024-04-09T12:34:28Z)
- UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z)
- OpenStereo: A Comprehensive Benchmark for Stereo Matching and Strong Baseline [25.4712469033627]
We develop a flexible and efficient stereo matching codebase, called OpenStereo.
OpenStereo includes training and inference codes of more than 10 network models.
We conduct an exhaustive analysis and deconstruction of recent developments in stereo matching through comprehensive ablative experiments.
Our StereoBase ranks 1st on SceneFlow, KITTI 2015, and KITTI 2012 (Reflective) among published methods and achieves the best performance across all metrics.
arXiv Detail & Related papers (2023-12-01T04:35:47Z)
- DynamicStereo: Consistent Dynamic Depth from Stereo Videos [91.1804971397608]
We propose DynamicStereo to estimate disparity for stereo videos.
The network learns to pool information from neighboring frames to improve the temporal consistency of its predictions.
We also introduce Dynamic Replica, a new benchmark dataset containing synthetic videos of people and animals in scanned environments.
arXiv Detail & Related papers (2023-05-03T17:40:49Z)
- Single-View View Synthesis with Self-Rectified Pseudo-Stereo [49.946151180828465]
We leverage a reliable and explicit stereo prior to generate a pseudo-stereo viewpoint.
We propose a self-rectified stereo synthesis to amend erroneous regions in an identify-rectify manner.
Our method outperforms state-of-the-art single-view view synthesis methods and stereo synthesis methods.
arXiv Detail & Related papers (2023-04-19T09:36:13Z)
- Differentiable Stereopsis: Meshes from multiple views using differentiable rendering [72.25348629612782]
We propose Differentiable Stereopsis, a multi-view stereo approach that reconstructs shape and texture from few input views and noisy cameras.
We pair traditional stereopsis and modern differentiable rendering to build an end-to-end model which predicts textured 3D meshes of objects with varying topologies and shapes.
arXiv Detail & Related papers (2021-10-11T17:59:40Z)
- SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
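The "bimodal mixture densities" output can be made concrete: instead of regressing a single disparity per pixel, the network predicts two candidate modes plus a mixing weight, and selecting the dominant mode (rather than the blurred mixture expectation) keeps estimates sharp at discontinuities. A hedged numpy sketch with a two-component Laplacian mixture; the parameter names are illustrative, not the paper's notation:

```python
import numpy as np

def bimodal_disparity(pi, mu1, b1, mu2, b2):
    """Winner-takes-all disparity from a two-component Laplacian mixture.

    pi is the weight of mode 1; (mu, b) are per-pixel Laplacian mean and
    scale. The density height of a Laplacian at its own mean is 1 / (2b),
    so comparing weighted heights picks the dominant mode. Averaging the
    two means instead would smear disparities across depth edges.
    """
    h1 = pi / (2.0 * b1)
    h2 = (1.0 - pi) / (2.0 * b2)
    return np.where(h1 >= h2, mu1, mu2)
```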
arXiv Detail & Related papers (2021-04-08T16:15:46Z)
- PVStereo: Pyramid Voting Module for End-to-End Self-Supervised Stereo Matching [14.603116313499648]
We propose a robust and effective self-supervised stereo matching approach, consisting of a pyramid voting module (PVM) and a novel DCNN architecture, referred to as OptStereo.
Specifically, our OptStereo first builds multi-scale cost volumes, and then adopts a recurrent unit to iteratively update disparity estimations at high resolution.
We publish the HKUST-Drive dataset, a large-scale synthetic stereo dataset, collected under different illumination and weather conditions for research purposes.
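The "multi-scale cost volumes" mentioned above can be illustrated at a single scale: for each candidate disparity, right-view features are shifted and correlated with left-view features, giving a disparity-indexed volume over which the network reasons. A minimal numpy sketch; the function name and the correlation-style formulation are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

def correlation_cost_volume(feat_l, feat_r, max_disp):
    """Correlation cost volume for a pair of C x H x W feature maps.

    For each candidate disparity d, left features at column x are
    correlated (dot product over channels) with right features at
    column x - d, yielding a D x H x W volume. Multi-scale variants
    repeat this on downsampled feature maps.
    """
    c, h, w = feat_l.shape
    cost = np.zeros((max_disp, h, w), dtype=feat_l.dtype)
    for d in range(max_disp):
        if d == 0:
            cost[d] = (feat_l * feat_r).sum(axis=0)
        else:
            cost[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, :-d]).sum(axis=0)
    return cost
```

With one-hot channel features shifted by a known amount, the volume's argmax over d recovers that shift, which is a quick sanity check on the indexing convention.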
arXiv Detail & Related papers (2021-03-12T05:27:14Z)
- Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation [51.714092199995044]
In many fields, self-supervised learning solutions are rapidly evolving and closing the gap with supervised approaches.
We propose a novel self-supervised paradigm reversing the link between the two.
In order to train deep stereo networks, we distill knowledge through a monocular completion network.
arXiv Detail & Related papers (2020-08-17T07:40:22Z)
- Expanding Sparse Guidance for Stereo Matching [24.74333370941674]
We propose a novel sparsity expansion technique that expands sparse cues, guided by the RGB images, for local feature enhancement.
Our approach significantly boosts the existing state-of-the-art stereo algorithms with extremely sparse cues.
arXiv Detail & Related papers (2020-04-24T06:41:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.