GenStereo: Towards Open-World Generation of Stereo Images and Unsupervised Matching
- URL: http://arxiv.org/abs/2503.12720v1
- Date: Mon, 17 Mar 2025 01:19:28 GMT
- Title: GenStereo: Towards Open-World Generation of Stereo Images and Unsupervised Matching
- Authors: Feng Qiao, Zhexiao Xiong, Eric Xing, Nathan Jacobs
- Abstract summary: GenStereo is a diffusion-based approach to stereo image generation. It achieves both visual quality for viewing and geometric accuracy for matching. The framework eliminates the need for complex hardware setups while enabling high-quality stereo image generation.
- Score: 9.322869042942504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stereo images are fundamental to numerous applications, including extended reality (XR) devices, autonomous driving, and robotics. Unfortunately, acquiring high-quality stereo images remains challenging due to the precise calibration requirements of dual-camera setups and the complexity of obtaining accurate, dense disparity maps. Existing stereo image generation methods typically focus on either visual quality for viewing or geometric accuracy for matching, but not both. We introduce GenStereo, a diffusion-based approach, to bridge this gap. The method includes two primary innovations: (1) conditioning the diffusion process on a disparity-aware coordinate embedding and a warped input image, allowing for more precise stereo alignment than previous methods, and (2) an adaptive fusion mechanism that intelligently combines the diffusion-generated image with a warped image, improving both realism and disparity consistency. Through extensive training on 11 diverse stereo datasets, GenStereo demonstrates strong generalization ability. GenStereo achieves state-of-the-art performance in both stereo image generation and unsupervised stereo matching tasks. Our framework eliminates the need for complex hardware setups while enabling high-quality stereo image generation, making it valuable for both real-world applications and unsupervised learning scenarios. Project page is available at https://qjizhi.github.io/genstereo
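The two ingredients named in the abstract, warping the left image to the right view via a disparity map and adaptively blending the warped image with the diffusion output, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names and the fixed blend weight `alpha` are hypothetical (in GenStereo the fusion weights are learned).

```python
import numpy as np

def warp_left_to_right(left, disparity):
    """Forward-warp a left image to the right view by shifting each pixel
    left by its (positive) disparity, a common rectified-stereo convention.
    Target pixels that receive no source stay zero, marking occlusions."""
    h, w, c = left.shape
    right = np.zeros_like(left)
    valid = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xr = x - int(round(disparity[y, x]))
            if 0 <= xr < w:
                right[y, xr] = left[y, x]
                valid[y, xr] = True
    return right, valid

def adaptive_fuse(generated, warped, alpha):
    """Per-pixel blend of the diffusion-generated image with the warped
    image; alpha in [0, 1] plays the role of the (learned) fusion map."""
    return alpha[..., None] * warped + (1.0 - alpha[..., None]) * generated
```

Where the warp is valid, a high `alpha` keeps the geometrically consistent warped pixels; in occluded regions the generated content fills in.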
Related papers
- Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion [88.67015254278859]
We introduce the Mono2Stereo dataset, providing high-quality training data and benchmark to support in-depth exploration of stereo conversion.
We conduct an empirical study that yields two primary findings: 1) The differences between the left and right views are subtle, yet existing metrics average over all pixels and fail to concentrate on the regions critical to stereo effects.
We introduce a new evaluation metric, Stereo Intersection-over-Union, which harmonizes disparity and achieves a high correlation with human judgments on stereo effect.
arXiv Detail & Related papers (2025-03-28T09:25:58Z) - ZeroStereo: Zero-shot Stereo Matching from Single Images [17.560148513475387]
We propose ZeroStereo, a novel stereo image generation pipeline for zero-shot stereo matching. Our approach synthesizes high-quality right images by leveraging pseudo disparities generated by a monocular depth estimation model. Our pipeline achieves state-of-the-art zero-shot generalization across multiple datasets with only a dataset volume comparable to Scene Flow.
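The conversion from monocular depth to a pseudo disparity relies on the standard pinhole-stereo relation, disparity = focal length (in pixels) x baseline / depth. A minimal sketch, assuming metric depth and known calibration (monocular estimators often produce only relative depth, which would need an extra scale/shift fit first):

```python
def depth_to_disparity(depth, focal_px, baseline_m):
    """Standard rectified-stereo relation: disparity (px) = f * B / Z.
    depth: per-pixel depth in meters (must be > 0)
    focal_px: focal length in pixels, baseline_m: camera baseline in meters."""
    return focal_px * baseline_m / depth
```

The resulting pseudo-disparity map can then drive a left-to-right warp to synthesize the training right image.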
arXiv Detail & Related papers (2025-01-15T08:43:48Z) - Single-View View Synthesis with Self-Rectified Pseudo-Stereo [49.946151180828465]
We leverage the reliable and explicit stereo prior to generate a pseudo-stereo viewpoint.
We propose a self-rectified stereo synthesis to amend erroneous regions in an identify-rectify manner.
Our method outperforms state-of-the-art single-view view synthesis methods and stereo synthesis methods.
arXiv Detail & Related papers (2023-04-19T09:36:13Z) - Stereo Image Rain Removal via Dual-View Mutual Attention [55.79448042969012]
We propose a new Stereo Image Rain Removal method (StereoIRR) via sufficient interaction between the two views.
We show that StereoIRR outperforms other related monocular and stereo image rain removal methods on several datasets.
arXiv Detail & Related papers (2022-11-18T09:07:01Z) - Self-Supervised Intensity-Event Stereo Matching [24.851819610561517]
Event cameras are novel bio-inspired vision sensors that output pixel-level intensity changes with microsecond accuracy.
Event cameras cannot be directly applied to computational imaging tasks due to the inability to obtain high-quality intensity images and events simultaneously.
This paper aims to connect a standalone event camera and a modern intensity camera so that applications can take advantage of both sensors.
arXiv Detail & Related papers (2022-11-01T14:52:25Z) - T-Person-GAN: Text-to-Person Image Generation with Identity-Consistency and Manifold Mix-Up [16.165889084870116]
We present an end-to-end approach to generate high-resolution person images conditioned on texts only.
We develop an effective generative model to produce person images with two novel mechanisms.
arXiv Detail & Related papers (2022-08-18T07:41:02Z) - Revisiting Domain Generalized Stereo Matching Networks from a Feature Consistency Perspective [65.37571681370096]
We propose a simple pixel-wise contrastive learning across the viewpoints.
A stereo selective whitening loss is introduced to better preserve the stereo feature consistency across domains.
Our method achieves superior performance over several state-of-the-art networks.
arXiv Detail & Related papers (2022-03-21T11:21:41Z) - Co-Teaching: An Ark to Unsupervised Stereo Matching [14.801038005597855]
CoT-Stereo is a novel unsupervised stereo matching approach.
Experiments on the KITTI Stereo benchmarks demonstrate the superior performance of CoT-Stereo.
arXiv Detail & Related papers (2021-07-17T05:33:39Z) - SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
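The benefit of a bimodal output can be made concrete: near a depth discontinuity, a unimodal regressor averages foreground and background disparities, while a mixture model keeps both hypotheses and reads out the dominant one. A minimal sketch of such a winner-take-all readout for a per-pixel bimodal Laplacian mixture (parameter names are illustrative, not SMD-Nets' exact interface):

```python
import numpy as np

def bimodal_disparity(pi, mu0, mu1, b0, b1):
    """Pick, per pixel, the mixture mode with the higher density at its own
    mean: a Laplacian component with weight w and scale b peaks at w / (2b).
    pi: weight of mode 0 in [0, 1]; mu0/mu1: mode centers; b0/b1: scales.
    All arguments are arrays of the same shape (or scalars)."""
    d0 = pi / (2.0 * b0)          # peak density of mode 0
    d1 = (1.0 - pi) / (2.0 * b1)  # peak density of mode 1
    return np.where(d0 >= d1, mu0, mu1)
```

Selecting a mode instead of taking the mixture mean is what yields sharp disparity edges: the output is always one of the two hypotheses, never a blur between them.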
arXiv Detail & Related papers (2021-04-08T16:15:46Z) - Parallax Attention for Unsupervised Stereo Correspondence Learning [46.035892564279564]
Stereo image pairs encode 3D scene cues into stereo correspondences between the left and right images.
Recent CNN based methods commonly use cost volume techniques to capture stereo correspondence over large disparities.
We propose a generic parallax-attention mechanism (PAM) to capture stereo correspondence regardless of disparity variations.
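The core idea of attending along the epipolar line (the same image row, in rectified stereo) rather than building a fixed-size cost volume can be sketched as follows. This is a simplified single-pass illustration with hypothetical shapes, not the paper's full PAM module:

```python
import numpy as np

def parallax_attention(feat_left, feat_right):
    """For each left-image pixel, softmax-attend over all right-image
    pixels in the same row, so no maximum disparity is assumed.
    feat_left, feat_right: feature maps of shape (H, W, C)."""
    scores = np.einsum('hwc,hvc->hwv', feat_left, feat_right)  # (H, W, W)
    scores -= scores.max(axis=-1, keepdims=True)               # stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    # Expected matched x-coordinate in the right image -> soft disparity
    positions = np.arange(feat_left.shape[1], dtype=float)
    expected_x = (attn * positions).sum(axis=-1)               # (H, W)
    disparity = positions[None, :] - expected_x                # left x - matched right x
    return attn, disparity
```

Because the attention spans the whole row, the same module handles small and large disparities alike, which is what removes the fixed disparity-range assumption of cost-volume methods.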
arXiv Detail & Related papers (2020-09-16T01:30:13Z) - Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation [51.714092199995044]
In many fields, self-supervised learning solutions are rapidly evolving and closing the gap with supervised approaches.
We propose a novel self-supervised paradigm reversing the link between the two.
In order to train deep stereo networks, we distill knowledge through a monocular completion network.
arXiv Detail & Related papers (2020-08-17T07:40:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.