ZeroStereo: Zero-shot Stereo Matching from Single Images
- URL: http://arxiv.org/abs/2501.08654v2
- Date: Sat, 08 Mar 2025 09:29:56 GMT
- Title: ZeroStereo: Zero-shot Stereo Matching from Single Images
- Authors: Xianqi Wang, Hao Yang, Gangwei Xu, Junda Cheng, Min Lin, Yong Deng, Jinliang Zang, Yurui Chen, Xin Yang,
- Abstract summary: We propose ZeroStereo, a novel stereo image generation pipeline for zero-shot stereo matching. Our approach synthesizes high-quality right images by leveraging pseudo disparities generated by a monocular depth estimation model. Our pipeline achieves state-of-the-art zero-shot generalization across multiple datasets with only a dataset volume comparable to Scene Flow.
- Score: 17.560148513475387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art supervised stereo matching methods have achieved remarkable performance on various benchmarks. However, their generalization to real-world scenarios remains challenging due to the scarcity of annotated real-world stereo data. In this paper, we propose ZeroStereo, a novel stereo image generation pipeline for zero-shot stereo matching. Our approach synthesizes high-quality right images from arbitrary single images by leveraging pseudo disparities generated by a monocular depth estimation model. Unlike previous methods that address occluded regions by filling missing areas with neighboring pixels or random backgrounds, we fine-tune a diffusion inpainting model to recover missing details while preserving semantic structure. Additionally, we propose Training-Free Confidence Generation, which mitigates the impact of unreliable pseudo labels without additional training, and Adaptive Disparity Selection, which ensures a diverse and realistic disparity distribution while preventing excessive occlusion and foreground distortion. Experiments demonstrate that models trained with our pipeline achieve state-of-the-art zero-shot generalization across multiple datasets with only a dataset volume comparable to Scene Flow. Code: https://github.com/Windsrain/ZeroStereo.
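To make the generation step concrete: with a pseudo disparity map from a monocular depth model, each left-image pixel is splatted to x - d in the right view, and the pixels where nothing lands are exactly the occluded holes the fine-tuned diffusion inpainting model must fill. A minimal forward-warping sketch (nearest-pixel splatting; the function name and details are illustrative, not the released code):
```python
import torch

def forward_warp_right(left: torch.Tensor, disp: torch.Tensor):
    """Synthesize a right view by forward-warping a left image with a
    (pseudo) disparity map, and report the occluded holes.

    left: (C, H, W) float image; disp: (H, W) non-negative disparities.
    """
    C, H, W = left.shape
    ys = torch.arange(H).view(H, 1).expand(H, W)
    xs = torch.arange(W).view(1, W).expand(H, W)
    xr = (xs - disp).round().long()        # rectified pair: x_right = x_left - d
    valid = (xr >= 0) & (xr < W)

    # Splat far-to-near so nearer pixels (larger disparity) tend to win
    # z-ordering. Note: PyTorch does not guarantee write order for duplicate
    # indices; a real implementation would use explicit z-buffering.
    order = torch.argsort(disp.flatten())  # ascending disparity = far first
    yf = ys.flatten()[order]
    xf = xs.flatten()[order]
    xrf = xr.flatten()[order]
    keep = valid.flatten()[order]
    yf, xf, xrf = yf[keep], xf[keep], xrf[keep]

    right = torch.zeros_like(left)
    hit = torch.zeros(H, W, dtype=torch.bool)
    right[:, yf, xrf] = left[:, yf, xf]
    hit[yf, xrf] = True
    occluded = ~hit                        # holes for the inpainting model
    return right, occluded
```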
Related papers
- Uncertainty-Aware Diffusion Guided Refinement of 3D Scenes [34.19578921335553]
Reconstructing 3D scenes from a single image is a fundamentally ill-posed task due to the severely under-constrained nature of the problem.
In this work, we address these inherent limitations in existing single image-to-3D scene feedforward networks.
To alleviate the poor performance due to insufficient information beyond the input image's view, we leverage a strong generative prior in the form of a pre-trained latent video diffusion model.
arXiv Detail & Related papers (2025-03-19T23:14:27Z) - FoundationStereo: Zero-Shot Stereo Matching [50.79202911274819]
FoundationStereo is a foundation model for stereo depth estimation.
We first construct a large-scale (1M stereo pairs) synthetic training dataset.
We then design a number of network architecture components to enhance scalability.
arXiv Detail & Related papers (2025-01-17T01:01:44Z) - Pseudo-Stereo Inputs: A Solution to the Occlusion Challenge in Self-Supervised Stereo Matching [0.0]
Self-supervised stereo matching holds great promise for application and research.
Direct self-supervised stereo matching paradigms based on photometric loss functions have consistently struggled with performance issues.
We propose a simple yet highly effective pseudo-stereo inputs strategy to address the core occlusion challenge.
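For context, the photometric loss such self-supervised methods rely on is a warp-and-compare objective; occluded pixels have no correspondence in the other view, so the loss is unreliable precisely where a pseudo-stereo strategy intervenes. A generic sketch of the standard L1 + SSIM formulation (not necessarily this paper's exact loss):
```python
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified single-scale SSIM with 3x3 average-pooling windows.
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    var_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).clamp(0, 1)

def photometric_loss(left, right, disp_left, alpha=0.85):
    """Reconstruct the left image by backward-warping the right image with
    the predicted left-view disparity, then compare. left/right: (B, C, H, W),
    disp_left: (B, 1, H, W). Occluded pixels have no true match, which is
    the failure mode discussed above.
    """
    B, C, H, W = left.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=left.dtype), torch.arange(W, dtype=left.dtype),
        indexing="ij")
    xs = xs.expand(B, H, W) - disp_left[:, 0]
    ys = ys.expand(B, H, W)
    grid = torch.stack([2 * xs / (W - 1) - 1, 2 * ys / (H - 1) - 1], dim=-1)
    left_rec = F.grid_sample(right, grid, align_corners=True)
    l1 = (left - left_rec).abs().mean()
    dssim = (1 - ssim(left, left_rec)).mean() / 2
    return alpha * dssim + (1 - alpha) * l1
```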
arXiv Detail & Related papers (2024-10-03T14:40:17Z) - Stereo Risk: A Continuous Modeling Approach to Stereo Matching [110.22344879336043]
We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision.
We demonstrate that Stereo Risk enhances stereo-matching performance for deep networks, particularly for disparities with multi-modal probability distributions.
A comprehensive analysis demonstrates our method's theoretical soundness and superior performance over the state-of-the-art methods across various benchmark datasets.
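The distinction at stake can be illustrated with elementary risk minimization: the common soft-argmin estimate is the mean of the disparity distribution (the minimizer of squared-error risk) and can land between the peaks of a multi-modal distribution, whereas L1 risk is minimized by the median. A toy sketch of the two estimators, not Stereo Risk's actual objective:
```python
import torch

def disparity_estimates(cost_volume):
    """Turn a cost volume (B, D, H, W) into a disparity distribution and
    contrast two risk-minimizing point estimates. Toy illustration only.
    """
    B, D, H, W = cost_volume.shape
    prob = torch.softmax(-cost_volume, dim=1)      # lower cost = more likely
    disps = torch.arange(D, dtype=prob.dtype).view(1, D, 1, 1)
    mean_disp = (prob * disps).sum(dim=1)          # soft-argmin: minimizes E[(d - d*)^2]
    cdf = prob.cumsum(dim=1)
    median_disp = (cdf < 0.5).sum(dim=1).to(prob.dtype)  # minimizes E[|d - d*|]
    return mean_disp, median_disp
```
On a bimodal distribution the mean lands between the peaks while the median stays on one of them, which is the kind of failure a continuous risk formulation targets.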
arXiv Detail & Related papers (2024-07-03T14:30:47Z) - StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models [2.9260206957981167]
We introduce StereoDiffusion, a training-free method that is remarkably straightforward to use and integrates seamlessly into the original Stable Diffusion model.
Our method modifies the latent variable to provide an end-to-end, lightweight capability for fast generation of stereo image pairs.
Our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
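Read loosely, the latent modification amounts to horizontally warping the Stable Diffusion latent by a disparity map downscaled to latent resolution, then decoding the shifted latent as the second view. A rough sketch under that assumption (the actual method also handles occlusions and operates inside the denoising loop):
```python
import torch
import torch.nn.functional as F

def shift_latent(latent, disp, vae_scale=8):
    """Backward-warp a latent (B, 4, h, w) to the right view: sample the
    left-view latent at x + d, with the image-space disparity map
    (B, 1, H, W) downscaled to latent resolution. Illustrative only.
    """
    B, C, h, w = latent.shape
    d = F.interpolate(disp, size=(h, w), mode="bilinear",
                      align_corners=True) / vae_scale
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=latent.dtype), torch.arange(w, dtype=latent.dtype),
        indexing="ij")
    xs = xs.expand(B, h, w) + d[:, 0]      # right(x) samples left(x + d)
    ys = ys.expand(B, h, w)
    grid = torch.stack([2 * xs / (w - 1) - 1, 2 * ys / (h - 1) - 1], dim=-1)
    return F.grid_sample(latent, grid, align_corners=True)
```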
arXiv Detail & Related papers (2024-03-08T00:30:25Z) - Improving Diffusion-Based Image Synthesis with Context Prediction [49.186366441954846]
Existing diffusion models mainly try to reconstruct the input image from a corrupted one with a pixel-wise or feature-wise constraint along spatial axes.
We propose ConPreDiff to improve diffusion-based image synthesis with context prediction.
Our ConPreDiff consistently outperforms previous methods and achieves new state-of-the-art text-to-image generation results on MS-COCO, with a zero-shot FID score of 6.21.
arXiv Detail & Related papers (2024-01-04T01:10:56Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
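The general recipe behind such steering is classifier-guidance-style: at each sampling step, evaluate a differentiable task loss on the current clean-image estimate and shift the noise prediction along its gradient. A hedged sketch with an assumed model interface, not the paper's implementation:
```python
import torch

@torch.enable_grad()
def steered_eps(x_t, t, eps_model, guidance_loss, alpha_bar_t, scale=1.0):
    """Steer an unconditional denoiser at sampling time. guidance_loss is a
    differentiable task loss on the clean-image estimate (e.g. masked L2 for
    inpainting, a grayscale-match term for colorization).
    """
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)
    # Clean-image estimate under the standard DDPM parameterization.
    x0_hat = (x_t - (1 - alpha_bar_t).sqrt() * eps) / alpha_bar_t.sqrt()
    grad = torch.autograd.grad(guidance_loss(x0_hat), x_t)[0]
    # Classifier-guidance-style correction of the noise/score estimate.
    return eps + scale * (1 - alpha_bar_t).sqrt() * grad
```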
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - Gradpaint: Gradient-Guided Inpainting with Diffusion Models [71.47496445507862]
Denoising Diffusion Probabilistic Models (DDPMs) have recently achieved remarkable results in conditional and unconditional image generation.
We present GradPaint, which steers the generation towards a globally coherent image.
GradPaint generalizes well to diffusion models trained on various datasets, improving upon current state-of-the-art supervised and unsupervised methods.
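A plausible reading of the mechanism, sketched below with a hypothetical interface: measure how well the current clean-image estimate agrees with the known pixels, push the latent along that gradient so the fill stays globally coherent with the context, then re-impose the known region on the diffusion trajectory.
```python
import torch

def gradpaint_step(x_t, t, eps_model, x_known, mask, alpha_bar_t, lam=0.1):
    """One guidance step in the spirit of GradPaint (mask=1 where pixels are
    known). The standard DDPM update to t-1 would follow this function.
    Hypothetical interface, not the authors' code.
    """
    x_t = x_t.detach().requires_grad_(True)
    with torch.enable_grad():
        eps = eps_model(x_t, t)
        x0_hat = (x_t - (1 - alpha_bar_t).sqrt() * eps) / alpha_bar_t.sqrt()
        loss = (mask * (x0_hat - x_known)).pow(2).mean()
    grad = torch.autograd.grad(loss, x_t)[0]
    x_t = (x_t - lam * grad).detach()
    # Keep known pixels exactly on the forward-diffusion trajectory.
    noised_known = (alpha_bar_t.sqrt() * x_known
                    + (1 - alpha_bar_t).sqrt() * torch.randn_like(x_known))
    return mask * noised_known + (1 - mask) * x_t
```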
arXiv Detail & Related papers (2023-09-18T09:36:24Z) - Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z) - NeuralReshaper: Single-image Human-body Retouching with Deep Neural Networks [50.40798258968408]
We present NeuralReshaper, a novel method for semantic reshaping of human bodies in single images using deep generative networks.
Our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image.
To deal with the lack of paired training data, we introduce a novel self-supervised strategy to train our network.
arXiv Detail & Related papers (2022-03-20T09:02:13Z) - InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering [55.70938412352287]
We present an information-theoretic regularization technique for few-shot novel view synthesis based on neural implicit representation.
The proposed approach minimizes potential reconstruction inconsistency that happens due to insufficient viewpoints.
We achieve consistently improved performance compared to existing neural view synthesis methods by large margins on multiple standard benchmarks.
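The regularizer can be sketched as the Shannon entropy of the normalized opacity distribution along each ray, which is driven down so density concentrates near a surface instead of spreading along the ray (a sketch in the spirit of the paper, not its official code):
```python
import torch

def ray_entropy(sigmas, deltas, eps=1e-8):
    """Shannon entropy of the per-ray opacity distribution.

    sigmas: (R, S) densities at S samples along each of R rays;
    deltas: (R, S) distances between consecutive samples.
    Minimizing this concentrates density on few samples per ray.
    """
    alphas = 1 - torch.exp(-sigmas * deltas)              # per-sample opacity
    p = alphas / (alphas.sum(dim=1, keepdim=True) + eps)  # normalize along ray
    return -(p * (p + eps).log()).sum(dim=1)              # (R,) per-ray entropy
```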
arXiv Detail & Related papers (2021-12-31T11:56:01Z) - Data Generation using Texture Co-occurrence and Spatial Self-Similarity for Debiasing [6.976822832216875]
We propose a novel de-biasing approach that explicitly generates additional images using texture representations of oppositely labeled images.
Each generated image preserves the spatial information of a source image while transferring textures from a target image with the opposite label.
Our model integrates a texture co-occurrence loss that determines whether a generated image's texture is similar to that of the target, and a spatial self-similarity loss that determines whether the spatial details between the generated and source images are well preserved.
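Both losses can be pictured with standard ingredients: a texture statistic compared between generated and target features, and a feature self-similarity map compared between generated and source. The sketch below uses a Gram matrix and cosine self-similarity as stand-ins; the paper's co-occurrence loss is its own construction.
```python
import torch
import torch.nn.functional as F

def gram(feat):
    # feat: (B, C, H, W) -> (B, C, C) channel co-occurrence statistic.
    B, C, H, W = feat.shape
    f = feat.reshape(B, C, H * W)
    return f @ f.transpose(1, 2) / (C * H * W)

def texture_loss(feat_gen, feat_target):
    # Texture agreement with the target (stand-in for the co-occurrence loss).
    return (gram(feat_gen) - gram(feat_target)).pow(2).mean()

def self_similarity_loss(feat_gen, feat_source):
    # Compare pairwise cosine-similarity structure, which is texture-agnostic.
    def simmap(feat):
        B, C, H, W = feat.shape
        f = F.normalize(feat.reshape(B, C, H * W), dim=1)
        return f.transpose(1, 2) @ f          # (B, HW, HW)
    return (simmap(feat_gen) - simmap(feat_source)).abs().mean()
```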
arXiv Detail & Related papers (2021-10-15T08:04:59Z) - Low-Light Image Enhancement with Normalizing Flow [92.52290821418778]
In this paper, we investigate modeling this one-to-many relationship with a proposed normalizing flow model.
An invertible network takes the low-light images/features as the condition and learns to map the distribution of normally exposed images into a Gaussian distribution.
Experimental results on existing benchmark datasets show our method achieves better quantitative and qualitative results, obtaining better-exposed illumination, less noise, fewer artifacts, and richer colors.
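Training such a conditional flow reduces to maximizing the exact likelihood given by the change-of-variables formula. A minimal negative-log-likelihood sketch, assuming a flow interface that returns the latent and the log-determinant:
```python
import math
import torch

def flow_nll(flow, normal_img, lowlight_cond):
    """Exact NLL of a normally exposed image under a conditional flow.

    `flow` is an assumed interface: given the image and the low-light
    condition, it returns the latent z and log|det Jacobian| of the forward
    mapping. Minimizing this fits the flow so that z ~ N(0, I).
    """
    z, logdet = flow(normal_img, cond=lowlight_cond)
    log_pz = -0.5 * (z ** 2 + math.log(2 * math.pi)).sum(dim=(1, 2, 3))
    return -(log_pz + logdet).mean()
```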
arXiv Detail & Related papers (2021-09-13T12:45:08Z) - Image Inpainting Using Wasserstein Generative Adversarial Imputation Network [0.0]
This paper introduces an image inpainting model based on Wasserstein Generative Adversarial Imputation Network.
The universal imputation model is able to handle various missing-data scenarios with sufficient quality.
arXiv Detail & Related papers (2021-06-23T05:55:07Z) - Diverse Single Image Generation with Controllable Global Structure through Self-Attention [1.2522889958051286]
We show how to generate images that require global context using generative adversarial networks.
Our results are visually better than the state of the art, particularly in generating images that require global context.
The diversity of our image generation, measured using the average standard deviation of pixels, is also better.
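The diversity score mentioned is simple to state; a sketch (exact protocol assumed, not taken from the paper):
```python
import torch

def pixel_diversity(samples: torch.Tensor) -> torch.Tensor:
    """Diversity as the average per-pixel standard deviation across N
    generated samples of shape (N, C, H, W).
    """
    return samples.std(dim=0).mean()
```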
arXiv Detail & Related papers (2021-02-09T11:52:48Z) - Unsupervised Image Restoration Using Partially Linear Denoisers [2.3061446605472558]
We propose a class of structured denoisers that can be decomposed as the sum of a nonlinear image-dependent mapping, a linear noise-dependent term and a small residual term.
We show that these denoisers can be trained with only noisy images under the condition that the noise has zero mean and known variance.
Our method outperforms some recent unsupervised and self-supervised deep denoising models that do not require clean images for their training.
arXiv Detail & Related papers (2020-08-14T02:13:19Z) - Fully Unsupervised Diversity Denoising with Convolutional Variational Autoencoders [81.30960319178725]
We propose DivNoising, a denoising approach based on fully convolutional variational autoencoders (VAEs).
First we introduce a principled way of formulating the unsupervised denoising problem within the VAE framework by explicitly incorporating imaging noise models into the decoder.
We show that such a noise model can either be measured, bootstrapped from noisy data, or co-learned during training.
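A minimal sketch of the idea with a known Gaussian noise model and an assumed VAE interface (DivNoising also supports measured, bootstrapped, or co-learned noise models):
```python
import torch

def divnoising_loss(vae, noisy_img, noise_std):
    """VAE loss with an explicit Gaussian imaging-noise model in the decoder:
    the decoder predicts the clean signal s, and the reconstruction term is
    the NLL of the noisy observation under N(s, noise_std^2). `vae` is an
    assumed interface returning (s, mu, logvar).
    """
    s, mu, logvar = vae(noisy_img)
    rec = 0.5 * (((noisy_img - s) / noise_std) ** 2).sum(dim=(1, 2, 3))
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).flatten(1).sum(dim=1)
    return (rec + kl).mean()
```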
arXiv Detail & Related papers (2020-06-10T21:28:13Z)