Related papers: StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

URL: http://arxiv.org/abs/2512.10959v1
Date: Thu, 11 Dec 2025 18:59:59 GMT
Title: StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space
Authors: Tjark Behrens, Anton Obukhov, Bingxin Ke, Fabio Tosi, Matteo Poggi, Konrad Schindler,
Abstract summary: We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis.<n>A canonical rectified space and the conditioning guide the generator to infer correspondences and fill disocclusions end-to-end.
Score: 55.40440023281068
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis that models geometry purely through viewpoint conditioning, without explicit depth or warping. A canonical rectified space and the conditioning guide the generator to infer correspondences and fill disocclusions end-to-end. To ensure fair and leakage-free evaluation, we introduce an end-to-end protocol that excludes any ground truth or proxy geometry estimates at test time. The protocol emphasizes metrics reflecting downstream relevance: iSQoE for perceptual comfort and MEt3R for geometric consistency. StereoSpace surpasses other methods from the warp & inpaint, latent-warping, and warped-conditioning categories, achieving sharp parallax and strong robustness on layered and non-Lambertian scenes. This establishes viewpoint-conditioned diffusion as a scalable, depth-free solution for stereo generation.

Related papers

Joint Geometry-Appearance Human Reconstruction in a Unified Latent Space via Bridge Diffusion [57.09673862519791]
This paper introduces textbfJGA-LBD, a novel framework that unifies the modeling of geometry and appearance into a joint latent representation.<n> Experiments demonstrate that JGA-LBD outperforms current state-of-the-art approaches in terms of both geometry fidelity and appearance quality.
arXiv Detail & Related papers (2026-01-01T12:48:56Z)
DANCE: Density-agnostic and Class-aware Network for Point Cloud Completion [1.7188280334580195]
Point cloud completion aims to recover missing geometric structures from incomplete 3D scans.<n>DANCE is a novel framework that completes only the missing regions while preserving the observed geometry.
arXiv Detail & Related papers (2025-11-11T08:45:06Z)
OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting [78.70702961852119]
OracleGS reconciles generative completeness with regressive fidelity for sparse view Gaussian Splatting.<n>Our approach conditions the powerful generative prior on multi-view geometric evidence, filtering hallucinatory artifacts while preserving plausible completions in under-constrained regions.
arXiv Detail & Related papers (2025-09-27T11:19:32Z)
BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment [31.118114556998048]
We introduce a unified framework that bridges monocular and stereo approaches to 3D estimation.<n>A novel cross-attentive alignment mechanism dynamically synchronizes monocular contextual cues with stereo hypothesis representations.<n>Our approach enables robust 3D perception that transcends modality-specific limitations.
arXiv Detail & Related papers (2025-08-06T16:31:22Z)
Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation [62.87088388345378]
We introduce a diffusion-based framework that performs aligned novel view image and geometry generation via a warping-and-inpainting methodology.<n>Method leverages off-the-shelf geometry predictors to predict partial geometries viewed from reference images.<n>Cross-modal attention distillation is proposed to ensure accurate alignment between generated images and geometry.
arXiv Detail & Related papers (2025-06-13T16:19:00Z)
Harmonizing Geometry and Uncertainty: Diffusion with Hyperspheres [43.20744744438439]
We introduce HyperSphereDiff to align hyperspherical structures with directional noise, preserving class geometry and effectively capturing angular uncertainty.<n>We evaluate our framework on four object datasets and two face datasets, showing that incorporating angular uncertainty better preserves the underlying hyperspherical manifold.
arXiv Detail & Related papers (2025-06-12T11:10:52Z)
Binary Opacity Grids: Capturing Fine Geometric Detail for Mesh-Based View Synthesis [70.40950409274312]
We modify density fields to encourage them to converge towards surfaces, without compromising their ability to reconstruct thin structures. We also develop a fusion-based meshing strategy followed by mesh simplification and appearance model fitting. The compact meshes produced by our model can be rendered in real-time on mobile devices.
arXiv Detail & Related papers (2024-02-19T18:59:41Z)
Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion [45.171150395915056]
3D semantic scene completion (SSC) is an ill-posed perception task that requires inferring a dense 3D scene from limited observations. Previous camera-based methods struggle to predict accurate semantic scenes due to inherent geometric ambiguity and incomplete observations. We resort to stereo matching technique and bird's-eye-view (BEV) representation learning to address such issues in SSC.
arXiv Detail & Related papers (2023-03-24T12:33:44Z)
On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation [56.97699793236174]
We study two kinds of robust cross-view consistency in this paper. We exploit the temporal coherence in both depth feature space and 3D voxel space for self-supervised monocular depth estimation. Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques.
arXiv Detail & Related papers (2022-09-19T03:46:13Z)
Analyzing the Latent Space of GAN through Local Dimension Estimation [4.688163910878411]
style-based GANs (StyleGANs) in high-fidelity image synthesis have motivated research to understand the semantic properties of their latent spaces. We propose a local dimension estimation algorithm for arbitrary intermediate layers in a pre-trained GAN model. Our proposed metric, called Distortion, measures an inconsistency of intrinsic space on the learned latent space.
arXiv Detail & Related papers (2022-05-26T06:36:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.