StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space
- URL: http://arxiv.org/abs/2512.10959v1
- Date: Thu, 11 Dec 2025 18:59:59 GMT
- Title: StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space
- Authors: Tjark Behrens, Anton Obukhov, Bingxin Ke, Fabio Tosi, Matteo Poggi, Konrad Schindler
- Abstract summary: We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis. A canonical rectified space and the conditioning guide the generator to infer correspondences and fill disocclusions end-to-end.
- Score: 55.40440023281068
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis that models geometry purely through viewpoint conditioning, without explicit depth or warping. A canonical rectified space and the conditioning guide the generator to infer correspondences and fill disocclusions end-to-end. To ensure fair and leakage-free evaluation, we introduce an end-to-end protocol that excludes any ground truth or proxy geometry estimates at test time. The protocol emphasizes metrics reflecting downstream relevance: iSQoE for perceptual comfort and MEt3R for geometric consistency. StereoSpace surpasses other methods from the warp & inpaint, latent-warping, and warped-conditioning categories, achieving sharp parallax and strong robustness on layered and non-Lambertian scenes. This establishes viewpoint-conditioned diffusion as a scalable, depth-free solution for stereo generation.
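For context on the warp & inpaint category named in the abstract, the following is a minimal sketch of depth-based forward warping on a single rectified scanline. It is not StereoSpace's method, which the abstract describes as depth-free. It only illustrates the standard rectified-stereo relation d = f * B / Z and how disocclusions (holes that a baseline would have to inpaint) arise. The function name `warp_scanline`, its parameters, and the toy values are assumptions made for illustration.

```python
def warp_scanline(colors, depths, focal_px, baseline_m):
    """Forward-warp one rectified scanline from the left view to the right view.

    Returns (warped, hole_mask): warped[x] is the color landing at column x
    of the right view, or None where no source pixel landed.
    """
    width = len(colors)
    warped = [None] * width
    nearest = [float("inf")] * width  # z-buffer: depth of the pixel stored so far
    for x, (color, z) in enumerate(zip(colors, depths)):
        disparity = focal_px * baseline_m / z  # rectified stereo: d = f * B / Z
        target = int(round(x - disparity))     # column in the right view
        # Keep the closest surface when several pixels land on the same column.
        if 0 <= target < width and z < nearest[target]:
            warped[target] = color
            nearest[target] = z
    hole_mask = [c is None for c in warped]
    return warped, hole_mask


# Toy scanline (illustrative values): a near object "N" at 2 m
# in front of a far background "F" at 4 m.
colors = ["F", "F", "F", "N", "N", "F", "F", "F"]
depths = [4.0, 4.0, 4.0, 2.0, 2.0, 4.0, 4.0, 4.0]
warped, holes = warp_scanline(colors, depths, focal_px=8.0, baseline_m=0.5)
print(warped)  # ['F', 'N', 'N', None, 'F', 'F', 'F', None]
```

In this toy case the near object shifts by two pixels and the background by one, so a hole opens next to the object (a disocclusion) and another at the image border. Filling such holes is the inpainting step that explicit-warping pipelines need and that the abstract says StereoSpace handles end-to-end through viewpoint conditioning.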
Related papers
- Joint Geometry-Appearance Human Reconstruction in a Unified Latent Space via Bridge Diffusion [57.09673862519791]
This paper introduces JGA-LBD, a novel framework that unifies the modeling of geometry and appearance into a joint latent representation. Experiments demonstrate that JGA-LBD outperforms current state-of-the-art approaches in terms of both geometry fidelity and appearance quality.
arXiv Detail & Related papers (2026-01-01T12:48:56Z)
- DANCE: Density-agnostic and Class-aware Network for Point Cloud Completion [1.7188280334580195]
Point cloud completion aims to recover missing geometric structures from incomplete 3D scans. DANCE is a novel framework that completes only the missing regions while preserving the observed geometry.
arXiv Detail & Related papers (2025-11-11T08:45:06Z)
- OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting [78.70702961852119]
OracleGS reconciles generative completeness with regressive fidelity for sparse view Gaussian Splatting. Our approach conditions the powerful generative prior on multi-view geometric evidence, filtering hallucinatory artifacts while preserving plausible completions in under-constrained regions.
arXiv Detail & Related papers (2025-09-27T11:19:32Z)
- BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment [31.118114556998048]
We introduce a unified framework that bridges monocular and stereo approaches to 3D estimation. A novel cross-attentive alignment mechanism dynamically synchronizes monocular contextual cues with stereo hypothesis representations. Our approach enables robust 3D perception that transcends modality-specific limitations.
arXiv Detail & Related papers (2025-08-06T16:31:22Z)
- Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation [62.87088388345378]
We introduce a diffusion-based framework that performs aligned novel view image and geometry generation via a warping-and-inpainting methodology. The method leverages off-the-shelf geometry predictors to predict partial geometries viewed from reference images. Cross-modal attention instillation is proposed to ensure accurate alignment between generated images and geometry.
arXiv Detail & Related papers (2025-06-13T16:19:00Z)
- Harmonizing Geometry and Uncertainty: Diffusion with Hyperspheres [43.20744744438439]
We introduce HyperSphereDiff to align hyperspherical structures with directional noise, preserving class geometry and effectively capturing angular uncertainty. We evaluate our framework on four object datasets and two face datasets, showing that incorporating angular uncertainty better preserves the underlying hyperspherical manifold.
arXiv Detail & Related papers (2025-06-12T11:10:52Z)
- Binary Opacity Grids: Capturing Fine Geometric Detail for Mesh-Based View Synthesis [70.40950409274312]
We modify density fields to encourage them to converge towards surfaces, without compromising their ability to reconstruct thin structures.
We also develop a fusion-based meshing strategy followed by mesh simplification and appearance model fitting.
The compact meshes produced by our model can be rendered in real-time on mobile devices.
arXiv Detail & Related papers (2024-02-19T18:59:41Z)
- Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion [45.171150395915056]
3D semantic scene completion (SSC) is an ill-posed perception task that requires inferring a dense 3D scene from limited observations.
Previous camera-based methods struggle to predict accurate semantic scenes due to inherent geometric ambiguity and incomplete observations.
We resort to stereo matching and bird's-eye-view (BEV) representation learning to address these issues in SSC.
arXiv Detail & Related papers (2023-03-24T12:33:44Z)
- On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation [56.97699793236174]
We study two kinds of robust cross-view consistency in this paper.
We exploit the temporal coherence in both depth feature space and 3D voxel space for self-supervised monocular depth estimation.
Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques.
arXiv Detail & Related papers (2022-09-19T03:46:13Z)
- Analyzing the Latent Space of GAN through Local Dimension Estimation [4.688163910878411]
The success of style-based GANs (StyleGANs) in high-fidelity image synthesis has motivated research to understand the semantic properties of their latent spaces.
We propose a local dimension estimation algorithm for arbitrary intermediate layers in a pre-trained GAN model.
Our proposed metric, called Distortion, measures an inconsistency of intrinsic space on the learned latent space.
arXiv Detail & Related papers (2022-05-26T06:36:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.