StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors
- URL: http://arxiv.org/abs/2512.16915v1
- Date: Thu, 18 Dec 2025 18:59:50 GMT
- Title: StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors
- Authors: Guibao Shen, Yihua Du, Wenhang Ge, Jing He, Chirui Chang, Donghao Zhou, Zhen Yang, Luozhou Wang, Xin Tao, Ying-Cong Chen
- Abstract summary: We introduce UniStereo, the first large-scale unified dataset for stereo video conversion. We propose StereoPilot, an efficient feed-forward model that directly synthesizes the target view without relying on explicit depth maps. Experiments demonstrate that StereoPilot significantly outperforms state-of-the-art methods in both visual fidelity and computational efficiency.
- Score: 41.34827274890319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid growth of stereoscopic displays, including VR headsets and 3D cinemas, has led to increasing demand for high-quality stereo video content. However, producing 3D videos remains costly and complex, while automatic Monocular-to-Stereo conversion is hindered by the limitations of the multi-stage ``Depth-Warp-Inpaint'' (DWI) pipeline. This paradigm suffers from error propagation, depth ambiguity, and format inconsistency between parallel and converged stereo configurations. To address these challenges, we introduce UniStereo, the first large-scale unified dataset for stereo video conversion, covering both stereo formats to enable fair benchmarking and robust model training. Building upon this dataset, we propose StereoPilot, an efficient feed-forward model that directly synthesizes the target view without relying on explicit depth maps or iterative diffusion sampling. Equipped with a learnable domain switcher and a cycle consistency loss, StereoPilot adapts seamlessly to different stereo formats and achieves improved consistency. Extensive experiments demonstrate that StereoPilot significantly outperforms state-of-the-art methods in both visual fidelity and computational efficiency. Project page: https://hit-perfect.github.io/StereoPilot/.
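The abstract contrasts StereoPilot with the multi-stage "Depth-Warp-Inpaint" (DWI) baseline it replaces. As a minimal illustration of why that baseline propagates errors, the sketch below implements only the warp step with NumPy: each left-view pixel is shifted by its disparity, and any pixel the warp leaves unfilled must be recovered by a later inpainting stage, so depth errors surface directly as holes and misplacements. The function name and the simple nearest-integer shift are illustrative assumptions, not the paper's method; occlusion ordering and inpainting are omitted.

```python
import numpy as np

def warp_left_to_right(left: np.ndarray, disparity: np.ndarray):
    """Forward-warp a left-eye image to a right-eye view by disparity.

    Returns the warped image and a boolean mask of filled pixels.
    Unfilled (occluded/disoccluded) pixels stay zero; a DWI pipeline
    would hand them to an inpainting stage. Occlusion ordering is
    ignored here for brevity.
    """
    h, w = disparity.shape
    right = np.zeros_like(left)
    filled = np.zeros((h, w), dtype=bool)
    xs = np.arange(w)
    for y in range(h):
        # Positive disparity shifts content leftward in the right view.
        target_x = np.round(xs - disparity[y]).astype(int)
        valid = (target_x >= 0) & (target_x < w)
        right[y, target_x[valid]] = left[y, xs[valid]]
        filled[y, target_x[valid]] = True
    return right, filled
```

Any inaccuracy in the estimated disparity shifts pixels to the wrong column before inpainting even runs, which is the error-propagation problem the abstract attributes to the DWI paradigm.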
Related papers
- Elastic3D: Controllable Stereo Video Conversion with Guided Latent Decoding [62.69753250254731]
Elastic3D is a controllable, direct end-to-end method for upgrading a conventional video to a binocular one. Key to its high-quality stereo video output is a novel, guided VAE decoder.
arXiv Detail & Related papers (2025-12-16T09:46:23Z)
- StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation [108.97993219426509]
StereoWorld is an end-to-end framework for high-fidelity monocular-to-stereo video generation. Our framework conditions the model on the monocular video input while explicitly supervising the generation with a geometry-aware regularization. To enable large-scale training and evaluation, we curate a high-definition stereo video dataset.
arXiv Detail & Related papers (2025-12-10T06:50:16Z)
- Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion [88.67015254278859]
We introduce the Mono2Stereo dataset, providing high-quality training data and a benchmark to support in-depth exploration of stereo conversion. We conduct an empirical study that yields two primary findings. 1) The differences between the left and right views are subtle, yet existing metrics consider overall pixels, failing to concentrate on regions critical to stereo effects. We introduce a new evaluation metric, Stereo Intersection-over-Union, which harmonizes disparity and achieves a high correlation with human judgments on stereo effect.
arXiv Detail & Related papers (2025-03-28T09:25:58Z)
- Stereo Anything: Unifying Zero-shot Stereo Matching with Large-Scale Mixed Data [77.27700893908012]
Stereo matching serves as a cornerstone of 3D vision, aiming to establish pixel-wise correspondences between stereo image pairs for depth recovery. Current models often exhibit severe performance degradation when deployed in unseen domains. We introduce StereoAnything, a data-centric framework that substantially enhances the zero-shot generalization capability of existing stereo models.
arXiv Detail & Related papers (2024-11-21T11:59:04Z)
- StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos [44.51044100125421]
This paper presents a novel framework for converting 2D videos to immersive stereoscopic 3D, addressing the growing demand for 3D content in immersive experiences.
Our framework demonstrates significant improvements in 2D-to-3D video conversion, offering a practical solution for creating immersive content for 3D devices such as the Apple Vision Pro and 3D displays.
arXiv Detail & Related papers (2024-09-11T17:52:07Z)
- AdaStereo: A Simple and Efficient Approach for Adaptive Stereo Matching [50.06646151004375]
A novel domain-adaptive pipeline called AdaStereo aims to align multi-level representations for deep stereo matching networks.
Our AdaStereo models achieve state-of-the-art cross-domain performance on multiple stereo benchmarks, including KITTI, Middlebury, ETH3D, and DrivingStereo.
arXiv Detail & Related papers (2020-04-09T16:15:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.