Restereo: Diffusion stereo video generation and restoration
- URL: http://arxiv.org/abs/2506.06023v1
- Date: Fri, 06 Jun 2025 12:14:24 GMT
- Title: Restereo: Diffusion stereo video generation and restoration
- Authors: Xingchang Huang, Ashish Kumar Singh, Florian Dubost, Cristina Nader Vasconcelos, Sakar Khattar, Liang Shi, Christian Theobalt, Cengiz Oztireli, Gurprit Singh
- Abstract summary: We introduce a new pipeline that not only generates stereo videos but also enhances both left-view and right-view videos consistently with a single model. Our method can be fine-tuned on a relatively small synthetic stereo video dataset and applied to low-quality real-world videos.
- Score: 43.208256051997616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stereo video generation has been gaining increasing attention with recent advancements in video diffusion models. However, most existing methods focus on generating 3D stereoscopic videos from monocular 2D videos. These approaches typically assume that the input monocular video is of high quality, making the task primarily one of inpainting disoccluded regions in the warped video while preserving the remaining areas. In this paper, we introduce a new pipeline that not only generates stereo videos but also enhances both left-view and right-view videos consistently with a single model. Our approach achieves this by fine-tuning the model on degraded data for restoration, as well as conditioning the model on warped masks for consistent stereo generation. As a result, our method can be fine-tuned on a relatively small synthetic stereo video dataset and applied to low-quality real-world videos, performing both stereo video generation and restoration. Experiments demonstrate that our method outperforms existing approaches both qualitatively and quantitatively in stereo video generation from low-resolution inputs.
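The warped masks mentioned in the abstract come from a standard operation in this line of work: disparity-based forward warping of the left view into the right view, recording which target pixels received no source pixel. The NumPy sketch below illustrates only that primitive, assuming a simple z-buffer splat and a leftward-shift disparity convention; the function name and these choices are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def warp_left_to_right(left, disparity):
    """Forward-warp a left-view frame into the right view using per-pixel disparity.

    left:      (H, W, 3) array, left-view frame.
    disparity: (H, W) array, horizontal disparity in pixels (larger = closer).
    Returns the warped right view and a boolean mask that is True where no source
    pixel landed, i.e. the disoccluded regions a generator would have to fill.
    """
    h, w, _ = left.shape
    warped = np.zeros_like(left)
    hit = np.zeros((h, w), dtype=bool)
    depth_buf = np.full((h, w), -np.inf)       # z-buffer: nearer (larger-disparity) pixels win
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    xt = np.round(xs - disparity).astype(int)  # assumed convention: positive disparity shifts left
    valid = (xt >= 0) & (xt < w)
    for y, x_src, x_dst, d in zip(ys[valid], xs[valid], xt[valid], disparity[valid]):
        if d > depth_buf[y, x_dst]:
            depth_buf[y, x_dst] = d
            warped[y, x_dst] = left[y, x_src]
            hit[y, x_dst] = True
    return warped, ~hit

# Toy usage: a flat disparity of 2 px shifts the frame and leaves a 2-px-wide hole
# on the right edge, which shows up as True entries in the disocclusion mask.
left = np.random.rand(4, 8, 3)
right, disocclusion_mask = warp_left_to_right(left, np.full((4, 8), 2.0))
```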
Related papers
- M2SVid: End-to-End Inpainting and Refinement for Monocular-to-Stereo Video Conversion [60.728003408015844]
We propose a novel architecture for inpainting and refinement of the warped right view obtained by depth-based reprojection of the input left view. Our approach outperforms previous state-of-the-art methods, obtaining an average rank of 1.43 among the 4 compared methods in a user study.
arXiv Detail & Related papers (2025-05-22T11:58:54Z) - SpatialMe: Stereo Video Conversion Using Depth-Warping and Blend-Inpainting [20.98704347305053]
We introduce SpatialMe, a novel stereo video conversion framework based on depth-warping and blend-inpainting. We also construct a high-quality real-world stereo video dataset -- StereoV1K, to alleviate the data shortage.
arXiv Detail & Related papers (2024-12-16T07:42:49Z) - StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart [44.671043951223574]
We introduce StereoCrafter-Zero, a novel framework for zero-shot stereo video generation. Key innovations include a noisy restart strategy to initialize stereo-aware latent representations. We show that StereoCrafter-Zero produces high-quality stereo videos with enhanced depth consistency and temporal smoothness.
arXiv Detail & Related papers (2024-11-21T16:41:55Z) - ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning [43.105154507379076]
ImmersePro is a framework specifically designed to transform single-view videos into stereo videos.
ImmersePro employs implicit disparity guidance, enabling the generation of stereo pairs from video sequences without the need for explicit disparity maps.
Our experiments demonstrate the effectiveness of ImmersePro in producing high-quality stereo videos, offering significant improvements over existing methods.
arXiv Detail & Related papers (2024-09-30T22:19:32Z) - Replace Anyone in Videos [82.37852750357331]
We present the ReplaceAnyone framework, which focuses on localized human replacement and insertion featuring intricate backgrounds. We formulate this task as an image-conditioned video inpainting paradigm with pose guidance, utilizing a unified end-to-end video diffusion architecture. The proposed ReplaceAnyone can be seamlessly applied not only to traditional 3D-UNet base models but also to DiT-based video models such as Wan2.1.
arXiv Detail & Related papers (2024-09-30T03:27:33Z) - StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos [44.51044100125421]
This paper presents a novel framework for converting 2D videos to immersive stereoscopic 3D, addressing the growing demand for 3D content in immersive experiences.
Our framework demonstrates significant improvements in 2D-to-3D video conversion, offering a practical solution for creating immersive content for 3D devices like Apple Vision Pro and 3D displays.
arXiv Detail & Related papers (2024-09-11T17:52:07Z) - IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation [136.5813547244979]
We present IDOL (unIfied Dual-mOdal Latent diffusion) for high-quality human-centric joint video-depth generation.
Our IDOL consists of two novel designs. The first enables dual-modal generation and maximizes the information exchange between video and depth generation.
Second, to ensure a precise video-depth spatial alignment, we propose a motion consistency loss that enforces consistency between the video and depth feature motion fields.
arXiv Detail & Related papers (2024-07-15T17:36:54Z) - SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix [60.48666051245761]
We propose a pose-free and training-free approach for generating 3D stereoscopic videos.
Our method warps a generated monocular video into camera views on a stereoscopic baseline using estimated video depth.
We develop a disocclusion boundary re-injection scheme that further improves the quality of video inpainting.
arXiv Detail & Related papers (2024-06-29T08:33:55Z) - Single-View View Synthesis with Self-Rectified Pseudo-Stereo [49.946151180828465]
We leverage a reliable and explicit stereo prior to generate a pseudo-stereo viewpoint.
We propose a self-rectified stereo synthesis to amend erroneous regions in an identify-rectify manner.
Our method outperforms state-of-the-art single-view view synthesis methods and stereo synthesis methods.
arXiv Detail & Related papers (2023-04-19T09:36:13Z) - Learning Task-Oriented Flows to Mutually Guide Feature Alignment in Synthesized and Real Video Denoising [137.5080784570804]
Video denoising aims at removing noise from videos to recover clean ones.
Some existing works show that optical flow can help the denoising by exploiting the additional spatial-temporal clues from nearby frames.
We propose a new multi-scale refined optical flow-guided video denoising method, which is more robust to different noise levels.
arXiv Detail & Related papers (2022-08-25T00:09:18Z)
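For the flow-guided denoising entry directly above, the core primitive is aligning neighboring frames to the reference frame with optical flow before fusing them. The PyTorch sketch below shows only that backward-warping step plus a naive averaging fusion; the paper's learned multi-scale flow refinement and fusion network are not reproduced here, and the function names are illustrative.

```python
import torch
import torch.nn.functional as F

def flow_warp(frame, flow):
    """Backward-warp `frame` (N, C, H, W) with optical flow (N, 2, H, W) in pixels,
    where flow[:, 0] is the horizontal and flow[:, 1] the vertical displacement."""
    n, _, h, w = frame.shape
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xx, yy)).float().to(frame)      # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                    # sampling positions per frame
    # grid_sample expects coordinates normalized to [-1, 1], channels last.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                 # (N, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)

def naive_flow_guided_denoise(ref, neighbors, flows):
    """Average the reference frame with flow-aligned neighbors: a crude stand-in
    for the learned, multi-scale fusion used by the actual method."""
    aligned = [flow_warp(nb, fl) for nb, fl in zip(neighbors, flows)]
    return torch.stack([ref] + aligned).mean(dim=0)

# Toy usage with zero flow: the neighbor aligns onto itself and the output is the mean.
ref = torch.rand(1, 3, 16, 16)
nbr = torch.rand(1, 3, 16, 16)
out = naive_flow_guided_denoise(ref, [nbr], [torch.zeros(1, 2, 16, 16)])
```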