Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image
- URL: http://arxiv.org/abs/2203.09457v1
- Date: Thu, 17 Mar 2022 17:16:16 GMT
- Title: Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image
- Authors: Xuanchi Ren, Xiaolong Wang
- Abstract summary: We propose a novel approach to synthesize a consistent long-term video given a single scene image and a trajectory of large camera motions.
Our method outperforms state-of-the-art view synthesis approaches by a large margin.
- Score: 8.13564646389987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel view synthesis from a single image has recently attracted a lot of attention, and it has been primarily advanced by 3D deep learning and rendering techniques. However, most work is still limited to synthesizing new views within relatively small camera motions. In this paper, we propose a novel approach to synthesize a consistent long-term video given a single scene image and a trajectory of large camera motions. Our approach utilizes an autoregressive Transformer to perform sequential modeling of multiple frames, which reasons about the relations between multiple frames and the corresponding cameras to predict the next frame. To facilitate learning and ensure consistency among generated frames, we introduce a locality constraint based on the input cameras to guide self-attention among a large number of patches across space and time. Our method outperforms state-of-the-art view synthesis approaches by a large margin, especially when synthesizing the long-term future of indoor 3D scenes. Project page at https://xrenaa.github.io/look-outside-room/.
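To make the locality constraint concrete, the sketch below builds a boolean mask over space-time patch tokens so that each patch may only attend to patches from nearby frames. This is a minimal sketch, not the authors' implementation: camera-based locality is approximated here by a simple causal temporal window, and names such as `local_attention_mask` are illustrative.

```python
import torch

def local_attention_mask(num_frames: int, patches_per_frame: int, window: int) -> torch.Tensor:
    """Boolean mask over (T*P) x (T*P) tokens: True = attention allowed.

    A patch in frame t may attend to all patches of frames t-window..t
    (causal, so no peeking at future frames). This approximates the
    camera-based locality constraint with a temporal window.
    """
    t_idx = torch.arange(num_frames).repeat_interleave(patches_per_frame)  # frame id per token
    dt = t_idx[:, None] - t_idx[None, :]          # frame distance between query and key tokens
    return (dt >= 0) & (dt <= window)             # causal and local

# Usage with scaled dot-product attention (PyTorch >= 2.0):
T, P, D, H = 4, 64, 256, 8
x = torch.randn(1, H, T * P, D // H)              # (batch, heads, tokens, head_dim)
mask = local_attention_mask(T, P, window=2)       # (T*P, T*P) boolean mask
out = torch.nn.functional.scaled_dot_product_attention(x, x, x, attn_mask=mask)
```

Because every token can always attend within its own frame, no row of the mask is fully blocked, so the softmax stays well defined.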
Related papers
- Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation.
Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency.
Cavia is the first framework of its kind that allows the user to specify distinct camera motions while obtaining consistent object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z)
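Cavia's "extended spatial and temporal attention modules" suggest the familiar factorized attention pattern for multi-view video. Below is a minimal, hypothetical sketch (not Cavia's actual architecture): tokens shaped (views, frames, patches) attend over patches within a frame, then over frames, then across views; all module names are illustrative.

```python
import torch
import torch.nn as nn

class FactorizedViewTimeAttention(nn.Module):
    """Attend over space, time, and views separately on a (V, T, P, D) token grid."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_view = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, V, T, P, D = x.shape
        # Spatial: attention among patches of the same view and frame.
        s = x.reshape(B * V * T, P, D)
        s = s + self.spatial(s, s, s)[0]
        # Temporal: attention among frames at the same view and patch location.
        t = s.reshape(B, V, T, P, D).permute(0, 1, 3, 2, 4).reshape(B * V * P, T, D)
        t = t + self.temporal(t, t, t)[0]
        # Cross-view: attention among views, tying the viewpoints together.
        v = t.reshape(B, V, P, T, D).permute(0, 3, 2, 1, 4).reshape(B * T * P, V, D)
        v = v + self.cross_view(v, v, v)[0]
        return v.reshape(B, T, P, V, D).permute(0, 3, 1, 2, 4)  # back to (B, V, T, P, D)

x = torch.randn(1, 2, 4, 16, 64)                  # 2 views, 4 frames, 16 patches
y = FactorizedViewTimeAttention(64)(x)
assert y.shape == x.shape
```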
- Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis [43.02778060969546]
We propose a controllable monocular dynamic view synthesis pipeline.
Our model does not require depth as input, and does not explicitly model 3D scene geometry.
We believe our framework can potentially unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality.
arXiv Detail & Related papers (2024-05-23T17:59:52Z)
- Explorative Inbetweening of Time and Space [46.77750028273578]
We introduce bounded generation, a way to control video generation given only a start and an end frame.
Time Reversal Fusion fuses the temporally forward and backward denoising paths conditioned on the start and end frame.
We find that Time Reversal Fusion outperforms related work on all subtasks.
arXiv Detail & Related papers (2024-03-21T17:57:31Z)
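The fusion idea in the Time Reversal Fusion entry above can be pictured as follows. This is a simplified interpretation, not the paper's code: at every denoising step the sampler runs one path conditioned on the start frame and a time-reversed path conditioned on the end frame, then blends the two noise predictions with weights that ramp across the frame axis; the `model` argument stands in for any pretrained image-to-video diffusion network.

```python
import torch

def fused_denoising_step(model, z, t, start_frame, end_frame):
    """One denoising step over a latent video z of shape (F, C, H, W).

    Forward path: condition on the start frame.
    Backward path: flip the frame axis, condition on the end frame,
    then flip the prediction back. Blend with per-frame ramp weights so
    early frames trust the forward path and late frames the backward one.
    """
    eps_fwd = model(z, t, cond=start_frame)                     # forward prediction
    eps_bwd = model(z.flip(0), t, cond=end_frame).flip(0)       # backward prediction
    num_frames = z.shape[0]
    w = torch.linspace(1.0, 0.0, num_frames).view(-1, 1, 1, 1)  # ramp: 1 -> 0 over frames
    return w * eps_fwd + (1.0 - w) * eps_bwd

# Hypothetical usage with a stub model:
model = lambda z, t, cond: torch.randn_like(z)                  # stand-in for a diffusion UNet
z = torch.randn(8, 4, 32, 32)                                   # 8 latent frames
eps = fused_denoising_step(model, z, t=torch.tensor(500), start_frame=None, end_frame=None)
```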
- COLMAP-Free 3D Gaussian Splatting [88.420322646756]
We propose a novel method to perform novel view synthesis without any SfM preprocessing.
We process the input frames sequentially and progressively grow the set of 3D Gaussians, taking one input frame at a time.
Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes.
arXiv Detail & Related papers (2023-12-12T18:39:52Z)
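The sequential, SfM-free strategy described in the entry above reduces to a simple loop: for each incoming frame, first estimate its camera pose by fitting against the Gaussians built so far, then grow and refine the Gaussian set with that frame. The skeleton below is a hedged paraphrase with stub components; `estimate_pose` and `grow_gaussians` are placeholders, not the paper's API.

```python
import torch

def estimate_pose(gaussians, frame, init_pose):
    """Placeholder: optimize a relative camera pose by photometrically
    fitting the current Gaussians to the new frame (e.g. a few Adam steps)."""
    return init_pose  # stub

def grow_gaussians(gaussians, frame, pose):
    """Placeholder: add Gaussians for newly observed regions and jointly
    refine positions, scales, and colors against the new frame."""
    return gaussians  # stub

def colmap_free_reconstruction(frames):
    gaussians, poses = None, []
    pose = torch.eye(4)                         # first camera at the origin
    for i, frame in enumerate(frames):
        if i > 0:
            # Initialize from the previous pose: adjacent video frames
            # move little, which is what makes pose-free training viable.
            pose = estimate_pose(gaussians, frame, poses[-1].clone())
        gaussians = grow_gaussians(gaussians, frame, pose)
        poses.append(pose)
    return gaussians, poses

frames = [torch.rand(3, 64, 64) for _ in range(5)]  # toy input video
gaussians, poses = colmap_free_reconstruction(frames)
```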
- ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models [33.760292331843104]
Generating novel views of an object from a single image is a challenging task.
Recent diffusion-based methods for view synthesis have shown great progress.
We demonstrate a simple method that utilizes a pre-trained video diffusion model.
arXiv Detail & Related papers (2023-12-03T06:50:15Z)
- Persistent Nature: A Generative Model of Unbounded 3D Worlds [74.51149070418002]
We present an extendable, planar scene layout grid that can be rendered from arbitrary camera poses via a 3D decoder and volume rendering.
Based on this representation, we learn a generative world model solely from single-view internet photos.
Our approach enables scene extrapolation beyond the fixed bounds of current 3D generative models, while also supporting a persistent, camera-independent world representation.
arXiv Detail & Related papers (2023-03-23T17:59:40Z)
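Rendering a planar layout grid from arbitrary poses, as the Persistent Nature entry above describes, ultimately relies on sampling the scene representation along camera rays and alpha-compositing with the standard volume-rendering quadrature. Below is a generic sketch of that compositing step only (not the paper's decoder); per-sample density and color are assumed given.

```python
import torch

def composite(density: torch.Tensor, color: torch.Tensor, deltas: torch.Tensor) -> torch.Tensor:
    """Standard volume-rendering quadrature along each ray.

    density: (R, S) non-negative sigma per sample, color: (R, S, 3),
    deltas: (R, S) distances between consecutive samples.
    Returns composited colors of shape (R, 3).
    """
    alpha = 1.0 - torch.exp(-density * deltas)                  # opacity per sample
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha[:, :-1]], dim=1), dim=1)
    weights = trans * alpha                                     # (R, S)
    return (weights.unsqueeze(-1) * color).sum(dim=1)

R, S = 1024, 64                                                 # rays, samples per ray
rgb = composite(torch.rand(R, S), torch.rand(R, S, 3), torch.full((R, S), 0.05))
```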
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
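Image-based rendering of the kind the DynIBaR entry above describes aggregates the features that nearby source views contribute to each target sample. A minimal, generic sketch follows (not DynIBaR's actual aggregation, which also models scene motion): project a 3D point into each nearby view, sample features there, and average them with weights favoring views whose viewing direction matches the target ray; all names are illustrative.

```python
import torch

def aggregate_features(point, target_dir, src_feats, src_cams):
    """Weighted aggregation of per-view features for one 3D sample point.

    src_feats: list of (D,) feature vectors sampled at the point's
    projection in each source view (projection + bilinear sampling are
    assumed done upstream). src_cams: list of (3,) camera centers.
    Weights favor source views seeing the point from a similar direction.
    """
    feats = torch.stack(src_feats)                       # (V, D)
    dirs = torch.stack([point - c for c in src_cams])
    dirs = dirs / dirs.norm(dim=-1, keepdim=True)        # (V, 3) unit view directions
    sim = dirs @ target_dir                              # cosine similarity to target ray
    w = torch.softmax(10.0 * sim, dim=0)                 # sharpen toward nearby views
    return (w.unsqueeze(-1) * feats).sum(dim=0)          # (D,)

point = torch.tensor([0.0, 0.0, 2.0])
target_dir = torch.tensor([0.0, 0.0, 1.0])
feat = aggregate_features(point, target_dir,
                          [torch.rand(32) for _ in range(4)],
                          [torch.randn(3) for _ in range(4)])
```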
- Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image [73.56631858393148]
We introduce the problem of perpetual view generation -- long-range generation of novel views corresponding to an arbitrarily long camera trajectory given a single image.
We take a hybrid approach that integrates both geometry and image synthesis in an iterative render, refine, and repeat framework.
Our approach can be trained from a set of monocular video sequences without any manual annotation.
arXiv Detail & Related papers (2020-12-17T18:59:57Z)
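The render-refine-repeat loop in the Infinite Nature entry above admits a compact skeleton: warp the current RGB-D frame into the next camera, let a refinement network inpaint disocclusions and restore detail, and feed the result back in. The following is a schematic sketch with stub `render` (forward warp) and `refine` (learned network) functions, not the released implementation.

```python
import torch

def render(rgbd, pose_delta):
    """Placeholder: forward-warp an RGB-D frame into the next camera,
    leaving holes where disoccluded content becomes visible."""
    return rgbd  # stub

def refine(rgbd):
    """Placeholder: a learned network that inpaints holes and restores
    detail, returning a full RGB-D frame usable as the next input."""
    return rgbd  # stub

def perpetual_view_generation(rgbd, trajectory):
    """Render-refine-repeat: each output frame becomes the next input,
    so the camera can travel arbitrarily far from the source image."""
    frames = []
    for pose_delta in trajectory:
        rgbd = refine(render(rgbd, pose_delta))
        frames.append(rgbd)
    return frames

rgbd = torch.rand(4, 128, 128)                    # RGB + disparity channels
frames = perpetual_view_generation(rgbd, trajectory=[None] * 10)
```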
- Street-view Panoramic Video Synthesis from a Single Satellite Image [92.26826861266784]
We present a novel method for synthesizing temporally and geometrically consistent street-view panoramic video from a single satellite image.
Existing cross-view synthesis approaches focus mainly on images; video synthesis in this setting has received far less attention.
arXiv Detail & Related papers (2020-12-11T20:22:38Z)