InfiniteNature-Zero: Learning Perpetual View Generation of Natural
Scenes from Single Images
- URL: http://arxiv.org/abs/2207.11148v1
- Date: Fri, 22 Jul 2022 15:41:06 GMT
- Title: InfiniteNature-Zero: Learning Perpetual View Generation of Natural
Scenes from Single Images
- Authors: Zhengqi Li, Qianqian Wang, Noah Snavely, Angjoo Kanazawa
- Abstract summary: We present a method for learning to generate flythrough videos of natural scenes starting from a single view.
This capability is learned from a collection of single photographs, without requiring camera poses or even multiple views of each scene.
- Score: 83.37640073416749
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a method for learning to generate unbounded flythrough videos of
natural scenes starting from a single view, where this capability is learned
from a collection of single photographs, without requiring camera poses or even
multiple views of each scene. To achieve this, we propose a novel
self-supervised view generation training paradigm, where we sample and
render virtual camera trajectories, including cyclic ones, allowing our
model to learn stable view generation from a collection of single views. At
test time, despite never seeing a video during training, our approach can take
a single image and generate long camera trajectories comprised of hundreds of
new views with realistic and diverse content. We compare our approach with
recent state-of-the-art supervised view generation methods that require posed
multi-view videos and demonstrate superior performance and synthesis quality.
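The cyclic virtual camera trajectories mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sample_cyclic_trajectory`, the `step_size` parameter, and the choice to track camera positions only (ignoring rotation) are assumptions made for illustration. The idea shown is simply that a randomly sampled forward path is retraced in reverse, so the trajectory ends at the pose of the input photograph, presumably letting the final generated view be compared against that photograph.

```python
import random


def sample_cyclic_trajectory(num_steps, step_size=0.1, seed=0):
    """Sample a random forward camera path, then retrace it back to the start.

    Hypothetical helper: poses are (x, y, z) positions only. The result has
    2 * num_steps + 1 entries, and the first and last are the starting pose.
    """
    rng = random.Random(seed)
    position = (0.0, 0.0, 0.0)  # pose of the input photograph (assumed origin)
    forward = [position]
    for _ in range(num_steps):
        # Small random sideways/vertical drift plus a non-negative forward step.
        dx = rng.uniform(-step_size, step_size)
        dy = rng.uniform(-step_size, step_size)
        dz = rng.uniform(0.0, step_size)
        position = (position[0] + dx, position[1] + dy, position[2] + dz)
        forward.append(position)
    # Retrace the same poses in reverse order, skipping the turnaround pose
    # so it is not duplicated.
    return forward + forward[-2::-1]


trajectory = sample_cyclic_trajectory(num_steps=5)
```

Because the return leg reuses the forward poses, the loop closes exactly rather than approximately; a real system would also need camera orientations and a renderer, both omitted here.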
Related papers
- Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Videos [66.1935609072708]
The key hypothesis is that the more accurately an individual view can predict a view-agnostic text summary, the more informative it is.
We propose a framework that uses the relative accuracy of view-dependent caption predictions as a proxy for best view pseudo-labels.
During inference, our model takes as input only a multi-view video -- no language or camera poses -- and returns the best viewpoint to watch at each timestep.
arXiv Detail & Related papers (2024-11-13T16:31:08Z)
- Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation.
Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency.
Cavia is the first of its kind that allows the user to specify distinct camera motion while obtaining object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z)
- Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis [43.02778060969546]
We propose a controllable monocular dynamic view synthesis pipeline.
Our model does not require depth as input, and does not explicitly model 3D scene geometry.
We believe our framework can potentially unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality.
arXiv Detail & Related papers (2024-05-23T17:59:52Z)
- FSViewFusion: Few-Shots View Generation of Novel Objects [75.81872204650807]
We introduce a pretrained stable diffusion model for view synthesis without explicit 3D priors.
Specifically, we base our method on a personalized text to image model, Dreambooth, given its strong ability to adapt to specific novel objects with a few shots.
We establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the identity of the original object from which the views are learnt.
arXiv Detail & Related papers (2024-03-11T02:59:30Z)
- PolyOculus: Simultaneous Multi-view Image-based Novel View Synthesis [23.967904337714234]
We propose a set-based generative model that can simultaneously generate multiple, self-consistent new views.
Our approach is not limited to generating a single image at a time and can condition on a variable number of views.
We show that the model is capable of generating sets of views that have no natural ordering, like loops and binocular trajectories, and significantly outperforms other methods on such tasks.
arXiv Detail & Related papers (2024-02-28T02:06:11Z)
- Multi-object Video Generation from Single Frame Layouts [84.55806837855846]
We propose a video generative framework capable of synthesizing global scenes with local objects.
Our framework is a non-trivial adaptation from image generation methods, and is new to this field.
Our model has been evaluated on two widely-used video recognition benchmarks.
arXiv Detail & Related papers (2023-05-06T09:07:01Z)
- Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models [24.301334966272297]
We propose a novel generative model capable of producing a sequence of photorealistic images consistent with a specified camera trajectory.
To measure the consistency over a sequence of generated views, we introduce a new metric, the thresholded symmetric epipolar distance (TSED).
arXiv Detail & Related papers (2023-04-21T02:01:02Z)
- Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image [73.56631858393148]
We introduce the problem of perpetual view generation -- long-range generation of novel views corresponding to an arbitrarily long camera trajectory given a single image.
We take a hybrid approach that integrates both geometry and image synthesis in an iterative render, refine, and repeat framework.
Our approach can be trained from a set of monocular video sequences without any manual annotation.
arXiv Detail & Related papers (2020-12-17T18:59:57Z)