SceneScape: Text-Driven Consistent Scene Generation
- URL: http://arxiv.org/abs/2302.01133v2
- Date: Tue, 30 May 2023 08:52:45 GMT
- Title: SceneScape: Text-Driven Consistent Scene Generation
- Authors: Rafail Fridman, Amit Abecasis, Yoni Kasten, Tali Dekel
- Abstract summary: We introduce a novel framework that generates such videos in an online fashion by combining a pre-trained text-to-image model with a pre-trained monocular depth prediction model.
To tackle the pivotal challenge of achieving 3D consistency, we deploy online test-time training to encourage the predicted depth map of the current frame to be geometrically consistent with the synthesized scene.
In contrast to previous works, which are applicable only to limited domains, our method generates diverse scenes, such as walkthroughs in spaceships, caves, or ice castles.
- Score: 14.348512536556413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a method for text-driven perpetual view generation -- synthesizing
long-term videos of various scenes solely from an input text prompt
describing the scene and camera poses. We introduce a novel framework that
generates such videos in an online fashion by combining the generative power of
a pre-trained text-to-image model with the geometric priors learned by a
pre-trained monocular depth prediction model. To tackle the pivotal challenge
of achieving 3D consistency, i.e., synthesizing videos that depict
geometrically plausible scenes, we deploy online test-time training to
encourage the predicted depth map of the current frame to be geometrically
consistent with the synthesized scene. The depth maps are used to construct a
unified mesh representation of the scene, which is progressively extended as
the video is generated. In contrast to previous works, which are
applicable only to limited domains, our method generates diverse scenes, such
as walkthroughs in spaceships, caves, or ice castles.
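
To make the pipeline described in the abstract concrete, below is a minimal PyTorch-style sketch of the online render / inpaint / depth-align / mesh-fuse loop. The helper callables (`render_mesh`, `inpaint_frame`, `fuse_into_mesh`), the depth network interface, and the alignment loss are illustrative assumptions for this sketch, not the authors' actual implementation, which contains additional components omitted here.

```python
# Minimal sketch of the online generation loop described in the abstract.
# Helper names (render_mesh, inpaint_frame, fuse_into_mesh, depth_net) are
# hypothetical placeholders, not the paper's actual API.
import torch

def test_time_depth_finetune(depth_net, frame, rendered_depth, visible_mask,
                             steps=20, lr=1e-5):
    """Finetune the depth predictor so its output agrees with the depth
    rendered from the existing scene mesh in the already-visible regions."""
    opt = torch.optim.Adam(depth_net.parameters(), lr=lr)
    for _ in range(steps):
        pred = depth_net(frame)
        loss = ((pred - rendered_depth).abs() * visible_mask).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return depth_net(frame).detach()

def generate_walkthrough(prompt, camera_poses, mesh, depth_net,
                         render_mesh, inpaint_frame, fuse_into_mesh):
    """Online loop: render -> inpaint -> align depth -> fuse mesh, per frame."""
    frames = []
    for pose in camera_poses:
        # 1. Project the current mesh into the new camera to obtain the known
        #    content and a mask of newly disoccluded (unknown) pixels.
        rendered_rgb, rendered_depth, visible_mask = render_mesh(mesh, pose)
        # 2. Fill the unknown regions with a text-conditioned inpainting
        #    diffusion model so the new frame follows the prompt.
        frame = inpaint_frame(rendered_rgb, 1 - visible_mask, prompt)
        # 3. Test-time training: make the monocular depth prediction
        #    geometrically consistent with the scene generated so far.
        depth = test_time_depth_finetune(depth_net, frame,
                                         rendered_depth, visible_mask)
        # 4. Unproject the newly inpainted pixels with the aligned depth and
        #    append them to the unified mesh representation.
        mesh = fuse_into_mesh(mesh, frame, depth, pose)
        frames.append(frame)
    return frames
```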
Related papers
- TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions [0.562479170374811]
This paper proposes an approach to coalesce existing generative systems to form a stereoscopic virtual reality video from text.
Our work highlights the exciting possibilities of using natural language-driven graphics in fields like virtual reality simulations.
arXiv Detail & Related papers (2025-01-02T09:21:03Z)
- SceneCraft: Layout-Guided 3D Scene Generation [29.713491313796084]
SceneCraft is a novel method for generating detailed indoor scenes that adhere to textual descriptions and spatial layout preferences.
Our method significantly outperforms existing approaches in complex indoor scene generation with diverse textures, consistent geometry, and realistic visual quality.
arXiv Detail & Related papers (2024-10-11T17:59:58Z)
- 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models [53.89348957053395]
We introduce a novel pipeline designed for text-to-4D scene generation.
Our method begins by generating a reference video using the video generation model.
We then learn the canonical 3D representation of the video using a freeze-time video.
arXiv Detail & Related papers (2024-06-11T17:19:26Z)
- Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting [75.7154104065613]
We introduce a novel depth completion model, trained via teacher distillation and self-training to learn the 3D fusion process.
We also introduce a new benchmarking scheme for scene generation methods that is based on ground truth geometry.
arXiv Detail & Related papers (2024-04-30T17:59:40Z)
- Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields [29.907615852310204]
We present Text2NeRF, which is able to generate a wide range of 3D scenes purely from a text prompt.
Our method requires no additional training data but only a natural language description of the scene as the input.
arXiv Detail & Related papers (2023-05-19T10:58:04Z)
- RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture [80.0643976406225]
We propose "RoomDreamer", which leverages powerful natural language to synthesize a new room with a different style.
Our work addresses the challenge of synthesizing both geometry and texture aligned to the input scene structure and prompt simultaneously.
To validate the proposed method, real indoor scenes scanned with smartphones are used for extensive experiments.
arXiv Detail & Related papers (2023-05-18T22:57:57Z)
- Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video [76.19076002661157]
Non-Rigid Neural Radiance Fields (NR-NeRF) is a reconstruction and novel view synthesis approach for general non-rigid dynamic scenes.
We show that even a single consumer-grade camera is sufficient to synthesize sophisticated renderings of a dynamic scene from novel virtual camera views.
arXiv Detail & Related papers (2020-12-22T18:46:12Z)
- Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image [73.56631858393148]
We introduce the problem of perpetual view generation -- long-range generation of novel views corresponding to an arbitrarily long camera trajectory given a single image.
We take a hybrid approach that integrates both geometry and image synthesis in an iterative render, refine, and repeat framework.
Our approach can be trained from a set of monocular video sequences without any manual annotation.
arXiv Detail & Related papers (2020-12-17T18:59:57Z)
- Free View Synthesis [100.86844680362196]
We present a method for novel view synthesis from input images that are freely distributed around a scene.
Our method does not rely on a regular arrangement of input views, can synthesize images for free camera movement through the scene, and works for general scenes with unconstrained geometric layouts.
arXiv Detail & Related papers (2020-08-12T18:16:08Z)