WonderJourney: Going from Anywhere to Everywhere
- URL: http://arxiv.org/abs/2312.03884v2
- Date: Fri, 12 Apr 2024 16:47:05 GMT
- Title: WonderJourney: Going from Anywhere to Everywhere
- Authors: Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann
- Abstract summary: WonderJourney is a modularized framework for perpetual 3D scene generation.
We generate a journey through a long sequence of diverse yet coherently connected 3D scenes.
We show compelling, diverse visual results across various scene types and styles, forming imaginary "wonderjourneys".
- Score: 75.1284367548585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce WonderJourney, a modularized framework for perpetual 3D scene generation. Unlike prior work on view generation that focuses on a single type of scene, we start at any user-provided location (by a text description or an image) and generate a journey through a long sequence of diverse yet coherently connected 3D scenes. We leverage an LLM to generate textual descriptions of the scenes in this journey, a text-driven point cloud generation pipeline to make a compelling and coherent sequence of 3D scenes, and a large VLM to verify the generated scenes. We show compelling, diverse visual results across various scene types and styles, forming imaginary "wonderjourneys". Project website: https://kovenyu.com/WonderJourney/
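As a rough illustration of the modular design the abstract describes, the Python sketch below wires the three components together: an LLM proposes the next scene description, a text-driven point cloud pipeline realizes it in 3D, and a VLM verifies the result before the journey continues. All function and class interfaces here are hypothetical stand-ins, not the released WonderJourney code.

```python
from dataclasses import dataclass, field

@dataclass
class Scene:
    description: str
    points: list = field(default_factory=list)  # placeholder for the 3D point cloud

def llm_next_scene(context: str) -> str:
    # Stand-in for the LLM that writes the next scene description,
    # conditioned on where the journey currently is.
    return f"a new scene coherently connected to: {context}"

def generate_point_cloud(description: str) -> Scene:
    # Stand-in for the text-driven point cloud generation pipeline.
    return Scene(description)

def vlm_verify(scene: Scene) -> bool:
    # Stand-in for the large VLM that rejects scenes with visual artifacts.
    return True

def wonder_journey(start: str, num_scenes: int = 8, max_retries: int = 3) -> list:
    """Chain scenes into a journey, starting from a user-provided description."""
    journey, context = [], start
    for _ in range(num_scenes):
        description = llm_next_scene(context)
        for _ in range(max_retries):
            scene = generate_point_cloud(description)
            if vlm_verify(scene):        # regenerate until the VLM accepts
                break
        journey.append(scene)
        context = description            # keeps consecutive scenes coherent
    return journey

print(len(wonder_journey("a quiet harbor town at dusk")))  # -> 8
```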
Related papers
- Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop [32.92038804110175]
Scene Copilot is a framework combining large language models (LLMs) with a procedural 3D scene generator.
Scene Codex is designed to translate textual user input into commands understandable by the 3D scene generator.
BlenderGPT provides users with an intuitive and direct way to precisely control the generated 3D scene and the final output video.
arXiv Detail & Related papers (2024-11-26T19:21:57Z)
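The Scene Copilot entry above describes an LLM (Scene Codex) that translates user text into commands the procedural scene generator understands. A minimal sketch of that translation step follows; the JSON command schema and the canned LLM call are invented for illustration, since the paper's actual interface is not shown here.

```python
import json

PROMPT_TEMPLATE = (
    "Translate the user's request into a JSON list of scene commands, "
    "each an object with 'op' and 'args' fields.\nRequest: {request}"
)

def fake_llm(prompt: str) -> str:
    # Placeholder for a real LLM call; returns a canned translation.
    return json.dumps([{"op": "add_object", "args": {"type": "tree", "count": 5}}])

def translate_to_commands(request: str) -> list:
    """Turn free-form user text into generator-readable commands."""
    raw = fake_llm(PROMPT_TEMPLATE.format(request=request))
    commands = json.loads(raw)
    for cmd in commands:
        # Validate against the (hypothetical) generator's command schema.
        assert {"op", "args"} <= cmd.keys(), f"malformed command: {cmd}"
    return commands

print(translate_to_commands("a small forest clearing"))
```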
- SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting [53.32467009064287]
We propose a text-driven 3D-consistent scene generation model: SceneDreamer360.
Our proposed method leverages a text-driven panoramic image generation model as a prior for 3D scene generation.
Our experiments demonstrate that SceneDreamer360, with its panoramic image generation and 3DGS, can produce higher-quality, spatially consistent, and visually appealing 3D scenes from any text prompt.
arXiv Detail & Related papers (2024-08-25T02:56:26Z)
- LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation [105.52153675890408]
3D immersive scene generation is a challenging yet critical task in computer vision and graphics.
LayerPano3D is a novel framework for full-view, explorable panoramic 3D scene generation from a single text prompt.
arXiv Detail & Related papers (2024-08-23T17:50:23Z)
- WonderWorld: Interactive 3D Scene Generation from a Single Image [38.83667648993784]
We present WonderWorld, a novel framework for interactive 3D scene generation.
WonderWorld generates connected and diverse 3D scenes in less than 10 seconds on a single A6000 GPU.
arXiv Detail & Related papers (2024-06-13T17:59:10Z)
- Urban Scene Diffusion through Semantic Occupancy Map [49.20779809250597]
UrbanDiffusion is a 3D diffusion model conditioned on a Bird's-Eye View (BEV) map.
Our model learns the data distribution of scene-level structures within a latent space.
After training on real-world driving datasets, our model can generate a wide range of diverse urban scenes.
arXiv Detail & Related papers (2024-03-18T11:54:35Z)
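The UrbanDiffusion entry above describes a latent diffusion model conditioned on a BEV map. Below is a generic DDPM-style sampling loop with an extra conditioning input, as a sketch of that idea; the noise schedule, latent shape, and denoiser are illustrative assumptions, not the paper's architecture.

```python
import torch

def sample_scene_latent(denoiser, bev_map, steps=50, shape=(1, 16, 32, 32)):
    """Reverse diffusion in latent space, conditioned on a BEV map."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    z = torch.randn(shape)  # start from pure noise in the latent space
    for t in reversed(range(steps)):
        eps = denoiser(z, t, bev_map)  # predicted noise, BEV-conditioned
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        z = (z - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:  # add noise on all but the final step
            z = z + torch.sqrt(betas[t]) * torch.randn_like(z)
    return z  # a scene decoder would map this latent to an occupancy map

# Toy usage with a dummy denoiser that predicts zero noise:
dummy_denoiser = lambda z, t, bev: torch.zeros_like(z)
latent = sample_scene_latent(dummy_denoiser, bev_map=torch.zeros(1, 1, 32, 32))
print(latent.shape)  # torch.Size([1, 16, 32, 32])
```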
- Generating Continual Human Motion in Diverse 3D Scenes [51.90506920301473]
We introduce a method to synthesize animator-guided human motion across 3D scenes.
We decompose the continual motion synthesis problem into walking along paths and transitioning in and out of the actions specified by the keypoints.
Our model can generate long sequences of diverse actions, such as grabbing, sitting, and leaning, chained together.
arXiv Detail & Related papers (2023-04-04T18:24:22Z)
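The entry above decomposes continual motion synthesis into walking along paths and transitioning in and out of keypoint-specified actions. The toy sketch below shows only that chaining structure; the segment generators are string-emitting placeholders, not real motion models.

```python
def walk_along_path(waypoint):
    return [f"walk->{waypoint}"]   # locomotion segment toward the next keypoint

def transition_into(action):
    return [f"enter:{action}"]     # e.g. approach and reach for a chair

def perform_action(action):
    return [f"do:{action}"]        # the keypoint-specified action itself

def transition_out(action):
    return [f"exit:{action}"]      # stand back up, resume walking

def synthesize_motion(waypoints, actions):
    """Alternate walking and actions into one continual motion sequence."""
    motion = []
    for waypoint, action in zip(waypoints, actions):
        motion += walk_along_path(waypoint)
        motion += transition_into(action) + perform_action(action) + transition_out(action)
    return motion

print(synthesize_motion(["chair", "sofa", "shelf"], ["sit", "lean", "grab"]))
```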
- SceneScape: Text-Driven Consistent Scene Generation [14.348512536556413]
We introduce a novel framework that generates long videos of consistent scenes in an online fashion by combining a pre-trained text-to-image model with a pre-trained monocular depth prediction model.
To tackle the pivotal challenge of achieving 3D consistency, we deploy online test-time training that encourages the predicted depth map of the current frame to be geometrically consistent with the synthesized scene.
In contrast to previous works, which are applicable only to limited domains, our method generates diverse scenes, such as walkthroughs in spaceships, caves, or ice castles.
arXiv Detail & Related papers (2023-02-02T14:47:19Z)
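The SceneScape entry above mentions online test-time training that pulls the predicted depth of the current frame toward the geometry of the already-synthesized scene. A minimal sketch of such a per-frame fine-tuning step follows, assuming a masked L1 loss against the reprojected scene depth; the actual loss and schedule in the paper may differ.

```python
import torch

def test_time_depth_update(depth_net, frame, rendered_depth, mask,
                           steps=10, lr=1e-5):
    """Briefly fine-tune depth_net on one frame before fusing it into the scene.

    rendered_depth: depth of the existing scene reprojected into this frame.
    mask: pixels where the existing scene is visible (supervision is valid).
    """
    opt = torch.optim.Adam(depth_net.parameters(), lr=lr)
    for _ in range(steps):
        pred = depth_net(frame)
        # Penalize disagreement with the already-synthesized geometry.
        loss = ((pred - rendered_depth).abs() * mask).sum() / mask.sum().clamp(min=1)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return depth_net(frame).detach()  # depth consistent with the scene so far

# Toy usage with a one-layer "depth network" and random data:
net = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)
frame = torch.randn(1, 3, 32, 32)
depth = test_time_depth_update(net, frame,
                               rendered_depth=torch.ones(1, 1, 32, 32),
                               mask=torch.ones(1, 1, 32, 32))
print(depth.shape)  # torch.Size([1, 1, 32, 32])
```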
- Recognizing Scenes from Novel Viewpoints [99.90914180489456]
Humans can perceive scenes in 3D from a handful of 2D views. For AI agents, the ability to recognize a scene from any viewpoint given only a few images enables them to efficiently interact with the scene and its objects.
We propose a model which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoints by segmenting it into semantic categories.
arXiv Detail & Related papers (2021-12-02T18:59:40Z)
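The last entry above describes a model that takes a few RGB views of a new scene and segments it into semantic categories from novel viewpoints. The sketch below shows only that input/output interface, with trivial feature averaging in place of real multi-view fusion; it is a hypothetical placeholder, not the paper's model.

```python
import torch
import torch.nn as nn

class NovelViewSegmenter(nn.Module):
    """Encode a few input views, fuse them, and decode per-pixel class logits."""

    def __init__(self, num_classes=21, feat=32):
        super().__init__()
        self.encoder = nn.Conv2d(3, feat, kernel_size=3, padding=1)
        self.decoder = nn.Conv2d(feat, num_classes, kernel_size=1)

    def forward(self, input_views, query_pose):
        # Naive fusion: average the encoded views. A real model would use the
        # query_pose for geometry-aware aggregation; it is ignored here.
        feats = torch.stack([self.encoder(v) for v in input_views]).mean(dim=0)
        return self.decoder(feats)  # semantic logits for the novel viewpoint

views = [torch.randn(1, 3, 64, 64) for _ in range(3)]
logits = NovelViewSegmenter()(views, query_pose=None)
print(logits.shape)  # torch.Size([1, 21, 64, 64])
```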
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.