VividDream: Generating 3D Scene with Ambient Dynamics
- URL: http://arxiv.org/abs/2405.20334v1
- Date: Thu, 30 May 2024 17:59:24 GMT
- Title: VividDream: Generating 3D Scene with Ambient Dynamics
- Authors: Yao-Chih Lee, Yi-Ting Chen, Andrew Wang, Ting-Hsuan Liao, Brandon Y. Feng, Jia-Bin Huang,
- Abstract summary: We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt.
VividDream can provide human viewers with compelling 4D experiences generated based on diverse real images and text prompts.
- Score: 13.189732244489225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands an input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques and conditioned on renderings of the static 3D scene from the sampled camera trajectories. We then optimize a canonical 4D scene representation using an animated video ensemble, with per-video motion embeddings and visibility masks to mitigate inconsistencies. The resulting 4D scene enables free-view exploration of a 3D scene with plausible ambient scene dynamics. Experiments demonstrate that VividDream can provide human viewers with compelling 4D experiences generated based on diverse real images and text prompts.
Related papers
- Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes [49.26872036160368]
We propose a method for animating parts of high-quality 3D scenes in a Gaussian Splatting representation.
We find that, in contrast to prior work, this enables realistic animations of complex, pre-existing 3D scenes.
arXiv Detail & Related papers (2024-11-28T16:01:58Z) - CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models [98.03734318657848]
We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video.
We leverage a multi-view video diffusion model trained on a diverse combination of datasets to enable novel view synthesis.
We demonstrate competitive performance on novel view synthesis and dynamic scene reconstruction benchmarks.
arXiv Detail & Related papers (2024-11-27T18:57:16Z) - DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion [22.11178016375823]
DimensionX is a framework designed to generate 3D and 4D scenes from just a single image with video diffusion.
Our approach begins with the insight that both the spatial structure of a 3D scene and the temporal evolution of a 4D scene can be effectively represented through sequences of video frames.
arXiv Detail & Related papers (2024-11-07T18:07:31Z) - 4-LEGS: 4D Language Embedded Gaussian Splatting [12.699978393733309]
We show how to lift-temporal features to a 4D representation based on 3D Gaussianting.
This enables an interactive interface where the user cantemporally localize events in the video from text prompts.
We demonstrate our system on public 3D video datasets of people and animals performing various actions.
arXiv Detail & Related papers (2024-10-14T17:00:53Z) - Comp4D: LLM-Guided Compositional 4D Scene Generation [65.5810466788355]
We present Comp4D, a novel framework for Compositional 4D Generation.
Unlike conventional methods that generate a singular 4D representation of the entire scene, Comp4D innovatively constructs each 4D object within the scene separately.
Our method employs a compositional score distillation technique guided by the pre-defined trajectories.
arXiv Detail & Related papers (2024-03-25T17:55:52Z) - Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from
a Single Image [59.18564636990079]
We study the problem of synthesizing a long-term dynamic video from only a single image.
Existing methods either hallucinate inconsistent perpetual views or struggle with long camera trajectories.
We present Make-It-4D, a novel method that can generate a consistent long-term dynamic video from a single image.
arXiv Detail & Related papers (2023-08-20T12:53:50Z) - SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections [49.802462165826554]
We present SceneDreamer, an unconditional generative model for unbounded 3D scenes.
Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations.
arXiv Detail & Related papers (2023-02-02T18:59:16Z) - Text-To-4D Dynamic Scene Generation [111.89517759596345]
We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions.
Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency.
The dynamic video output generated from the provided text can be viewed from any camera location and angle, and can be composited into any 3D environment.
arXiv Detail & Related papers (2023-01-26T18:14:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.