VividDream: Generating 3D Scene with Ambient Dynamics
- URL: http://arxiv.org/abs/2405.20334v1
- Date: Thu, 30 May 2024 17:59:24 GMT
- Title: VividDream: Generating 3D Scene with Ambient Dynamics
- Authors: Yao-Chih Lee, Yi-Ting Chen, Andrew Wang, Ting-Hsuan Liao, Brandon Y. Feng, Jia-Bin Huang
- Abstract summary: We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt.
VividDream can provide human viewers with compelling 4D experiences generated from diverse real images and text prompts.
- Score: 13.189732244489225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands an input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques, conditioned on renderings of the static 3D scene from sampled camera trajectories. We then optimize a canonical 4D scene representation using the animated video ensemble, with per-video motion embeddings and visibility masks to mitigate inconsistencies. The resulting 4D scene enables free-view exploration of a 3D scene with plausible ambient scene dynamics. Experiments demonstrate that VividDream can provide human viewers with compelling 4D experiences generated from diverse real images and text prompts.
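Read as an algorithm, the abstract describes three stages: static scene expansion, video-ensemble generation, and 4D optimization. Below is a minimal structural sketch of that pipeline; every function and constant in it is a hypothetical stand-in (for depth-based unprojection, a 2D inpainting model, a video diffusion model, and a 4D optimizer), not the authors' implementation.

```python
# A minimal structural sketch of the three-stage pipeline described in the
# abstract. All numeric operations below are placeholders, not real models.
import numpy as np

rng = np.random.default_rng(0)

def unproject(image: np.ndarray) -> np.ndarray:
    """Lift pixels to a 3D point cloud using a stand-in unit depth map."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    depth = np.ones((h, w))
    return np.stack([xs, ys, depth], axis=-1).reshape(-1, 3).astype(float)

def expand_static_scene(points: np.ndarray, n_views: int = 8) -> np.ndarray:
    """Stage 1: iteratively render from new viewpoints, inpaint the holes
    with a 2D diffusion model, and merge the lifted geometry back in.
    The jittered copy below merely stands in for newly inpainted points."""
    for _ in range(n_views):
        new_points = points + rng.normal(scale=0.01, size=points.shape)
        points = np.concatenate([points, new_points])
    return points

def animate_ensemble(points: np.ndarray, n_trajectories: int) -> list:
    """Stage 2: condition a video diffusion model on renderings of the
    static scene along each sampled camera trajectory (stand-in videos)."""
    return [np.zeros((16, 64, 64, 3)) for _ in range(n_trajectories)]

def optimize_4d(videos: list) -> dict:
    """Stage 3: fit one canonical 4D scene to the whole ensemble, with a
    per-video motion embedding and visibility mask so mutually inconsistent
    videos do not fight over the same scene regions."""
    return {
        "motion_embeddings": [rng.normal(size=32) for _ in videos],
        "visibility_masks": [np.ones(v.shape[:3], dtype=bool) for v in videos],
    }

image = np.zeros((64, 64, 3))
cloud = expand_static_scene(unproject(image))
scene = optimize_4d(animate_ensemble(cloud, n_trajectories=4))
```

The per-video motion embeddings and visibility masks are the key reconciliation device: each generated video gets its own motion code, so the shared canonical scene only has to agree with each video where that video is marked visible.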
Related papers
- DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion [22.11178016375823]
DimensionX is a framework designed to generate 3D and 4D scenes from just a single image with video diffusion.
Our approach begins with the insight that both the spatial structure of a 3D scene and the temporal evolution of a 4D scene can be effectively represented through sequences of video frames.
arXiv Detail & Related papers (2024-11-07T18:07:31Z) - 4-LEGS: 4D Language Embedded Gaussian Splatting [12.699978393733309]
We show how to lift spatio-temporal features to a 4D representation based on 3D Gaussian Splatting.
This enables an interactive interface where the user can spatiotemporally localize events in the video from text prompts.
We demonstrate our system on public 3D video datasets of people and animals performing various actions.
arXiv Detail & Related papers (2024-10-14T17:00:53Z) - Comp4D: LLM-Guided Compositional 4D Scene Generation [65.5810466788355]
We present Comp4D, a novel framework for Compositional 4D Generation.
Unlike conventional methods that generate a single 4D representation of the entire scene, Comp4D constructs each 4D object within the scene separately.
Our method employs a compositional score distillation technique guided by the pre-defined trajectories.
arXiv Detail & Related papers (2024-03-25T17:55:52Z) - 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
This work introduces 4DGen, a novel framework for grounded 4D content creation.
We identify static 3D assets and monocular video sequences as key components in constructing the 4D content.
Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos).
arXiv Detail & Related papers (2023-12-28T18:53:39Z) - 4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling [91.99172731031206]
Current text-to-4D methods face a three-way tradeoff between quality of scene appearance, 3D structure, and motion.
We introduce hybrid score distillation sampling, an alternating optimization procedure that blends supervision signals from multiple pre-trained diffusion models (a minimal sketch of this alternation appears after this list).
arXiv Detail & Related papers (2023-11-29T18:58:05Z) - Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from
a Single Image [59.18564636990079]
We study the problem of synthesizing a long-term dynamic video from only a single image.
Existing methods either hallucinate inconsistent perpetual views or struggle with long camera trajectories.
We present Make-It-4D, a novel method that can generate a consistent long-term dynamic video from a single image.
arXiv Detail & Related papers (2023-08-20T12:53:50Z) - SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections [49.802462165826554]
We present SceneDreamer, an unconditional generative model for unbounded 3D scenes.
Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations.
arXiv Detail & Related papers (2023-02-02T18:59:16Z) - Text-To-4D Dynamic Scene Generation [111.89517759596345]
We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions.
Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency.
The dynamic video output generated from the provided text can be viewed from any camera location and angle, and can be composited into any 3D environment.
arXiv Detail & Related papers (2023-01-26T18:14:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.