Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from
a Single Image
- URL: http://arxiv.org/abs/2308.10257v1
- Date: Sun, 20 Aug 2023 12:53:50 GMT
- Title: Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from
a Single Image
- Authors: Liao Shen, Xingyi Li, Huiqiang Sun, Juewen Peng, Ke Xian, Zhiguo Cao,
Guosheng Lin
- Abstract summary: We study the problem of synthesizing a long-term dynamic video from only a single image.
Existing methods either hallucinate inconsistent perpetual views or struggle with long camera trajectories.
We present Make-It-4D, a novel method that can generate a consistent long-term dynamic video from a single image.
- Score: 59.18564636990079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of synthesizing a long-term dynamic video from only a
single image. This is challenging since it requires consistent visual content
movements given large camera motions. Existing methods either hallucinate
inconsistent perpetual views or struggle with long camera trajectories. To
address these issues, it is essential to estimate the underlying 4D (including
3D geometry and scene motion) and fill in the occluded regions. To this end, we
present Make-It-4D, a novel method that can generate a consistent long-term
dynamic video from a single image. On the one hand, we utilize layered depth
images (LDIs) to represent a scene, and they are then unprojected to form a
feature point cloud. To animate the visual content, the feature point cloud is
displaced based on the scene flow derived from motion estimation and the
corresponding camera pose. Such 4D representation enables our method to
maintain the global consistency of the generated dynamic video. On the other
hand, we fill in the occluded regions by using a pretrained diffusion model to
inpaint and outpaint the input image. This enables our method to work under
large camera motions. Benefiting from our design, our method can be
training-free which saves a significant amount of training time. Experimental
results demonstrate the effectiveness of our approach, which showcases
compelling rendering results.
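As a rough illustration of the 4D representation described in the abstract, the sketch below (a single-layer NumPy toy with hypothetical names, not the authors' implementation) unprojects a depth map into a point cloud, displaces the points by a scene flow field, and reprojects them under a new camera pose:

```python
import numpy as np

def unproject(depth, K):
    """Lift a depth map (H, W) to a 3D point cloud using intrinsics K.

    Each layer of a layered depth image (LDI) can be lifted this way;
    here we sketch a single layer for simplicity."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)  # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T          # back-projected rays (z = 1)
    return rays * depth.reshape(-1, 1)       # scale by depth -> 3D points

def animate_and_render(points, scene_flow, R, t, K):
    """Displace the point cloud by per-point scene flow, then project it
    into a new camera (R, t). Returns pixel coordinates and depths."""
    moved = points + scene_flow              # 4D: geometry plus motion
    cam = moved @ R.T + t                    # world -> new camera frame
    proj = cam @ K.T
    uv = proj[:, :2] / proj[:, 2:3]          # perspective divide
    return uv, cam[:, 2]

# Toy example: a flat plane at depth 2, scene flow pushing every point +0.1 in x.
K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
depth = np.full((64, 64), 2.0)
pts = unproject(depth, K)
flow = np.zeros_like(pts)
flow[:, 0] = 0.1
uv, z = animate_and_render(pts, flow, np.eye(3), np.zeros(3), K)
```

With this setup each point stays at depth 2 and shifts by 0.1 * f / z = 5 pixels in x, which is the kind of globally consistent displacement the feature point cloud makes possible. The real method operates on per-point features and fills disocclusions with a diffusion model, which this sketch omits.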
Related papers
- Pixel-to-4D: Camera-Controlled Image-to-Video Generation with Dynamic 3D Gaussians [7.051077403685518]
Humans excel at forecasting the future dynamics of a scene given just a single image. Video generation models that can mimic this ability are an essential component for intelligent systems. Recent approaches have improved temporal coherence and 3D consistency in single-image-conditioned video generation.
arXiv Detail & Related papers (2026-01-02T13:04:47Z) - SEE4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting [83.5106058182799]
We introduce SEE4D, a pose-free, trajectory-to-camera framework for 4D world modeling from casual videos. A view-conditional video inpainting model is trained to learn a robust geometry prior by denoising realistically synthesized images. We validate SEE4D on cross-view video generation and sparse reconstruction benchmarks.
arXiv Detail & Related papers (2025-10-30T17:59:39Z) - Restage4D: Reanimating Deformable 3D Reconstruction from a Single Video [56.781766315691854]
We introduce Restage4D, a geometry-preserving pipeline for video-conditioned 4D restaging. We validate Restage4D on DAVIS and PointOdyssey, demonstrating improved geometry consistency, motion quality, and 3D tracking performance.
arXiv Detail & Related papers (2025-08-08T21:31:51Z) - AnimateScene: Camera-controllable Animation in Any Scene [34.04222775149215]
3D scene reconstruction and 4D human animation have seen rapid progress and broad adoption in recent years. One key difficulty lies in placing the human at the correct location and scale within the scene. Another challenge is that the human and the background may exhibit different lighting and styles, leading to unrealistic composites. We present AnimateScene, which addresses these issues in a unified framework.
arXiv Detail & Related papers (2025-08-08T03:28:17Z) - Voyaging into Perpetual Dynamic Scenes from a Single View [31.85867311855001]
A key challenge is to ensure that different generated views are consistent with the underlying 3D motions. We propose DynamicVoyager, which reformulates dynamic scene generation as a scene outpainting problem with new dynamic content. Experiments show that our model can generate perpetual scenes with consistent motions along fly-through cameras.
arXiv Detail & Related papers (2025-07-05T22:49:25Z) - DreamJourney: Perpetual View Generation with Video Diffusion Models [91.88716097573206]
Perpetual view generation aims to synthesize a long-term video corresponding to an arbitrary camera trajectory solely from a single input image. Recent methods commonly utilize a pre-trained text-to-image diffusion model to synthesize new content for previously unseen regions along the camera movement. We present DreamJourney, a two-stage framework that leverages the world-simulation capacity of video diffusion models for perpetual scene view generation.
arXiv Detail & Related papers (2025-06-21T12:51:34Z) - Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images [5.754780404074765]
We propose representing a complete 3D space for dynamic scene video by modeling explicit representations, specifically 4D Gaussians, from a single image.
As far as we know, this is the first attempt that considers animation while representing a complete 3D space from a single landscape image.
arXiv Detail & Related papers (2025-04-04T06:51:39Z) - PaintScene4D: Consistent 4D Scene Generation from Text Prompts [29.075849524496707]
PaintScene4D is a novel text-to-4D scene generation framework.
It harnesses video generative models trained on diverse real-world datasets.
It produces realistic 4D scenes that can be viewed from arbitrary trajectories.
arXiv Detail & Related papers (2024-12-05T18:59:57Z) - GFlow: Recovering 4D World from Monocular Video [58.63051670458107]
We introduce GFlow, a framework that lifts a video (3D) to a 4D explicit representation, entailing a flow of Gaussian splatting through space and time.
GFlow first clusters the scene into still and moving parts, then applies a sequential optimization process.
GFlow transcends the boundaries of mere 4D reconstruction.
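The still/moving clustering step mentioned above can be caricatured as thresholding per-point displacement between time steps (names and threshold are illustrative, not GFlow's actual procedure):

```python
import numpy as np

def split_still_moving(points_t0, points_t1, thresh=0.05):
    """Cluster scene points into still vs. moving parts by displacement
    magnitude between two time steps. A toy stand-in for a still/moving
    clustering step; the threshold is arbitrary."""
    disp = np.linalg.norm(points_t1 - points_t0, axis=1)
    moving = disp > thresh
    return ~moving, moving

p0 = np.array([[0, 0, 1], [1, 0, 2], [0, 1, 3]], dtype=float)
p1 = p0.copy()
p1[1] += [0.5, 0.0, 0.0]                     # only the second point moves
still, moving = split_still_moving(p0, p1)
```

The static cluster can then anchor camera estimation while the moving cluster is optimized per frame, which is the intuition behind a sequential still-then-moving optimization.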
arXiv Detail & Related papers (2024-05-28T17:59:22Z) - Controllable Longer Image Animation with Diffusion Models [12.565739255499594]
We introduce an open-domain controllable image animation method using motion priors with video diffusion models.
Our method achieves precise control over the direction and speed of motion in the movable region by extracting the motion field information from videos.
We propose an efficient long-duration video generation method based on noise reschedule specifically tailored for image animation tasks.
arXiv Detail & Related papers (2024-05-27T16:08:00Z) - DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action
Segmentation [39.806610397357986]
We present our findings from research conducted on the Human-Object Interaction 4D (HOI4D) dataset for the egocentric action segmentation task.
We convert point cloud videos into depth videos and employ traditional video modeling methods to improve 4D action segmentation.
The proposed method achieved the first place in the 4D Action Track of the HOI4D Challenge 2023.
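The point-cloud-video-to-depth-video conversion mentioned above amounts to rendering each point cloud frame into a depth map; a minimal z-buffer sketch (hypothetical helper, not the DPMix implementation) might look like:

```python
import numpy as np

def pointcloud_to_depth(points, K, H, W):
    """Render a point cloud (N, 3) in camera coordinates into a depth map.

    A z-buffer keeps the nearest point per pixel; empty pixels stay at inf.
    Illustrative only: real pipelines typically splat points or rasterize
    with a renderer rather than looping in Python."""
    depth = np.full((H, W), np.inf)
    front = points[points[:, 2] > 0]                 # keep points in front of the camera
    proj = front @ K.T
    uv = np.round(proj[:, :2] / proj[:, 2:3]).astype(int)
    z = front[:, 2]
    inb = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    for (u, v), d in zip(uv[inb], z[inb]):
        if d < depth[v, u]:                          # z-buffer: nearest point wins
            depth[v, u] = d
    return depth

# Two points competing for the same pixel: the nearer one wins.
K = np.array([[50.0, 0, 16], [0, 50.0, 16], [0, 0, 1]])
pts = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 3.0]])
d = pointcloud_to_depth(pts, K, 32, 32)
```

Applying this per frame turns a point cloud video into a depth video that standard 2D video backbones can consume.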
arXiv Detail & Related papers (2023-07-31T16:14:24Z) - FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses
via Pixel-Aligned Scene Flow [26.528667940013598]
Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning.
A key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion.
We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass.
arXiv Detail & Related papers (2023-05-31T20:58:46Z) - 3D Cinemagraphy from a Single Image [73.09720823592092]
We present 3D Cinemagraphy, a new technique that marries 2D image animation with 3D photography.
Given a single still image as input, our goal is to generate a video that contains both visual content animation and camera motion.
arXiv Detail & Related papers (2023-03-10T06:08:23Z) - DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z) - NeuralDiff: Segmenting 3D objects that move in egocentric videos [92.95176458079047]
We study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground.
This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large apparent motion.
In particular, we consider egocentric videos and further separate the dynamic component into objects and the actor that observes and moves them.
arXiv Detail & Related papers (2021-10-19T12:51:35Z) - Neural Radiance Flow for 4D View Synthesis and Video Processing [59.9116932930108]
We present a method to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.
Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene.
arXiv Detail & Related papers (2020-12-17T17:54:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.