Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from
a Single Image
- URL: http://arxiv.org/abs/2308.10257v1
- Date: Sun, 20 Aug 2023 12:53:50 GMT
- Title: Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from
a Single Image
- Authors: Liao Shen, Xingyi Li, Huiqiang Sun, Juewen Peng, Ke Xian, Zhiguo Cao,
Guosheng Lin
- Abstract summary: We study the problem of synthesizing a long-term dynamic video from only a single image.
Existing methods either hallucinate inconsistent perpetual views or struggle with long camera trajectories.
We present Make-It-4D, a novel method that can generate a consistent long-term dynamic video from a single image.
- Score: 59.18564636990079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of synthesizing a long-term dynamic video from only a
single image. This is challenging since it requires consistent visual content
movements given large camera motions. Existing methods either hallucinate
inconsistent perpetual views or struggle with long camera trajectories. To
address these issues, it is essential to estimate the underlying 4D (including
3D geometry and scene motion) and fill in the occluded regions. To this end, we
present Make-It-4D, a novel method that can generate a consistent long-term
dynamic video from a single image. On the one hand, we utilize layered depth
images (LDIs) to represent the scene, which are then unprojected to form a
feature point cloud. To animate the visual content, the feature point cloud is
displaced based on the scene flow derived from motion estimation and the
corresponding camera pose. Such 4D representation enables our method to
maintain the global consistency of the generated dynamic video. On the other
hand, we fill in the occluded regions by using a pretrained diffusion model to
inpaint and outpaint the input image. This enables our method to work under
large camera motions. Benefiting from our design, our method can be
training-free, which saves a significant amount of training time. Experimental
results demonstrate the effectiveness of our approach and showcase compelling
rendering results.
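To make the 4D pipeline above more concrete, the following Python sketch illustrates how layered depth images might be unprojected into a feature point cloud, displaced by an estimated scene flow, and reprojected under a target camera pose. This is a minimal sketch under assumed pinhole-camera conventions; the function and variable names (unproject_ldi, animate_and_project, etc.) are hypothetical and do not come from the paper's implementation, and the feature splatting/rendering step is omitted.

```python
# Hypothetical sketch of the LDI -> feature point cloud -> scene-flow animation
# pipeline described in the abstract (names and conventions are assumptions).
import numpy as np

def unproject_ldi(depth_layers, feature_layers, K):
    """Unproject layered depth images (LDIs) into a feature point cloud.

    depth_layers   : list of (H, W) depth maps, one per LDI layer
    feature_layers : list of (H, W, C) feature maps aligned with the depths
    K              : (3, 3) pinhole camera intrinsic matrix
    """
    K_inv = np.linalg.inv(K)
    points, features = [], []
    for depth, feat in zip(depth_layers, feature_layers):
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
        # Back-project each pixel: X = depth * K^{-1} [u, v, 1]^T
        points.append((pix @ K_inv.T) * depth.reshape(-1, 1))
        features.append(feat.reshape(-1, feat.shape[-1]))
    return np.concatenate(points), np.concatenate(features)

def animate_and_project(points, features, scene_flow, R, t, K, time_step):
    """Displace points by the estimated scene flow (scaled by time), then
    project them into a target camera with extrinsics (R, t). Splatting the
    projected features into an image plane is omitted here."""
    moved = points + time_step * scene_flow     # apply scene motion
    cam = moved @ R.T + t                       # world -> target camera frame
    proj = cam @ K.T
    uv = proj[:, :2] / np.clip(proj[:, 2:3], 1e-6, None)   # perspective divide
    return uv, features
```

Disoccluded regions that become visible after the displacement and camera motion would then be filled by a pretrained diffusion model that inpaints and outpaints the rendered views, as described in the abstract; that step is not shown here.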
Related papers
- GFlow: Recovering 4D World from Monocular Video [58.63051670458107]
We introduce GFlow, a framework that lifts a video (3D) to a 4D explicit representation, entailing a flow of Gaussian splatting through space and time.
GFlow first clusters the scene into still and moving parts, then applies a sequential optimization process.
GFlow transcends the boundaries of mere 4D reconstruction.
arXiv Detail & Related papers (2024-05-28T17:59:22Z)
- Controllable Longer Image Animation with Diffusion Models [12.565739255499594]
We introduce an open-domain controllable image animation method using motion priors with video diffusion models.
Our method achieves precise control over the direction and speed of motion in the movable region by extracting the motion field information from videos.
We propose an efficient long-duration video generation method based on noise reschedule specifically tailored for image animation tasks.
arXiv Detail & Related papers (2024-05-27T16:08:00Z)
- DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation [39.806610397357986]
We present our findings from research conducted on the Human-Object Interaction 4D (HOI4D) dataset for the egocentric action segmentation task.
We convert point cloud videos into depth videos and employ traditional video modeling methods to improve 4D action segmentation.
The proposed method achieved the first place in the 4D Action Track of the HOI4D Challenge 2023.
arXiv Detail & Related papers (2023-07-31T16:14:24Z)
- FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow [26.528667940013598]
Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning.
A key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion.
We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass.
arXiv Detail & Related papers (2023-05-31T20:58:46Z)
- 3D Cinemagraphy from a Single Image [73.09720823592092]
We present 3D Cinemagraphy, a new technique that marries 2D image animation with 3D photography.
Given a single still image as input, our goal is to generate a video that contains both visual content animation and camera motion.
arXiv Detail & Related papers (2023-03-10T06:08:23Z)
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
- NeuralDiff: Segmenting 3D objects that move in egocentric videos [92.95176458079047]
We study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground.
This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large apparent motion.
In particular, we consider egocentric videos and further separate the dynamic component into objects and the actor that observes and moves them.
arXiv Detail & Related papers (2021-10-19T12:51:35Z)
- Neural Radiance Flow for 4D View Synthesis and Video Processing [59.9116932930108]
We present a method to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.
Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene.
arXiv Detail & Related papers (2020-12-17T17:54:32Z)