Future Video Synthesis with Object Motion Prediction
- URL: http://arxiv.org/abs/2004.00542v2
- Date: Wed, 15 Apr 2020 10:55:42 GMT
- Title: Future Video Synthesis with Object Motion Prediction
- Authors: Yue Wu, Rongrong Gao, Jaesik Park, Qifeng Chen
- Abstract summary: Instead of synthesizing images directly, our approach is designed to understand the complex scene dynamics.
The appearance of the scene components in the future is predicted by non-rigid deformation of the background and affine transformation of moving objects.
Experimental results on the Cityscapes and KITTI datasets show that our model outperforms the state-of-the-art in terms of visual quality and accuracy.
- Score: 54.31508711871764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an approach to predict future video frames given a sequence of continuous video frames in the past. Instead of synthesizing images directly, our approach is designed to understand the complex scene dynamics by decoupling the background scene and moving objects. The appearance of the scene components in the future is predicted by non-rigid deformation of the background and affine transformation of moving objects. The anticipated appearances are combined to create a plausible future video. With this procedure, our method exhibits far fewer tearing and distortion artifacts than other approaches. Experimental results on the Cityscapes and KITTI datasets show that our model outperforms the state of the art in visual quality and accuracy.
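The decoupled pipeline is easy to picture in code. Below is a minimal sketch of the final compositing step only, assuming a predicted dense background flow and per-object affine matrices are already available; all names here are illustrative, not the authors' implementation.

```python
# Minimal sketch: warp the background with a dense (non-rigid) flow field,
# move each object with its own predicted affine transform, then composite.
import numpy as np
import cv2

def synthesize_future_frame(background, flow, objects):
    """background: HxWx3 uint8 image of the static scene.
    flow: HxWx2 float32 predicted backward flow (future -> past).
    objects: list of (patch, mask, affine) where patch is HxWx3 uint8,
             mask is HxW float in [0, 1], and affine is a 2x3 matrix."""
    h, w = background.shape[:2]
    # Backward-warp the background: sample past pixels at future locations.
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    frame = cv2.remap(background, map_x, map_y, cv2.INTER_LINEAR)
    # Paste each moving object after applying its predicted affine motion.
    for patch, mask, affine in objects:
        warped_patch = cv2.warpAffine(patch, affine, (w, h))
        warped_mask = cv2.warpAffine(mask, affine, (w, h))[..., None]
        frame = (warped_mask * warped_patch
                 + (1 - warped_mask) * frame).astype(np.uint8)
    return frame
```

Warping the background and each object separately is what lets the method avoid the tearing and distortion artifacts that direct pixel synthesis tends to produce.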
Related papers
- Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation [54.60804602905519]
Rather than learning an entangled representation that models layered scene geometry, motion forecasting, and novel view synthesis together, our approach disentangles scene geometry from scene motion by lifting the 2D scene to 3D point clouds.
To model future 3D scene motion, we propose a disentangled two-stage approach that first forecasts ego-motion and then the residual motion of dynamic objects (sketched below).
arXiv Detail & Related papers (2024-07-31T08:54:50Z)
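As a hedged illustration of the two-stage idea above, the sketch below applies a rigid ego-motion to a lifted point cloud and then adds a residual motion only to dynamic points; the function name, shapes, and inputs are assumptions for illustration.

```python
import numpy as np

def forecast_points(points, ego_T, residual, dynamic_mask):
    """points: Nx3 scene point cloud at time t.
    ego_T: 4x4 rigid transform predicted for the ego camera.
    residual: Nx3 predicted residual motion of dynamic objects.
    dynamic_mask: length-N boolean array marking moving points."""
    homog = np.hstack([points, np.ones((len(points), 1))])  # Nx4
    moved = (homog @ ego_T.T)[:, :3]   # stage 1: ego-motion for all points
    moved[dynamic_mask] += residual[dynamic_mask]  # stage 2: residual motion
    return moved
```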
- Diffusion Priors for Dynamic View Synthesis from Monocular Videos [59.42406064983643]
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.
We first finetune a pretrained RGB-D diffusion model on the video frames using a customization technique.
We distill the knowledge from the finetuned model into a 4D representation encompassing both dynamic and static Neural Radiance Fields.
arXiv Detail & Related papers (2024-01-10T23:26:41Z)
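One common way to distill a diffusion prior into a radiance field is a score-distillation-style objective; the sketch below shows that generic pattern, not necessarily the paper's exact loss. `render` and `denoiser` are hypothetical stand-ins for the 4D representation and the finetuned RGB-D diffusion model.

```python
import torch

def distillation_step(render, denoiser, camera, t, optimizer):
    """render(camera) -> 1x3xHxW image from the current 4D representation.
    denoiser(noisy, t) -> predicted noise at diffusion timestep t."""
    image = render(camera)
    noise = torch.randn_like(image)
    noisy = image + noise  # schedule-dependent scaling omitted for brevity
    with torch.no_grad():
        pred = denoiser(noisy, t)  # frozen prior scores the noisy render
    # Score-distillation trick: gradient w.r.t. the image is (pred - noise),
    # pushing renders toward what the diffusion prior finds likely.
    loss = ((pred - noise) * image).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```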
- WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction [82.79642869586587]
WALDO is a novel approach to the prediction of future video frames from past ones.
Individual images are decomposed into multiple layers combining object masks and a small set of control points.
The layer structure is shared across all frames in each video to build dense inter-frame connections.
arXiv Detail & Related papers (2022-11-25T18:59:46Z)
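In the same spirit, a per-layer warp can be fit from control-point correspondences and the layers composited back-to-front. This is a simplified sketch; WALDO predicts control-point trajectories with a learned model and uses a richer parametric flow than the plain affine fit assumed here.

```python
import numpy as np
import cv2

def warp_layer(layer_rgba, pts_past, pts_future, out_size):
    """layer_rgba: HxWx4 uint8 layer (RGB + object mask as alpha).
    pts_past / pts_future: Kx2 control points before and after prediction.
    out_size: (width, height) of the output frame."""
    M, _ = cv2.estimateAffine2D(pts_past.astype(np.float32),
                                pts_future.astype(np.float32))
    return cv2.warpAffine(layer_rgba, M, out_size)

def composite(layers):
    """Composite warped RGBA layers, ordered back-to-front, into one frame."""
    frame = np.zeros_like(layers[0][..., :3], dtype=np.float32)
    for layer in layers:
        alpha = layer[..., 3:4] / 255.0
        frame = alpha * layer[..., :3] + (1 - alpha) * frame
    return frame.astype(np.uint8)
```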
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
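The aggregation step of image-based rendering can be sketched as a similarity-weighted blend of features from nearby source views. This toy version weights views by agreement of viewing directions; DynIBaR additionally aggregates along scene-motion trajectories, which is omitted here.

```python
import numpy as np

def aggregate_features(src_feats, src_dirs, tgt_dir):
    """src_feats: VxC features sampled from V nearby views for one 3D point.
    src_dirs: Vx3 unit viewing directions of the source views.
    tgt_dir: 3-vector unit viewing direction of the target ray."""
    sim = src_dirs @ tgt_dir              # cosine similarity per view
    w = np.exp(sim) / np.exp(sim).sum()   # softmax weights over views
    return w @ src_feats                  # blended C-dimensional feature
```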
- Temporal View Synthesis of Dynamic Scenes through 3D Object Motion Estimation with Multi-Plane Images [8.185918509343816]
We study the problem of temporal view synthesis (TVS), where the goal is to predict the next frames of a video.
In this work, we consider the TVS of dynamic scenes in which both the user and objects are moving.
We predict the motion of objects by isolating and estimating the 3D object motion in the past frames and then extrapolating it.
arXiv Detail & Related papers (2022-08-19T17:40:13Z)
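The extrapolation step can be illustrated with a constant-velocity assumption on a rigid object's pose: estimate the relative transform between the two most recent frames and replay it once more. This is a sketch of the idea only; the actual method estimates object motion from multi-plane images.

```python
import numpy as np

def extrapolate_pose(T_prev, T_curr):
    """T_prev, T_curr: 4x4 object-to-world poses at times t-1 and t.
    Returns the predicted pose at t+1 under constant velocity."""
    delta = T_curr @ np.linalg.inv(T_prev)  # motion over the last step
    return delta @ T_curr                   # replay it for the next step
```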
- Stochastic Video Prediction with Structure and Motion [14.424465835834042]
We propose to factorize video observations into static and dynamic components.
By learning separate distributions of changes in foreground and background, we can decompose the scene into static and dynamic parts.
Our experiments demonstrate that disentangling structure and motion helps video prediction, leading to better future predictions in complex driving scenarios.
arXiv Detail & Related papers (2022-03-20T11:29:46Z)
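A deterministic caricature of the static/dynamic split is a per-pixel temporal median for the background plus a deviation threshold for the foreground, as sketched below; the paper instead learns stochastic distributions over both components.

```python
import numpy as np

def decompose(frames, thresh=25.0):
    """frames: TxHxWx3 uint8 video clip."""
    static = np.median(frames, axis=0)                 # background estimate
    diff = np.abs(frames.astype(np.float32) - static)
    dynamic_mask = diff.mean(axis=-1) > thresh         # TxHxW foreground mask
    return static.astype(np.uint8), dynamic_mask
```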
- Learning Semantic-Aware Dynamics for Video Prediction [68.04359321855702]
We propose an architecture and training scheme to predict video frames by explicitly modeling dis-occlusions.
The appearance of the scene is warped from past frames using the predicted motion in co-visible regions.
arXiv Detail & Related papers (2021-04-20T05:00:24Z)
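Flow-based warping with an explicit dis-occlusion mask can be sketched as follows: copy pixels from the past frame where a forward-backward flow consistency check passes, and flag the rest as dis-occluded regions that must be synthesized rather than warped. The function and threshold are illustrative assumptions.

```python
import numpy as np
import cv2

def warp_with_disocclusion(past, flow_fwd, flow_bwd, eps=1.0):
    """past: HxWx3 uint8 previous frame.
    flow_fwd / flow_bwd: HxWx2 float32 forward and backward optical flow
    between the past and future frames."""
    h, w = past.shape[:2]
    gx, gy = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    # Backward-warp: for each future pixel, sample the past frame.
    warped = cv2.remap(past, gx + flow_bwd[..., 0], gy + flow_bwd[..., 1],
                       cv2.INTER_LINEAR)
    # Forward-backward consistency: warped forward flow should cancel the
    # backward flow in co-visible regions; large residuals mark dis-occlusion.
    fwd_at_future = cv2.remap(flow_fwd, gx + flow_bwd[..., 0],
                              gy + flow_bwd[..., 1], cv2.INTER_LINEAR)
    disoccluded = np.linalg.norm(fwd_at_future + flow_bwd, axis=-1) > eps
    return warped, disoccluded  # dis-occluded pixels need to be inpainted
```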
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.