Playable Environments: Video Manipulation in Space and Time
- URL: http://arxiv.org/abs/2203.01914v1
- Date: Thu, 3 Mar 2022 18:51:05 GMT
- Title: Playable Environments: Video Manipulation in Space and Time
- Authors: Willi Menapace, Aliaksandr Siarohin, Christian Theobalt, Vladislav Golyanik, Sergey Tulyakov, Stéphane Lathuilière, Elisa Ricci
- Abstract summary: We present Playable Environments - a new representation for interactive video generation and manipulation in space and time.
With a single image at inference time, our novel framework allows the user to move objects in 3D while generating a video by providing a sequence of desired actions.
Our method builds an environment state for each frame, which can be manipulated by our proposed action module and decoded back to the image space with volumetric rendering.
- Score: 98.0621309257937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Playable Environments - a new representation for interactive video
generation and manipulation in space and time. With a single image at inference
time, our novel framework allows the user to move objects in 3D while
generating a video by providing a sequence of desired actions. The actions are
learnt in an unsupervised manner. The camera can be controlled to get the
desired viewpoint. Our method builds an environment state for each frame, which
can be manipulated by our proposed action module and decoded back to the image
space with volumetric rendering. To support diverse appearances of objects, we
extend neural radiance fields with style-based modulation. Our method trains on
a collection of various monocular videos requiring only the estimated camera
parameters and 2D object locations. To set a challenging benchmark, we
introduce two large scale video datasets with significant camera movements. As
evidenced by our experiments, playable environments enable several creative
applications not attainable by prior video synthesis works, including playable
3D video generation, stylization and manipulation. Further details, code and
examples are available at
https://willi-menapace.github.io/playable-environments-website
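As a rough sketch of the pipeline the abstract describes (a per-frame environment state, an action module that updates it from a user-supplied discrete action, and a style-modulated radiance field decoded via volumetric rendering), the PyTorch snippet below illustrates one interactive step. All module names, dimensions, and the update rule are hypothetical placeholders for exposition, not the authors' implementation, and the actual ray sampling and alpha compositing of volumetric rendering are omitted.

```python
import torch
import torch.nn as nn

STATE_DIM, STYLE_DIM, NUM_ACTIONS = 256, 64, 7  # hypothetical sizes


class ActionModule(nn.Module):
    """Updates the per-frame environment state given a discrete action id."""

    def __init__(self):
        super().__init__()
        self.action_embed = nn.Embedding(NUM_ACTIONS, STATE_DIM)
        self.update = nn.Sequential(
            nn.Linear(2 * STATE_DIM, STATE_DIM), nn.ReLU(),
            nn.Linear(STATE_DIM, STATE_DIM),
        )

    def forward(self, state, action_id):
        a = self.action_embed(action_id)                        # (B, STATE_DIM)
        return state + self.update(torch.cat([state, a], dim=-1))


class StyleModulatedNeRF(nn.Module):
    """Radiance field conditioned on the environment state, with hidden
    features scaled by a style code (a stand-in for style-based modulation)."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                                      nn.Linear(128, 128), nn.ReLU())
        self.mod = nn.Linear(STATE_DIM + STYLE_DIM, 128)        # state+style -> channel scales
        self.head = nn.Linear(128, 4)                           # RGB + density per sample

    def forward(self, xyz, state, style):
        h = self.backbone(xyz) * self.mod(torch.cat([state, style], dim=-1))
        return self.head(h)


# One interactive step: apply the chosen action to the environment state, then
# query the modulated field at sampled 3D points. Real decoding would
# volume-render these samples along camera rays for the chosen viewpoint.
state = torch.randn(1, STATE_DIM)
style = torch.randn(1, STYLE_DIM)
action = torch.tensor([3])                                      # user-chosen discrete action
state = ActionModule()(state, action)
rgb_sigma = StyleModulatedNeRF()(torch.randn(1024, 3), state, style)  # (1024, 4)
```

In the paper the discrete actions are learnt without supervision; the sketch only shows how an action-driven state update and a style-modulated radiance field could plug together.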
Related papers
- Generating 3D-Consistent Videos from Unposed Internet Photos [68.944029293283]
We train a scalable, 3D-aware video model without any 3D annotations such as camera parameters.
Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.
arXiv Detail & Related papers (2024-11-20T18:58:31Z)
- Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation.
Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency.
Cavia is the first of its kind to allow the user to specify distinct camera motion while obtaining object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z)
- MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling [21.1274747033854]
Character video synthesis aims to produce realistic videos of animatable characters within lifelike scenes.
MIMO is a novel framework that can synthesize character videos with controllable attributes.
MIMO achieves advanced scalability to arbitrary characters, generality to novel 3D motions, and applicability to interactive real-world scenes.
arXiv Detail & Related papers (2024-09-24T15:00:07Z)
- OSN: Infinite Representations of Dynamic 3D Scenes from Monocular Videos [7.616167860385134]
It has long been challenging to recover the underlying dynamic 3D scene representations from a monocular RGB video.
We introduce a new framework, called OSN, to learn all plausible 3D scene configurations that match the input video.
Our method demonstrates a clear advantage in learning fine-grained 3D scene geometry.
arXiv Detail & Related papers (2024-07-08T05:03:46Z)
- Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis [43.02778060969546]
We propose a controllable monocular dynamic view synthesis pipeline.
Our model does not require depth as input, and does not explicitly model 3D scene geometry.
We believe our framework can potentially unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality.
arXiv Detail & Related papers (2024-05-23T17:59:52Z)
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
- Video Autoencoder: self-supervised disentanglement of static 3D structure and motion [60.58836145375273]
A video autoencoder is proposed for learning disentangled representations of 3D structure and camera pose from videos.
The representation can be applied to a range of tasks, including novel view synthesis, camera pose estimation, and video generation by motion following.
arXiv Detail & Related papers (2021-10-06T17:57:42Z)
- A Good Image Generator Is What You Need for High-Resolution Video Synthesis [73.82857768949651]
We present a framework that leverages contemporary image generators to render high-resolution videos.
We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator.
We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled.
arXiv Detail & Related papers (2021-04-30T15:38:41Z)
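The last entry above casts video synthesis as discovering a trajectory in the latent space of a pre-trained, fixed image generator, with a separate motion generator producing that trajectory. The sketch below only illustrates this general idea; the GRU-based motion model, the stand-in frozen generator, and all dimensions are assumptions, not that paper's actual code.

```python
import torch
import torch.nn as nn

LATENT_DIM, HIDDEN_DIM, NUM_FRAMES = 512, 256, 16  # hypothetical sizes


class MotionGenerator(nn.Module):
    """Predicts a sequence of latent offsets (a trajectory) from motion noise,
    leaving the content fixed in the initial latent code."""

    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(input_size=LATENT_DIM, hidden_size=HIDDEN_DIM,
                          batch_first=True)
        self.to_offset = nn.Linear(HIDDEN_DIM, LATENT_DIM)

    def forward(self, z0, motion_noise):
        # motion_noise: (B, NUM_FRAMES, LATENT_DIM) drives frame-to-frame change
        h, _ = self.rnn(motion_noise)
        offsets = self.to_offset(h)                            # (B, NUM_FRAMES, LATENT_DIM)
        return z0.unsqueeze(1) + torch.cumsum(offsets, dim=1)  # latent trajectory


# A frozen, pre-trained image generator would map each latent code to a frame;
# a random linear projection stands in for it so the sketch runs end to end.
frozen_generator = nn.Linear(LATENT_DIM, 3 * 64 * 64).requires_grad_(False)

z0 = torch.randn(1, LATENT_DIM)                   # content code, shared across frames
noise = torch.randn(1, NUM_FRAMES, LATENT_DIM)    # motion code
trajectory = MotionGenerator()(z0, noise)         # (1, NUM_FRAMES, LATENT_DIM)
video = frozen_generator(trajectory).view(1, NUM_FRAMES, 3, 64, 64)
```

Only the latent-trajectory framing is shown; that work's training objectives, which yield the content/motion disentanglement, are not reproduced here.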
This list is automatically generated from the titles and abstracts of the papers on this site.