Editable Free-viewpoint Video Using a Layered Neural Representation
- URL: http://arxiv.org/abs/2104.14786v1
- Date: Fri, 30 Apr 2021 06:50:45 GMT
- Title: Editable Free-viewpoint Video Using a Layered Neural Representation
- Authors: Jiakai Zhang, Xinhang Liu, Xinyi Ye, Fuqiang Zhao, Yanshun Zhang,
Minye Wu, Yingliang Zhang, Lan Xu, Jingyi Yu
- Abstract summary: We propose the first approach for editable free-viewpoint video generation for large-scale dynamic scenes using only a sparse set of 16 cameras.
The core of our approach is a new layered neural representation, where each dynamic entity including the environment itself is formulated into a space-time coherent neural layered radiance representation called ST-NeRF.
Experiments demonstrate the effectiveness of our approach in achieving high-quality, photo-realistic, and editable free-viewpoint video generation for dynamic scenes.
- Score: 35.44420164057911
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating free-viewpoint videos is critical for immersive VR/AR experiences,
but recent neural advances still lack the editing ability to manipulate the
visual perception of large dynamic scenes. To fill this gap, in this paper we
propose the first approach for editable, photo-realistic free-viewpoint video
generation for large-scale dynamic scenes using only a sparse set of 16 cameras.
The core of our approach is a new layered neural representation, where each
dynamic entity, including the environment itself, is formulated into a
space-time coherent neural layered radiance representation called ST-NeRF.
Such a layered representation supports full perception and realistic
manipulation of the dynamic scene while still supporting free viewing over a
wide range. In our ST-NeRF, each dynamic entity/layer is represented as a set
of continuous functions, which disentangle the location, deformation, and
appearance of the dynamic entity in a continuous and self-supervised manner.
We propose a scene-parsing 4D label map tracking scheme to disentangle the
spatial information explicitly, and a continuous deform module to disentangle
the temporal motion implicitly. An object-aware volume rendering scheme is
further introduced to re-assemble all the neural layers. We adopt a novel
layered loss and a motion-aware ray sampling strategy to enable efficient
training for a large dynamic scene with multiple performers. Our framework
further enables a variety of editing functions, i.e., manipulating the scale
and location of, duplicating, or retiming individual neural layers to create
numerous visual effects while preserving high realism. Extensive experiments
demonstrate the effectiveness of our approach in achieving high-quality,
photo-realistic, and editable free-viewpoint video generation for dynamic
scenes.
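
To make the layered formulation concrete, below is a minimal PyTorch sketch of one neural layer in the spirit of ST-NeRF: a continuous deform module warps a spatio-temporal sample into a canonical space, where a radiance network predicts density and color. The module names, network sizes, and positional encoding are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a single space-time neural layer (ST-NeRF-style).
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs=6):
    """Map inputs to sin/cos features at increasing frequencies."""
    feats = [x]
    for i in range(num_freqs):
        feats.append(torch.sin((2.0 ** i) * x))
        feats.append(torch.cos((2.0 ** i) * x))
    return torch.cat(feats, dim=-1)

class STNeRFLayer(nn.Module):
    """One neural layer: a deform net maps (x, t) to a canonical space,
    and a radiance net predicts density and color there."""
    def __init__(self, num_freqs=6, hidden=128):
        super().__init__()
        self.num_freqs = num_freqs
        in_xyz = 3 * (2 * num_freqs + 1)
        in_t = 1 * (2 * num_freqs + 1)
        # Continuous deform module: predicts an offset into canonical space.
        self.deform_net = nn.Sequential(
            nn.Linear(in_xyz + in_t, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )
        # Radiance module: density + RGB from canonical position and view dir.
        self.radiance_net = nn.Sequential(
            nn.Linear(in_xyz + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (sigma, r, g, b)
        )

    def forward(self, x, t, view_dir):
        x_enc = positional_encoding(x, self.num_freqs)
        t_enc = positional_encoding(t, self.num_freqs)
        x_canonical = x + self.deform_net(torch.cat([x_enc, t_enc], dim=-1))
        out = self.radiance_net(
            torch.cat([positional_encoding(x_canonical, self.num_freqs),
                       view_dir], dim=-1))
        sigma = torch.relu(out[..., :1])      # non-negative density
        rgb = torch.sigmoid(out[..., 1:])     # color in [0, 1]
        return sigma, rgb

# Usage: query one layer at 1024 space-time samples.
layer = STNeRFLayer()
sigma, rgb = layer(torch.rand(1024, 3), torch.rand(1024, 1), torch.rand(1024, 3))
```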
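The object-aware volume rendering and the layer-level editing can be sketched, under the same assumptions, by querying every layer at shared ray samples after applying each layer's edit (translation, scaling, retiming) and compositing the results with standard volume rendering. The per-layer bounding-box culling used in practice is omitted for brevity, and the edit dictionary format is hypothetical.

```python
# Illustrative composition of several neural layers plus per-layer edits.
import torch

def composite_layers(layers, edits, x, t, view_dir):
    """Sum densities and density-weighted colors over all layers at the
    same sample points, applying each layer's edit transform first."""
    sigma_total = torch.zeros(*x.shape[:-1], 1)
    rgb_accum = torch.zeros(*x.shape[:-1], 3)
    for layer, edit in zip(layers, edits):
        # Editing: inverse-warp samples into the layer's own frame.
        x_l = (x - edit["translation"]) / edit["scale"]   # move / rescale
        t_l = t + edit["time_offset"]                     # retime
        sigma, rgb = layer(x_l, t_l, view_dir)
        sigma_total = sigma_total + sigma
        rgb_accum = rgb_accum + sigma * rgb
    rgb_mix = rgb_accum / (sigma_total + 1e-8)  # density-weighted color
    return sigma_total, rgb_mix

def volume_render(layers, edits, rays_o, rays_d, t, near=0.0, far=1.0,
                  n_samples=64):
    """Standard quadrature along each ray over the composited layers."""
    z = torch.linspace(near, far, n_samples)
    pts = rays_o[:, None, :] + rays_d[:, None, :] * z[None, :, None]
    view = rays_d[:, None, :].expand_as(pts)
    t_samp = t[:, None, :].expand(pts.shape[0], n_samples, 1)
    sigma, rgb = composite_layers(layers, edits, pts, t_samp, view)
    delta = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * delta)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10],
                  dim=1), dim=1)[:, :-1]
    weights = alpha * trans
    return (weights[..., None] * rgb).sum(dim=1)  # (num_rays, 3)
```

Duplicating a layer amounts to listing the same layer twice with different edit transforms; removing one simply drops it from the list before compositing.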
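A motion-aware ray sampling strategy can be approximated by biasing training rays toward pixels whose color changes across frames; the temporal-difference heuristic and the uniform/motion mixing ratio below are assumptions for illustration, not the paper's exact scheme.

```python
# Hedged sketch: sample more training rays in moving regions of one camera.
import torch

def motion_aware_pixel_sampler(frames, num_rays, uniform_frac=0.25):
    """frames: (T, H, W, 3) images from one camera view.
    Returns flat pixel indices biased toward temporally changing pixels."""
    diffs = (frames[1:] - frames[:-1]).abs().mean(dim=(0, 3))  # (H, W)
    probs = diffs.flatten() + 1e-8
    probs = probs / probs.sum()
    n_motion = int(num_rays * (1.0 - uniform_frac))
    motion_idx = torch.multinomial(probs, n_motion, replacement=True)
    uniform_idx = torch.randint(0, probs.numel(), (num_rays - n_motion,))
    return torch.cat([motion_idx, uniform_idx])
```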
Related papers
- Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling [70.34875558830241]
We present a way of learning a spatio-temporal (4D) semantic embedding, based on which semantic gears allow for stratified modeling of the dynamic regions of the scene.
At the same time, almost for free, our approach enables free-viewpoint tracking of objects of interest - a functionality not yet achieved by existing NeRF-based methods.
arXiv Detail & Related papers (2024-06-06T03:37:39Z) - DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video [18.424138608823267]
We propose DyBluRF, a dynamic radiance field approach that synthesizes sharp novel views from a monocular video affected by motion blur.
To account for motion blur in input images, we simultaneously capture the camera trajectory and object Discrete Cosine Transform (DCT) trajectories within the scene.
arXiv Detail & Related papers (2024-03-15T08:48:37Z) - DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z) - STaR: Self-supervised Tracking and Reconstruction of Rigid Objects in
Motion with Neural Rendering [9.600908665766465]
We present STaR, a novel method that performs Self-supervised Tracking and Reconstruction of dynamic scenes with rigid motion from multi-view RGB videos without any manual annotation.
We show that our method can render photorealistic novel views, where novelty is measured on both spatial and temporal axes.
arXiv Detail & Related papers (2020-12-22T23:45:28Z) - Non-Rigid Neural Radiance Fields: Reconstruction and Novel View
Synthesis of a Dynamic Scene From Monocular Video [76.19076002661157]
Non-Rigid Neural Radiance Fields (NR-NeRF) is a reconstruction and novel view synthesis approach for general non-rigid dynamic scenes.
We show that even a single consumer-grade camera is sufficient to synthesize sophisticated renderings of a dynamic scene from novel virtual camera views.
arXiv Detail & Related papers (2020-12-22T18:46:12Z) - Neural Radiance Flow for 4D View Synthesis and Video Processing [59.9116932930108]
We present a method to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.
Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene.
arXiv Detail & Related papers (2020-12-17T17:54:32Z) - D-NeRF: Neural Radiance Fields for Dynamic Scenes [72.75686949608624]
We introduce D-NeRF, a method that extends neural radiance fields to a dynamic domain.
D-NeRF reconstructs images of objects under rigid and non-rigid motions from a camera moving around the scene.
We demonstrate the effectiveness of our approach on scenes with objects under rigid, articulated and non-rigid motions.
arXiv Detail & Related papers (2020-11-27T19:06:50Z) - Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes [70.76742458931935]
We introduce a new representation that models the dynamic scene as a time-variant continuous function of appearance, geometry, and 3D scene motion.
Our representation is optimized through a neural network to fit the observed input views.
We show that our representation can be used for complex dynamic scenes, including thin structures, view-dependent effects, and natural degrees of motion.
arXiv Detail & Related papers (2020-11-26T01:23:44Z)