Neural Human Video Rendering by Learning Dynamic Textures and
Rendering-to-Video Translation
- URL: http://arxiv.org/abs/2001.04947v3
- Date: Mon, 5 Jul 2021 21:08:52 GMT
- Title: Neural Human Video Rendering by Learning Dynamic Textures and
Rendering-to-Video Translation
- Authors: Lingjie Liu, Weipeng Xu, Marc Habermann, Michael Zollhoefer, Florian
Bernard, Hyeongwoo Kim, Wenping Wang, Christian Theobalt
- Abstract summary: We propose a novel human video synthesis method by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space.
We demonstrate several applications of our approach, such as human reenactment and novel view synthesis from monocular video, showing significant improvement over the state of the art both qualitatively and quantitatively.
- Score: 99.64565200170897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthesizing realistic videos of humans using neural networks has been a
popular alternative to the conventional graphics-based rendering pipeline due
to its high efficiency. Existing works typically formulate this as an
image-to-image translation problem in 2D screen space, which leads to artifacts
such as over-smoothing, missing body parts, and temporal instability of
fine-scale detail, such as pose-dependent wrinkles in the clothing. In this
paper, we propose a novel human video synthesis method that approaches these
limiting factors by explicitly disentangling the learning of time-coherent
fine-scale details from the embedding of the human in 2D screen space. More
specifically, our method relies on the combination of two convolutional neural
networks (CNNs). Given the pose information, the first CNN predicts a dynamic
texture map that contains time-coherent high-frequency details, and the second
CNN conditions the generation of the final video on the temporally coherent
output of the first CNN. We demonstrate several applications of our approach,
such as human reenactment and novel view synthesis from monocular video, where
we show significant improvement over the state of the art both qualitatively
and quantitatively.
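
The two-CNN pipeline described above can be summarized with a minimal PyTorch-style sketch. The network names (TexNet, RefineNet), the layer configurations, and the UV-based texture-sampling step are illustrative assumptions rather than the authors' implementation; the point is only the division of labor between a texture-space detail network and a screen-space translation network.

```python
# Minimal sketch of the two-CNN pipeline from the abstract. TexNet, RefineNet,
# the layer sizes, and the UV texture-sampling step are illustrative
# assumptions; this is not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TexNet(nn.Module):
    """CNN 1: predicts a dynamic texture map with pose-dependent
    high-frequency detail from a pose encoding given in texture space."""
    def __init__(self, pose_channels=3, tex_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(pose_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, tex_channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, pose_map):              # (B, pose_channels, Ht, Wt)
        return self.net(pose_map)             # (B, tex_channels, Ht, Wt)


class RefineNet(nn.Module):
    """CNN 2: translates the screen-space rendering of the predicted
    texture into the final video frame."""
    def __init__(self, in_channels=3, out_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_channels, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, rendering):             # (B, in_channels, H, W)
        return self.net(rendering)


def synthesize_frame(pose_map, uv_grid, tex_net, refine_net):
    """pose_map: pose conditioning rasterized into texture space.
    uv_grid: per-pixel texture coordinates (B, H, W, 2) in [-1, 1], obtained
    by rasterizing the body model for the target view (assumed given)."""
    texture = tex_net(pose_map)                        # time-coherent detail
    rendering = F.grid_sample(texture, uv_grid,        # project the texture
                              align_corners=False)     # into screen space
    return refine_net(rendering)                       # final frame


# Usage with dummy tensors: one 256x256 frame from a 512x512 texture atlas.
frame = synthesize_frame(torch.randn(1, 3, 512, 512),
                         torch.rand(1, 256, 256, 2) * 2 - 1,
                         TexNet(), RefineNet())
print(frame.shape)                                     # torch.Size([1, 3, 256, 256])
```

Because the first network operates entirely in texture space, the fine-scale detail it produces is attached to the body surface rather than to screen pixels, which is what the abstract credits for the temporal coherence; the second network only has to translate the resulting rendering into a realistic frame.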
Related papers
- D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video [53.83936023443193]
This paper contributes to the field by introducing a new synthesis method for dynamic novel view from monocular video, such as smartphone captures.
Our approach represents the scene as a dynamic neural point cloud, an implicit time-conditioned point cloud that encodes local geometry and appearance in separate hash-encoded neural feature grids.
arXiv Detail & Related papers (2024-06-14T14:35:44Z)
- DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experiments achieve state-of-the-art performance in tracking on both synthetic and real-world data.
arXiv Detail & Related papers (2023-11-30T21:34:44Z)
- Human Performance Modeling and Rendering via Neural Animated Mesh [40.25449482006199]
In this paper, we present a novel approach for rendering views of human performances from video.
We bridge the traditional mesh workflow with a new class of neural rendering techniques.
We demonstrate our approach on various platforms, inserting virtual human performances into AR headsets.
arXiv Detail & Related papers (2022-09-18T03:58:00Z)
- Fast Training of Neural Lumigraph Representations using Meta Learning [109.92233234681319]
We develop a new neural rendering approach with the goal of quickly learning a high-quality representation which can also be rendered in real-time.
Our approach, MetaNLR++, accomplishes this by using a unique combination of a neural shape representation and 2D CNN-based image feature extraction, aggregation, and re-projection.
We show that MetaNLR++ achieves similar or better photorealistic novel view synthesis results in a fraction of the time that competing methods require.
arXiv Detail & Related papers (2021-06-28T18:55:50Z)
- Robust Pose Transfer with Dynamic Details using Neural Video Rendering [48.48929344349387]
We propose a neural video rendering framework coupled with an image-translation-based dynamic details generation network (D2G-Net).
Specifically, a novel texture representation is presented to encode both the static and pose-varying appearance characteristics.
We demonstrate that our neural human video rendering framework achieves both clearer dynamic details and more robust performance, even on short videos with only 2k-4k frames.
arXiv Detail & Related papers (2021-06-27T03:40:22Z)
- High-Fidelity Neural Human Motion Transfer from Monocular Video [71.75576402562247]
Video-based human motion transfer creates video animations of humans following a source motion.
We present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations.
In the experimental results, we significantly outperform the state-of-the-art in terms of video realism.
arXiv Detail & Related papers (2020-12-20T16:54:38Z)
- Speech2Video Synthesis with 3D Skeleton Regularization and Expressive Body Poses [36.00309828380724]
We propose a novel approach to convert given speech audio to a photo-realistic speaking video of a specific person.
We achieve this by first generating 3D skeleton movements from the audio sequence using a recurrent neural network (RNN); a rough sketch of this stage is given below.
To make the skeleton movement realistic and expressive, we embed the knowledge of an articulated 3D human skeleton and a learned dictionary of personal speech iconic gestures into the generation process.
arXiv Detail & Related papers (2020-07-17T19:30:14Z)
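
As a rough illustration of the audio-to-skeleton stage described in the Speech2Video entry above, the sketch below maps per-frame audio features to 3D joint positions with an LSTM. The feature dimension, joint count, and layer sizes are assumptions chosen for the example; the learned gesture dictionary and the subsequent video synthesis stage are omitted.

```python
# Rough illustration of an audio-to-3D-skeleton stage like the one described
# in the Speech2Video entry above. Feature dimension, joint count, and layer
# sizes are assumptions; the gesture dictionary and video stage are omitted.
import torch
import torch.nn as nn


class AudioToSkeleton(nn.Module):
    def __init__(self, audio_dim=40, hidden_dim=256, num_joints=21):
        super().__init__()
        self.rnn = nn.LSTM(audio_dim, hidden_dim, num_layers=2,
                           batch_first=True)
        self.head = nn.Linear(hidden_dim, num_joints * 3)  # x, y, z per joint
        self.num_joints = num_joints

    def forward(self, audio_feats):
        # audio_feats: (B, T, audio_dim), e.g. per-frame MFCC features
        hidden, _ = self.rnn(audio_feats)                  # (B, T, hidden_dim)
        joints = self.head(hidden)                         # (B, T, 3 * joints)
        return joints.view(audio_feats.size(0), -1,
                           self.num_joints, 3)


# Two seconds of audio features at 25 fps -> a 3D joint trajectory.
trajectory = AudioToSkeleton()(torch.randn(1, 50, 40))
print(trajectory.shape)                                    # torch.Size([1, 50, 21, 3])
```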
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.