Towards 4D Human Video Stylization
- URL: http://arxiv.org/abs/2312.04143v1
- Date: Thu, 7 Dec 2023 08:58:33 GMT
- Title: Towards 4D Human Video Stylization
- Authors: Tiantian Wang, Xinxin Zuo, Fangzhou Mu, Jian Wang, Ming-Hsuan Yang
- Abstract summary: We present a first step towards 4D (3D and time) human video stylization, which addresses style transfer, novel view synthesis and human animation.
We leverage Neural Radiance Fields (NeRFs) to represent videos, conducting stylization in the rendered feature space.
Our framework uniquely extends its capabilities to accommodate novel poses and viewpoints, making it a versatile tool for creative human video stylization.
- Score: 56.33756124829298
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a first step towards 4D (3D and time) human video stylization,
which addresses style transfer, novel view synthesis and human animation within
a unified framework. While numerous video stylization methods have been
developed, they are often restricted to rendering images in specific viewpoints
of the input video, lacking the capability to generalize to novel views and
novel poses in dynamic scenes. To overcome these limitations, we leverage
Neural Radiance Fields (NeRFs) to represent videos, conducting stylization in
the rendered feature space. Our innovative approach involves the simultaneous
representation of both the human subject and the surrounding scene using two
NeRFs. This dual representation facilitates the animation of human subjects
across various poses and novel viewpoints. Specifically, we introduce a novel
geometry-guided tri-plane representation, significantly enhancing feature
representation robustness compared to direct tri-plane optimization. Following
the video reconstruction, stylization is performed within the NeRFs' rendered
feature space. Extensive experiments demonstrate that the proposed method
strikes a superior balance between stylized textures and temporal coherence,
surpassing existing approaches. Furthermore, our framework uniquely extends its
capabilities to accommodate novel poses and viewpoints, making it a versatile
tool for creative human video stylization.
Related papers
- Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses [9.529416246409355]
We present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input.
As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting (3D-GS) representation.
arXiv Detail & Related papers (2024-04-22T17:59:50Z) - Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with
Image Diffusion Model [57.855362366674264]
We propose Dancing Avatar, designed to fabricate human motion videos driven by poses and textual cues.
Our approach employs a pretrained T2I diffusion model to generate each video frame in an autoregressive fashion.
arXiv Detail & Related papers (2023-08-15T13:00:42Z) - HDHumans: A Hybrid Approach for High-fidelity Digital Humans [107.19426606778808]
HDHumans is the first method for HD human character synthesis that jointly produces an accurate and temporally coherent 3D deforming surface.
Our method is carefully designed to achieve a synergy between classical surface deformation and neural radiance fields (NeRF)
arXiv Detail & Related papers (2022-10-21T14:42:11Z) - SNeRF: Stylized Neural Implicit Representations for 3D Scenes [9.151746397358522]
This paper investigates 3D scene stylization that provides a strong inductive bias for consistent novel view synthesis.
We adopt the emerging neural radiance fields (NeRF) as our choice of 3D scene representation.
We introduce a new training method to address this problem by alternating the NeRF and stylization optimization steps.
arXiv Detail & Related papers (2022-07-05T23:45:02Z) - Animatable Neural Radiance Fields from Monocular RGB Video [72.6101766407013]
We present animatable neural radiance fields for detailed human avatar creation from monocular videos.
Our approach extends neural radiance fields to the dynamic scenes with human movements via introducing explicit pose-guided deformation.
In experiments we show that the proposed approach achieves 1) implicit human geometry and appearance reconstruction with high-quality details, 2) photo-realistic rendering of the human from arbitrary views, and 3) animation of the human with arbitrary poses.
arXiv Detail & Related papers (2021-06-25T13:32:23Z) - Neural Actor: Neural Free-view Synthesis of Human Actors with Pose
Control [80.79820002330457]
We propose a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses.
Our method achieves better quality than the state-of-the-arts on playback as well as novel pose synthesis, and can even generalize well to new poses that starkly differ from the training poses.
arXiv Detail & Related papers (2021-06-03T17:40:48Z) - Stylizing 3D Scene via Implicit Representation and HyperNetwork [34.22448260525455]
A straightforward solution is to combine existing novel view synthesis and image/video style transfer approaches.
Inspired by the high quality results of the neural radiance fields (NeRF) method, we propose a joint framework to directly render novel views with the desired style.
Our framework consists of two components: an implicit representation of the 3D scene with the neural radiance field model, and a hypernetwork to transfer the style information into the scene representation.
arXiv Detail & Related papers (2021-05-27T09:11:30Z) - IBRNet: Learning Multi-View Image-Based Rendering [67.15887251196894]
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views.
By drawing on source views at render time, our method hearkens back to classic work on image-based rendering.
arXiv Detail & Related papers (2021-02-25T18:56:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.