Towards 4D Human Video Stylization
- URL: http://arxiv.org/abs/2312.04143v1
- Date: Thu, 7 Dec 2023 08:58:33 GMT
- Title: Towards 4D Human Video Stylization
- Authors: Tiantian Wang, Xinxin Zuo, Fangzhou Mu, Jian Wang, Ming-Hsuan Yang
- Abstract summary: We present a first step towards 4D (3D and time) human video stylization, which addresses style transfer, novel view synthesis and human animation.
We leverage Neural Radiance Fields (NeRFs) to represent videos, conducting stylization in the rendered feature space.
Our framework uniquely extends its capabilities to accommodate novel poses and viewpoints, making it a versatile tool for creative human video stylization.
- Score: 56.33756124829298
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a first step towards 4D (3D and time) human video stylization,
which addresses style transfer, novel view synthesis and human animation within
a unified framework. While numerous video stylization methods have been
developed, they are often restricted to rendering images in specific viewpoints
of the input video, lacking the capability to generalize to novel views and
novel poses in dynamic scenes. To overcome these limitations, we leverage
Neural Radiance Fields (NeRFs) to represent videos, conducting stylization in
the rendered feature space. Our innovative approach involves the simultaneous
representation of both the human subject and the surrounding scene using two
NeRFs. This dual representation facilitates the animation of human subjects
across various poses and novel viewpoints. Specifically, we introduce a novel
geometry-guided tri-plane representation, significantly enhancing feature
representation robustness compared to direct tri-plane optimization. Following
the video reconstruction, stylization is performed within the NeRFs' rendered
feature space. Extensive experiments demonstrate that the proposed method
strikes a superior balance between stylized textures and temporal coherence,
surpassing existing approaches. Furthermore, our framework uniquely extends its
capabilities to accommodate novel poses and viewpoints, making it a versatile
tool for creative human video stylization.
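To make the tri-plane idea concrete, the following is a minimal, hypothetical PyTorch sketch of a plain tri-plane feature query: each 3D sample is projected onto three learned feature planes, bilinearly sampled, and the per-plane features are fused. The class name, sizes, and fusion by summation are assumptions for illustration; the paper's geometry-guided conditioning on the reconstructed human is not shown.
```python
# Hypothetical sketch of a plain tri-plane feature query (not the authors' code).
import torch
import torch.nn.functional as F


class TriPlane(torch.nn.Module):
    def __init__(self, feat_dim: int = 32, res: int = 256):
        super().__init__()
        # Three learnable feature planes: XY, XZ, YZ.
        self.planes = torch.nn.Parameter(torch.randn(3, feat_dim, res, res) * 0.01)

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (N, 3) query points normalized to [-1, 1]^3.
        xy, xz, yz = pts[:, [0, 1]], pts[:, [0, 2]], pts[:, [1, 2]]
        feats = []
        for plane, coords in zip(self.planes, (xy, xz, yz)):
            # grid_sample expects a (1, C, H, W) input and a (1, N, 1, 2) grid.
            grid = coords.view(1, -1, 1, 2)
            sampled = F.grid_sample(plane.unsqueeze(0), grid, align_corners=True)
            feats.append(sampled.view(plane.shape[0], -1).t())  # (N, C)
        # Fuse the three per-plane features (summation is one common choice).
        return sum(feats)


if __name__ == "__main__":
    tri = TriPlane()
    x = torch.rand(1024, 3) * 2 - 1   # random query points in [-1, 1]^3
    print(tri(x).shape)               # torch.Size([1024, 32])
```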
Related papers
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
Our method takes advantage of the powerful generation capabilities of a video diffusion model and the coarse 3D clues offered by a point-based representation to generate high-quality video frames.
arXiv Detail & Related papers (2024-09-03T16:53:19Z)
- G3DST: Generalizing 3D Style Transfer with Neural Radiance Fields across Scenes and Styles [45.92812062685523]
Existing methods for 3D style transfer need extensive per-scene optimization for single or multiple styles.
In this work, we overcome the limitations of existing methods by rendering stylized novel views from a NeRF without the need for per-scene or per-style optimization.
Our findings demonstrate that this approach achieves visual quality comparable to that of per-scene optimization methods.
arXiv Detail & Related papers (2024-08-24T08:04:19Z)
- Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation [54.60804602905519]
Recent methods learn an entangled representation, aiming to model layered scene geometry, motion forecasting and novel view synthesis together.
In contrast, our approach disentangles scene geometry from scene motion by lifting the 2D scene to 3D point clouds.
To model future 3D scene motion, we propose a disentangled two-stage approach that initially forecasts ego-motion and subsequently the residual motion of dynamic objects.
arXiv Detail & Related papers (2024-07-31T08:54:50Z)
- Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses [9.529416246409355]
We present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input.
As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting (3D-GS) representation.
arXiv Detail & Related papers (2024-04-22T17:59:50Z)
- Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model [57.855362366674264]
We propose Dancing Avatar, designed to fabricate human motion videos driven by poses and textual cues.
Our approach employs a pretrained T2I diffusion model to generate each video frame in an autoregressive fashion.
arXiv Detail & Related papers (2023-08-15T13:00:42Z)
- HDHumans: A Hybrid Approach for High-fidelity Digital Humans [107.19426606778808]
HDHumans is the first method for HD human character synthesis that jointly produces an accurate and temporally coherent 3D deforming surface.
Our method is carefully designed to achieve a synergy between classical surface deformation and neural radiance fields (NeRF).
arXiv Detail & Related papers (2022-10-21T14:42:11Z)
- SNeRF: Stylized Neural Implicit Representations for 3D Scenes [9.151746397358522]
This paper investigates 3D scene stylization that provides a strong inductive bias for consistent novel view synthesis.
We adopt the emerging neural radiance fields (NeRF) as our choice of 3D scene representation.
We introduce a new training method that alternates between the NeRF and stylization optimization steps.
arXiv Detail & Related papers (2022-07-05T23:45:02Z)
- Animatable Neural Radiance Fields from Monocular RGB Video [72.6101766407013]
We present animatable neural radiance fields for detailed human avatar creation from monocular videos.
Our approach extends neural radiance fields to dynamic scenes with human movements by introducing explicit pose-guided deformation.
In experiments we show that the proposed approach achieves 1) implicit human geometry and appearance reconstruction with high-quality details, 2) photo-realistic rendering of the human from arbitrary views, and 3) animation of the human with arbitrary poses.
arXiv Detail & Related papers (2021-06-25T13:32:23Z)
- Stylizing 3D Scene via Implicit Representation and HyperNetwork [34.22448260525455]
A straightforward solution is to combine existing novel view synthesis and image/video style transfer approaches.
Inspired by the high quality results of the neural radiance fields (NeRF) method, we propose a joint framework to directly render novel views with the desired style.
Our framework consists of two components: an implicit representation of the 3D scene with the neural radiance field model, and a hypernetwork to transfer the style information into the scene representation.
arXiv Detail & Related papers (2021-05-27T09:11:30Z)
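As a rough illustration of the hypernetwork idea in the entry above, the following hypothetical PyTorch sketch maps a style embedding to the weights of a single color layer applied to per-point NeRF features. All names and dimensions are assumptions for illustration, not the paper's architecture.
```python
# Hypothetical sketch: a hypernetwork predicts the weights of one color layer
# from a style embedding, so the same scene features render in different styles.
import torch
import torch.nn as nn


class StyleHyperNetwork(nn.Module):
    def __init__(self, style_dim: int = 128, feat_dim: int = 64, out_dim: int = 3):
        super().__init__()
        self.feat_dim, self.out_dim = feat_dim, out_dim
        # Predict a weight matrix and bias for one linear color layer.
        self.to_weights = nn.Linear(style_dim, feat_dim * out_dim + out_dim)

    def forward(self, style_emb: torch.Tensor, scene_feat: torch.Tensor) -> torch.Tensor:
        # style_emb: (style_dim,); scene_feat: (N, feat_dim) per-point features.
        params = self.to_weights(style_emb)
        w = params[: self.feat_dim * self.out_dim].view(self.out_dim, self.feat_dim)
        b = params[self.feat_dim * self.out_dim:]
        # Style-conditioned color prediction for each sampled point.
        return torch.sigmoid(scene_feat @ w.t() + b)


if __name__ == "__main__":
    hyper = StyleHyperNetwork()
    style = torch.randn(128)          # embedding of a style image (assumed)
    feats = torch.randn(4096, 64)     # per-point scene features from the NeRF
    print(hyper(style, feats).shape)  # torch.Size([4096, 3])
```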
This list is automatically generated from the titles and abstracts of the papers on this site.