Related papers: Towards 4D Human Video Stylization

URL: http://arxiv.org/abs/2312.04143v1
Date: Thu, 7 Dec 2023 08:58:33 GMT
Title: Towards 4D Human Video Stylization
Authors: Tiantian Wang, Xinxin Zuo, Fangzhou Mu, Jian Wang, Ming-Hsuan Yang
Abstract summary: We present a first step towards 4D (3D and time) human video stylization, which addresses style transfer, novel view synthesis and human animation. We leverage Neural Radiance Fields (NeRFs) to represent videos, conducting stylization in the rendered feature space. Our framework uniquely extends its capabilities to accommodate novel poses and viewpoints, making it a versatile tool for creative human video stylization.
Score: 56.33756124829298
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We present a first step towards 4D (3D and time) human video stylization, which addresses style transfer, novel view synthesis and human animation within a unified framework. While numerous video stylization methods have been developed, they are often restricted to rendering images in specific viewpoints of the input video, lacking the capability to generalize to novel views and novel poses in dynamic scenes. To overcome these limitations, we leverage Neural Radiance Fields (NeRFs) to represent videos, conducting stylization in the rendered feature space. Our innovative approach involves the simultaneous representation of both the human subject and the surrounding scene using two NeRFs. This dual representation facilitates the animation of human subjects across various poses and novel viewpoints. Specifically, we introduce a novel geometry-guided tri-plane representation, significantly enhancing feature representation robustness compared to direct tri-plane optimization. Following the video reconstruction, stylization is performed within the NeRFs' rendered feature space. Extensive experiments demonstrate that the proposed method strikes a superior balance between stylized textures and temporal coherence, surpassing existing approaches. Furthermore, our framework uniquely extends its capabilities to accommodate novel poses and viewpoints, making it a versatile tool for creative human video stylization.

Related papers

CFSynthesis: Controllable and Free-view 3D Human Video Synthesis [57.561237409603066]
CFSynthesis is a novel framework for generating high-quality human videos with customizable attributes. Our method leverages a texture-SMPL-based representation to ensure consistent and stable character appearances across free viewpoints. Results on multiple datasets show that CFSynthesis achieves state-of-the-art performance in complex human animations.
arXiv Detail & Related papers (2024-12-15T05:57:36Z)
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images. Our method takes advantage of the powerful generation capabilities of video diffusion models and the coarse 3D clues offered by a point-based representation to generate high-quality video frames.
arXiv Detail & Related papers (2024-09-03T16:53:19Z)
G3DST: Generalizing 3D Style Transfer with Neural Radiance Fields across Scenes and Styles [45.92812062685523]
Existing methods for 3D style transfer need extensive per-scene optimization for single or multiple styles. In this work, we overcome the limitations of existing methods by rendering stylized novel views from a NeRF without the need for per-scene or per-style optimization. Our findings demonstrate that this approach achieves visual quality comparable to that of per-scene methods.
arXiv Detail & Related papers (2024-08-24T08:04:19Z)
Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation [54.60804602905519]
Conventional methods typically learn an entangled representation, aiming to model layered scene geometry, motion forecasting and novel view synthesis together. Our approach instead disentangles scene geometry from scene motion by lifting the 2D scene to 3D point clouds. To model future 3D scene motion, we propose a disentangled two-stage approach that first forecasts ego-motion and then the residual motion of dynamic objects.
arXiv Detail & Related papers (2024-07-31T08:54:50Z)
Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses [9.529416246409355]
We present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input. As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting (3D-GS) representation.
arXiv Detail & Related papers (2024-04-22T17:59:50Z)
Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model [57.855362366674264]
We propose Dancing Avatar, designed to fabricate human motion videos driven by poses and textual cues. Our approach employs a pretrained T2I diffusion model to generate each video frame in an autoregressive fashion.
arXiv Detail & Related papers (2023-08-15T13:00:42Z)
HDHumans: A Hybrid Approach for High-fidelity Digital Humans [107.19426606778808]
HDHumans is the first method for HD human character synthesis that jointly produces an accurate and temporally coherent 3D deforming surface. Our method is carefully designed to achieve a synergy between classical surface deformation and neural radiance fields (NeRF).
arXiv Detail & Related papers (2022-10-21T14:42:11Z)
SNeRF: Stylized Neural Implicit Representations for 3D Scenes [9.151746397358522]
This paper investigates 3D scene stylization that provides a strong inductive bias for consistent novel view synthesis. We adopt the emerging neural radiance fields (NeRF) as our choice of 3D scene representation. We introduce a new training method to address this problem by alternating the NeRF and stylization optimization steps.
arXiv Detail & Related papers (2022-07-05T23:45:02Z)
Animatable Neural Radiance Fields from Monocular RGB Video [72.6101766407013]
We present animatable neural radiance fields for detailed human avatar creation from monocular videos. Our approach extends neural radiance fields to dynamic scenes with human movements by introducing explicit pose-guided deformation. In experiments we show that the proposed approach achieves 1) implicit human geometry and appearance reconstruction with high-quality details, 2) photo-realistic rendering of the human from arbitrary views, and 3) animation of the human with arbitrary poses.
arXiv Detail & Related papers (2021-06-25T13:32:23Z)
Stylizing 3D Scene via Implicit Representation and HyperNetwork [34.22448260525455]
A straightforward solution to 3D scene stylization is to combine existing novel view synthesis and image/video style transfer approaches. Inspired by the high-quality results of the neural radiance fields (NeRF) method, we propose a joint framework to directly render novel views with the desired style. Our framework consists of two components: an implicit representation of the 3D scene with the neural radiance field model, and a hypernetwork to transfer the style information into the scene representation.
arXiv Detail & Related papers (2021-05-27T09:11:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.