Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering
- URL: http://arxiv.org/abs/2402.00827v2
- Date: Thu, 14 Mar 2024 05:30:10 GMT
- Title: Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering
- Authors: Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu
- Abstract summary: We propose the Efficient Monocular Video Style Avatar (Emo-Avatar) through deferred neural rendering.
Emo-Avatar reduces style customization time from hours to merely 5 minutes compared with existing methods.
- Score: 64.85782838199427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artistic video portrait generation is a significant and sought-after task in the fields of computer graphics and vision. While various methods have been developed that integrate NeRFs or StyleGANs with instructional editing models for creating and editing drivable portraits, these approaches face several challenges: they often rely heavily on large datasets, require extensive customization, and frequently yield reduced image quality. To address these problems, we propose the Efficient Monocular Video Style Avatar (Emo-Avatar), which uses deferred neural rendering to enhance StyleGAN's capacity for producing dynamic, drivable portrait videos. We propose a two-stage deferred neural rendering pipeline. In the first stage, we use few-shot PTI initialization to initialize the StyleGAN generator on several extreme poses sampled from the video, capturing a consistent representation of the aligned face of the target portrait. In the second stage, we sample high-frequency textures through a Laplacian pyramid from UV maps deformed by an expression-driven dynamic flow; this motion-aware texture prior supplies torso features and enhances StyleGAN's ability to render the complete head and upper body in portrait videos. Emo-Avatar reduces style customization time from hours to merely 5 minutes compared with existing methods. In addition, Emo-Avatar requires only a single reference image for editing and employs region-aware contrastive learning with semantic-invariant CLIP guidance, ensuring consistent high-resolution output and identity preservation. Through both quantitative and qualitative assessments, Emo-Avatar demonstrates superior performance over existing methods in terms of training efficiency, rendering quality, and editability in self- and cross-reenactment.
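The abstract describes the second-stage texture pipeline only at a high level. As a rough illustration, below is a minimal PyTorch sketch of Laplacian-pyramid texture sampling from a UV map warped by an expression-driven flow; the function names, the `uv_flow` grid convention, the pyramid depth, and the per-band warping strategy are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def laplacian_pyramid(tex, levels=3):
    """Decompose a texture (B, C, H, W) into a Laplacian pyramid.

    Each level stores the high-frequency detail lost by downsampling;
    the final entry is the low-frequency residual.
    """
    pyramid = []
    current = tex
    for _ in range(levels):
        down = F.avg_pool2d(current, kernel_size=2)
        up = F.interpolate(down, size=current.shape[-2:], mode="bilinear",
                           align_corners=False)
        pyramid.append(current - up)   # high-frequency band
        current = down
    pyramid.append(current)            # low-frequency residual
    return pyramid

def sample_deformed_texture(tex, uv_flow, levels=3):
    """Warp every pyramid band by an expression-driven UV flow, then recombine.

    tex:     (B, C, H, W) neural texture
    uv_flow: (B, H, W, 2) sampling grid in [-1, 1], deformed by expression
    """
    bands = laplacian_pyramid(tex, levels)
    out = None
    for band in reversed(bands):
        # Resample the flow to this band's resolution before warping.
        flow = F.interpolate(uv_flow.permute(0, 3, 1, 2),
                             size=band.shape[-2:], mode="bilinear",
                             align_corners=False).permute(0, 2, 3, 1)
        warped = F.grid_sample(band, flow, mode="bilinear",
                               align_corners=False)
        if out is None:
            out = warped               # start from the low-frequency residual
        else:
            out = F.interpolate(out, size=warped.shape[-2:],
                                mode="bilinear", align_corners=False) + warped
    return out
```

Warping each frequency band separately lets the low-frequency base follow coarse motion while the high-frequency bands carry the fine texture detail the paper emphasizes.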
Related papers
- 3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations [41.303036354495816]
We introduce an expressive and compact representation that encodes texture-related attributes of the 3D Gaussians in a tensorial format.
We store the appearance of the neutral expression in static tri-planes and represent dynamic texture details for different expressions using lightweight 1D feature lines.
Experiments show this design enables accurate capture of dynamic facial details while maintaining real-time rendering and significantly reducing storage costs.
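As a rough illustration of the representation summarized above, the sketch below queries static tri-planes for neutral appearance and a lightweight 1D feature line for expression-dependent detail. The plane/line layouts, the axis-to-plane mapping, and the use of a single normalized expression coefficient as the line coordinate are assumptions.

```python
import torch
import torch.nn.functional as F

def query_triplane(planes, xyz):
    """Query static appearance features for points xyz in [-1, 1]^3.

    planes: dict of three (1, C, R, R) feature planes ("xy", "xz", "yz").
    xyz:    (N, 3) Gaussian centers.
    Returns (N, 3*C) concatenated bilinear samples.
    """
    feats = []
    for axes, plane in zip(([0, 1], [0, 2], [1, 2]),
                           (planes["xy"], planes["xz"], planes["yz"])):
        grid = xyz[:, axes].view(1, -1, 1, 2)               # (1, N, 1, 2)
        f = F.grid_sample(plane, grid, align_corners=True)  # (1, C, N, 1)
        feats.append(f[0, :, :, 0].t())                     # (N, C)
    return torch.cat(feats, dim=-1)

def query_feature_line(line, expr):
    """Sample dynamic-texture features from a 1D line at an expression coord.

    line: (1, C, 1, L) feature line; expr: (N,) scalar in [-1, 1]
    (e.g. one normalized expression coefficient per query).
    """
    grid = torch.stack([expr, torch.zeros_like(expr)], dim=-1).view(1, -1, 1, 2)
    f = F.grid_sample(line, grid, align_corners=True)       # (1, C, N, 1)
    return f[0, :, :, 0].t()                                # (N, C)
```

The storage argument follows directly: tri-planes are paid once for the static appearance, while each expression-dependent component costs only an O(L) line rather than another 2D or 3D grid.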
arXiv Detail & Related papers (2025-04-21T08:50:12Z) - GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting [15.509324745484141]
We propose GazeGaussian, a high-fidelity gaze redirection method that uses a two-stream 3DGS model to represent the face and eye regions separately.
GazeGaussian outperforms existing methods in rendering speed, gaze redirection accuracy, and facial synthesis across multiple datasets.
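The two-stream design is only named in the summary; a hypothetical sketch of merging separately parameterized face and eye Gaussian sets, with the eye stream rigidly redirected by a target gaze rotation, is shown below. The dict layout and the rigid-rotation redirection are assumptions.

```python
import torch

def compose_two_stream(face_gs, eye_gs, gaze_rot):
    """Merge separately parameterized face and eye Gaussian streams.

    face_gs / eye_gs: dicts of per-Gaussian tensors sharing the same keys,
    e.g. 'xyz' (N, 3) centers plus appearance attributes.
    gaze_rot: (3, 3) rotation redirecting the eye region to the target gaze.
    A full implementation would also rotate per-Gaussian orientations,
    not just the centers.
    """
    eye_xyz = eye_gs["xyz"] @ gaze_rot.t()
    merged = {k: torch.cat([face_gs[k], eye_gs[k]]) for k in face_gs}
    merged["xyz"] = torch.cat([face_gs["xyz"], eye_xyz])
    # The merged set feeds a standard 3DGS rasterizer (not shown).
    return merged
```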
arXiv Detail & Related papers (2024-11-20T02:15:23Z) - G-Style: Stylized Gaussian Splatting [5.363168481735954]
We introduce G-Style, a novel algorithm designed to transfer the style of an image onto a 3D scene represented using Gaussian Splatting.
G-Style generates high-quality stylizations within just a few minutes, outperforming existing methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-08-28T10:43:42Z) - GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting [27.699313086744237]
GaussianTalker is a novel method for audio-driven talking head synthesis based on 3D Gaussian Splatting.
The Speaker-specific Motion Translator achieves accurate, speaker-specific lip movements through universalized audio feature extraction.
The Dynamic Gaussian Renderer introduces Speaker-specific BlendShapes to enhance facial detail representation via a latent pose.
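The summary names speaker-specific blendshapes driven by a latent pose but gives no detail; one plausible reading, sketched below, is a set of learned per-Gaussian displacement bases blended by weights that a small MLP predicts from the latent pose code. The basis count, MLP shape, and pose dimension are assumptions.

```python
import torch
import torch.nn as nn

class GaussianBlendShapes(nn.Module):
    """Minimal sketch of blendshape-style Gaussian deformation."""

    def __init__(self, num_gaussians, num_bases=32, pose_dim=64):
        super().__init__()
        # Learned displacement bases: (num_bases, num_gaussians, 3).
        self.bases = nn.Parameter(torch.zeros(num_bases, num_gaussians, 3))
        # Small MLP mapping a latent pose code to blend weights.
        self.weight_mlp = nn.Sequential(
            nn.Linear(pose_dim, 128), nn.ReLU(),
            nn.Linear(128, num_bases))

    def forward(self, xyz, pose_code):
        """xyz: (N, 3) rest positions; pose_code: (pose_dim,) latent pose."""
        w = self.weight_mlp(pose_code)                     # (num_bases,)
        offset = torch.einsum("b,bng->ng", w, self.bases)  # weighted sum
        return xyz + offset
```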
arXiv Detail & Related papers (2024-04-22T09:51:43Z) - Gaussian Opacity Fields: Efficient Adaptive Surface Reconstruction in Unbounded Scenes [50.92217884840301]
Gaussian Opacity Fields (GOF) is a novel approach for efficient, high-quality, and adaptive surface reconstruction in unbounded scenes.
GOF is derived from ray-tracing-based volume rendering of 3D Gaussians.
GOF surpasses existing 3DGS-based methods in surface reconstruction and novel view synthesis.
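To make "ray-tracing-based volume rendering of 3D Gaussians" concrete, the sketch below evaluates a single Gaussian's opacity contribution along a ray in closed form, at the point of maximal density; whether GOF uses exactly this evaluation is not stated in the summary, so treat it as a hedged illustration.

```python
import torch

def gaussian_ray_opacity(mu, cov_inv, opacity, ray_o, ray_d):
    """Peak alpha contribution of one 3D Gaussian along a ray o + t d.

    mu: (3,) Gaussian mean; cov_inv: (3, 3) inverse covariance;
    opacity: scalar base opacity; ray_o, ray_d: (3,) ray origin/direction.
    """
    # Minimizing (o + t d - mu)^T S^{-1} (o + t d - mu) over t gives
    # t* = d^T S^{-1} (mu - o) / (d^T S^{-1} d).
    diff = mu - ray_o
    t_star = (ray_d @ cov_inv @ diff) / (ray_d @ cov_inv @ ray_d)
    x = ray_o + t_star * ray_d
    m = (x - mu) @ cov_inv @ (x - mu)   # Mahalanobis distance at the peak
    return opacity * torch.exp(-0.5 * m)
```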
arXiv Detail & Related papers (2024-04-16T17:57:19Z) - Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting [55.71424195454963]
Spec-Gaussian is an approach that utilizes an anisotropic spherical Gaussian appearance field instead of spherical harmonics.
Our experimental results demonstrate that our method surpasses existing approaches in terms of rendering quality.
This improvement extends the applicability of 3D GS to handle intricate scenarios with specular and anisotropic surfaces.
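For reference, the standard anisotropic spherical Gaussian (ASG) lobe that such an appearance field would evaluate per view direction is sketched below; Spec-Gaussian's learned field that predicts the lobe parameters per Gaussian is not shown, and the parameter names are illustrative.

```python
import torch

def anisotropic_spherical_gaussian(v, x, y, z, lam, mu, a):
    """Evaluate an ASG lobe (Xu et al. 2013) for unit view directions.

    v:        (N, 3) unit view directions
    x, y, z:  (3,) orthonormal tangent, bi-tangent, and lobe axes
    lam, mu:  bandwidths along x and y (larger = sharper lobe)
    a:        lobe amplitude
    """
    smooth = torch.clamp(v @ z, min=0.0)  # hemisphere clamp around the lobe axis
    return a * smooth * torch.exp(-lam * (v @ x) ** 2 - mu * (v @ y) ** 2)
```

Unlike spherical harmonics, the two independent bandwidths let a single lobe stay sharp along one tangent direction and broad along the other, which is what captures anisotropic specular highlights.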
arXiv Detail & Related papers (2024-02-24T17:22:15Z) - Mesh-based Gaussian Splatting for Real-time Large-scale Deformation [58.18290393082119]
It is challenging for users to directly deform or manipulate implicit representations with large deformations in real time.
We develop a novel GS-based method that enables interactive deformation.
Our approach achieves high-quality reconstruction and effective deformation, while maintaining the promising rendering results at a high frame rate.
arXiv Detail & Related papers (2024-02-07T12:36:54Z) - GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning [60.33970027554299]
Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations.
In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions.
Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts.
arXiv Detail & Related papers (2023-12-18T18:59:12Z) - GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation [35.39887092268696]
This paper presents a framework to model the dynamic human head with anisotropic 3D Gaussians.
In experiments, our method can produce high-fidelity renderings, outperforming state-of-the-art approaches in reconstruction, cross-identity reenactment, and novel view synthesis tasks.
arXiv Detail & Related papers (2023-12-04T05:24:45Z) - Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering [71.44349029439944]
The recent 3D Gaussian Splatting method has achieved state-of-the-art rendering quality and speed.
We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians.
We show that our method effectively reduces redundant Gaussians while delivering high-quality rendering.
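A minimal sketch of anchor-based Gaussian spawning in the spirit described above: each anchor carries a feature vector and k learnable offsets, and per-Gaussian attributes are decoded from the anchor feature and the viewing direction. Layer sizes, the decoded attribute set (only opacity here), and the initializations are assumptions.

```python
import torch
import torch.nn as nn

class AnchorGaussians(nn.Module):
    """Sketch of anchor points distributing local 3D Gaussians."""

    def __init__(self, num_anchors, k=10, feat_dim=32):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(num_anchors, 3))
        self.features = nn.Parameter(torch.randn(num_anchors, feat_dim))
        self.offsets = nn.Parameter(torch.zeros(num_anchors, k, 3))
        self.scales = nn.Parameter(torch.ones(num_anchors, 1, 3))
        self.opacity_mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 64), nn.ReLU(),
            nn.Linear(64, k), nn.Tanh())

    def forward(self, cam_pos):
        """cam_pos: (3,). Returns positions (A*k, 3) and opacities (A*k,)."""
        # Spawn k neural Gaussians around every anchor point.
        pos = self.anchors[:, None, :] + self.offsets * self.scales
        # View-adaptive opacity from anchor feature + viewing direction.
        view_dir = cam_pos - self.anchors
        view_dir = view_dir / view_dir.norm(dim=-1, keepdim=True)
        inp = torch.cat([self.features, view_dir], dim=-1)
        alpha = self.opacity_mlp(inp)                  # (A, k)
        return pos.reshape(-1, 3), alpha.reshape(-1)
```

Because many Gaussians share one anchor's feature and scale, redundant free-floating Gaussians can be pruned at the anchor level, which is consistent with the redundancy reduction claimed above.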
arXiv Detail & Related papers (2023-11-30T17:58:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.