Pose-Controllable 3D Facial Animation Synthesis using Hierarchical
Audio-Vertex Attention
- URL: http://arxiv.org/abs/2302.12532v1
- Date: Fri, 24 Feb 2023 09:36:31 GMT
- Title: Pose-Controllable 3D Facial Animation Synthesis using Hierarchical
Audio-Vertex Attention
- Authors: Bin Liu, Xiaolin Wei, Bo Li, Junjie Cao, Yu-Kun Lai
- Abstract summary: A novel pose-controllable 3D facial animation synthesis method is proposed that utilizes hierarchical audio-vertex attention.
The proposed method produces more realistic facial expressions and head posture movements.
- Score: 52.63080543011595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing audio-driven 3D facial animation methods suffer from a lack of
detailed facial expression and head pose, resulting in an unsatisfactory
human-robot interaction experience. In this paper, a novel pose-controllable
3D facial animation synthesis method is proposed that utilizes hierarchical
audio-vertex attention. To synthesize realistic and detailed expressions, a
hierarchical decomposition strategy encodes the audio signal into both a
global latent feature and a local vertex-wise control feature. The local and
global audio features, combined with vertex spatial features, are then used to
predict the final consistent facial animation via a graph convolutional neural
network that fuses the intrinsic spatial topology of the face model with the
corresponding semantic features of the audio. To accomplish pose-controllable
animation, we introduce a novel pose attribute augmentation method that
leverages a 2D talking-face technique. Experimental results indicate that the
proposed method produces more realistic facial expressions and head posture
movements. Qualitative and quantitative experiments show that the proposed
method achieves competitive performance against state-of-the-art methods.
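The abstract only sketches the architecture at a high level. Below is a minimal, illustrative PyTorch sketch of the two-branch idea it describes: a global audio latent feature, a local vertex-wise control feature obtained by attending from vertex queries to audio frames, and a graph convolutional fusion with vertex spatial features. All class names, dimensions, the normalized adjacency input, and the single-frame formulation are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class HierarchicalAudioVertexSketch(nn.Module):
    """Illustrative sketch only: global + local audio encoding fused with
    vertex positions via graph convolutions to predict per-vertex offsets."""

    def __init__(self, audio_dim=80, num_vertices=5023, feat_dim=64):
        super().__init__()
        self.num_vertices = num_vertices
        # Global branch: one latent vector summarizing the audio clip.
        self.global_enc = nn.GRU(audio_dim, feat_dim, batch_first=True)
        # Local branch: cross-attention from learnable vertex queries to
        # audio frames, yielding a vertex-wise control feature.
        self.vertex_query = nn.Parameter(torch.randn(num_vertices, feat_dim))
        self.audio_proj = nn.Linear(audio_dim, feat_dim)
        self.local_attn = nn.MultiheadAttention(feat_dim, num_heads=4,
                                                batch_first=True)
        # Graph convolutions over the fixed mesh topology (A_hat is an
        # assumed, precomputed normalized adjacency matrix).
        self.gcn1 = nn.Linear(feat_dim * 2 + 3, feat_dim)
        self.gcn2 = nn.Linear(feat_dim, 3)  # per-vertex displacement

    def forward(self, audio, vertices, a_hat):
        # audio:    (B, T, audio_dim) spectral features for one clip
        # vertices: (B, V, 3)         template vertex positions
        # a_hat:    (V, V)            normalized mesh adjacency matrix
        batch = audio.shape[0]
        _, h = self.global_enc(audio)                              # (1, B, F)
        g = h[-1].unsqueeze(1).expand(-1, self.num_vertices, -1)   # (B, V, F)
        q = self.vertex_query.unsqueeze(0).expand(batch, -1, -1)   # (B, V, F)
        k = self.audio_proj(audio)                                 # (B, T, F)
        local, _ = self.local_attn(q, k, k)                        # (B, V, F)
        # Fuse global audio, local audio, and vertex spatial features.
        x = torch.cat([g, local, vertices], dim=-1)
        x = torch.relu(a_hat @ self.gcn1(x))                       # GCN layer 1
        disp = a_hat @ self.gcn2(x)                                # GCN layer 2
        return vertices + disp  # animated mesh for this audio window
```

This sketch predicts a single frame per audio window; the paper's method additionally handles temporal consistency and pose augmentation, which are omitted here.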
Related papers
- KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding [19.15471840100407]
We present a novel approach for synthesizing 3D facial motions from audio sequences using key motion embeddings.
Our method integrates linguistic and data-driven priors through two modules: the linguistic-based key motion acquisition and the cross-modal motion completion.
The latter extends key motions into a full sequence of 3D talking faces guided by audio features, improving temporal coherence and audio-visual consistency.
arXiv Detail & Related papers (2024-09-02T09:41:24Z) - G3FA: Geometry-guided GAN for Face Animation [14.488117084637631]
We introduce Geometry-guided GAN for Face Animation (G3FA), a novel approach that enables the face animation model to incorporate 3D information using only 2D images.
In our face reenactment model, we leverage 2D motion warping to capture motion dynamics.
arXiv Detail & Related papers (2024-08-23T13:13:24Z) - Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs [67.27840327499625]
We present a multimodal learning-based method to simultaneously synthesize co-speech facial expressions and upper-body gestures for digital characters.
Our approach learns from sparse face landmarks and upper-body joints, estimated directly from video data, to generate plausible emotive character motions.
arXiv Detail & Related papers (2024-06-26T04:53:11Z) - NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior [5.819784482811377]
We propose a novel method, NeRFFaceSpeech, which produces high-quality, 3D-aware talking heads.
Our method can craft a 3D-consistent facial feature space corresponding to a single image.
We also introduce LipaintNet, which fills in the missing information in the inner-mouth area.
arXiv Detail & Related papers (2024-05-09T13:14:06Z) - FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models [85.16273912625022]
We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from audio signal.
To the best of our knowledge, this is the first work to propose a generative approach for realistic and high-quality motion synthesis of human heads.
arXiv Detail & Related papers (2023-12-13T19:01:07Z) - DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with
Diffusion [68.85904927374165]
We propose DF-3DFace, a diffusion-driven speech-to-3D face mesh synthesis.
It captures the complex one-to-many relationships between speech and 3D face based on diffusion.
It also achieves more realistic facial animation than state-of-the-art methods.
arXiv Detail & Related papers (2023-08-23T04:14:55Z) - Parametric Implicit Face Representation for Audio-Driven Facial
Reenactment [52.33618333954383]
We propose a novel audio-driven facial reenactment framework that is both controllable and can generate high-quality talking heads.
Specifically, our parametric implicit representation parameterizes the implicit representation with interpretable parameters of 3D face models.
Our method can generate more realistic results than previous methods with greater fidelity to the identities and talking styles of speakers.
arXiv Detail & Related papers (2023-06-13T07:08:22Z) - MeshTalk: 3D Face Animation from Speech using Cross-Modality
Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion, while also producing plausible animation of the parts of the face that are uncorrelated with the audio signal, such as eye blinks and eyebrow motion.
arXiv Detail & Related papers (2021-04-16T17:05:40Z)