FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model
- URL: http://arxiv.org/abs/2409.13180v2
- Date: Wed, 9 Oct 2024 02:29:57 GMT
- Title: FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model
- Authors: Feng Qiu, Wei Zhang, Chen Liu, Rudong An, Lincheng Li, Yu Ding, Changjie Fan, Zhipeng Hu, Xin Yu
- Abstract summary: Video-driven 3D facial animation transfer aims to drive avatars to reproduce the expressions of actors.
We propose FreeAvatar, a robust facial animation transfer method that relies solely on our learned expression representation.
- Score: 45.0201701977516
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video-driven 3D facial animation transfer aims to drive avatars to reproduce the expressions of actors. Existing methods have achieved remarkable results by constraining both geometric and perceptual consistency. However, geometric constraints (like those designed on facial landmarks) are insufficient to capture subtle emotions, while expression features trained on classification tasks lack fine granularity for complex emotions. To address this, we propose **FreeAvatar**, a robust facial animation transfer method that relies solely on our learned expression representation. Specifically, FreeAvatar consists of two main components: the expression foundation model and the facial animation transfer model. In the first component, we initially construct a facial feature space through a face reconstruction task and then optimize the expression feature space by exploring the similarities among different expressions. Benefiting from training on large amounts of unlabeled facial images and a re-collected expression comparison dataset, our model adapts freely and effectively to any in-the-wild input facial image. In the facial animation transfer component, we propose a novel Expression-driven Multi-avatar Animator, which first maps expressive semantics to the facial control parameters of 3D avatars and then imposes perceptual constraints between the input and output images to maintain expression consistency. To make the entire process differentiable, we employ a trained neural renderer to translate rig parameters into corresponding images. Furthermore, unlike previous methods that require separate decoders for each avatar, we propose a dynamic identity injection module that allows for the joint training of multiple avatars within a single network.
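A minimal sketch may help fix ideas about this transfer step. Everything below is an assumption about the general scheme described in the abstract (module names, dimensions, and the embedding-table form of the dynamic identity injection are hypothetical), not the authors' implementation:

```python
# Hypothetical sketch of the transfer loop; module names, dimensions, and the
# embedding-table form of identity injection are assumptions, not released code.
import torch
import torch.nn as nn

class Animator(nn.Module):
    """Maps an expression feature to rig (facial control) parameters,
    conditioned on a per-avatar identity code (dynamic identity injection)."""
    def __init__(self, expr_dim=512, id_dim=64, rig_dim=128, num_avatars=4):
        super().__init__()
        self.id_table = nn.Embedding(num_avatars, id_dim)  # one code per avatar
        self.mlp = nn.Sequential(
            nn.Linear(expr_dim + id_dim, 256), nn.ReLU(),
            nn.Linear(256, rig_dim),
        )

    def forward(self, expr_feat, avatar_id):
        id_code = self.id_table(avatar_id)                 # inject avatar identity
        return self.mlp(torch.cat([expr_feat, id_code], dim=-1))

def transfer_loss(expr_encoder, animator, renderer, actor_img, avatar_id):
    """Expression consistency between the actor image and the rendered avatar,
    measured in the frozen foundation model's feature space."""
    with torch.no_grad():
        target_feat = expr_encoder(actor_img)   # actor's expression feature
    rig = animator(target_feat, avatar_id)      # expression -> rig parameters
    rendered = renderer(rig, avatar_id)         # neural renderer keeps this differentiable
    rendered_feat = expr_encoder(rendered)      # expression of the rendered avatar
    return (rendered_feat - target_feat).pow(2).mean()
```

The point the sketch makes is the differentiability claim: because the renderer is itself a neural network, the feature-space expression loss on the rendered image can backpropagate into the animator without ground-truth rig labels.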
Related papers
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time.
At the core of our method is a hierarchical representation of head models that captures the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
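The "head pose as a learnable parameter" idea can be illustrated with a toy sketch; the avatar stand-in and dimensions below are placeholders, not the paper's design:

```python
# Toy illustration of optimizing per-frame head pose jointly with the model.
import torch
import torch.nn as nn

num_frames, pose_dim = 1000, 6                 # e.g. axis-angle rotation + translation
head_pose = nn.Parameter(torch.zeros(num_frames, pose_dim))
avatar_model = nn.Linear(pose_dim, 3)          # stand-in for the coarse-to-fine avatar

# A single optimizer updates the avatar model and the poses end to end.
optimizer = torch.optim.Adam(
    list(avatar_model.parameters()) + [head_pose], lr=1e-3
)
```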
arXiv Detail & Related papers (2024-09-18T13:05:43Z)
- Attention-Based VR Facial Animation with Visual Mouth Camera Guidance for Immersive Telepresence Avatars [19.70403947793871]
We present a hybrid method that uses both keypoints and direct visual guidance from a mouth camera.
Our method generalizes to unseen operators and requires only a quick enrolment step with capture of two short videos.
We highlight how the facial animation contributed to our victory at the ANA Avatar XPRIZE Finals.
arXiv Detail & Related papers (2023-12-15T12:45:11Z)
- GAN-Avatar: Controllable Personalized GAN-based Human Head Avatar [48.21353924040671]
We propose to learn person-specific animatable avatars from images without assuming access to precise facial expression tracking.
We learn a mapping from 3DMM facial expression parameters to the latent space of the generative model.
With this scheme, we decouple 3D appearance reconstruction and animation control to achieve high fidelity in image synthesis.
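The mapping described above can be pictured as a small network from 3DMM expression coefficients to the generator's latent code; this is a hedged sketch of the general scheme, with hypothetical layer sizes, not the paper's implementation:

```python
# Hypothetical sketch: a small MLP maps 3DMM expression parameters into the
# latent space of a frozen, pretrained image generator (sizes are assumptions).
import torch.nn as nn

class ExprToLatent(nn.Module):
    def __init__(self, expr_dim=100, latent_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(expr_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, latent_dim),
        )

    def forward(self, expr_params):
        # The returned code would be fed to the frozen generator for synthesis.
        return self.net(expr_params)
```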
arXiv Detail & Related papers (2023-11-22T19:13:00Z)
- Facial Expression Re-targeting from a Single Character [0.0]
The standard method to represent facial expressions for 3D characters is by blendshapes.
We developed a unique deep-learning architecture that groups landmarks for each facial organ and connects them to relevant blendshape weights.
Our approach achieved a 68% higher MOS and a 44.2% lower MSE when tested on videos with various users and expressions.
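The blendshape representation mentioned above is the standard linear model; the following is a generic textbook sketch rather than this paper's code:

```python
# Textbook linear blendshape model: the animated mesh is the neutral shape
# plus a weighted sum of per-expression vertex offsets.
import numpy as np

def blend(neutral, blendshapes, weights):
    """neutral: (V, 3) vertices; blendshapes: (K, V, 3) expression targets;
    weights: (K,) activations, typically in [0, 1], predicted per frame."""
    deltas = blendshapes - neutral[None]              # per-shape offsets
    return neutral + np.tensordot(weights, deltas, axes=1)

# Example: half-activating the first of two blendshapes on a 4-vertex mesh.
neutral = np.zeros((4, 3))
targets = np.random.randn(2, 4, 3)
mesh = blend(neutral, targets, np.array([0.5, 0.0]))
```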
arXiv Detail & Related papers (2023-06-21T11:35:22Z)
- Generalizable One-shot Neural Head Avatar [90.50492165284724]
We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image.
We propose a framework that not only generalizes to unseen identities based on a single-view image, but also captures characteristic details within and beyond the face area.
arXiv Detail & Related papers (2023-06-14T22:33:09Z)
- I M Avatar: Implicit Morphable Head Avatars from Videos [68.13409777995392]
We propose IMavatar, a novel method for learning implicit head avatars from monocular videos.
Inspired by the fine-grained control mechanisms afforded by conventional 3DMMs, we represent the expression- and pose-related deformations via learned blendshapes and skinning fields.
We show quantitatively and qualitatively that our method improves geometry and covers a more complete expression space compared to state-of-the-art methods.
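The skinning-field idea above builds on linear blend skinning; this standalone sketch only shows the standard underlying formula (IMavatar learns the weights as continuous fields, which is not reproduced here):

```python
# Generic linear blend skinning (LBS): each point is deformed by a weighted
# combination of bone transforms.
import numpy as np

def lbs(points, bone_transforms, skin_weights):
    """points: (N, 3); bone_transforms: (B, 4, 4) homogeneous matrices;
    skin_weights: (N, B), rows summing to 1."""
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)  # (N, 4)
    per_bone = np.einsum("bij,nj->nbi", bone_transforms, homo)          # (N, B, 4)
    blended = np.einsum("nb,nbi->ni", skin_weights, per_bone)           # (N, 4)
    return blended[:, :3]
```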
arXiv Detail & Related papers (2021-12-14T15:30:32Z)
- Learning an Animatable Detailed 3D Face Model from In-The-Wild Images [50.09971525995828]
We present the first approach to jointly learn a model with animatable detail and a detailed 3D face regressor from in-the-wild images.
Our DECA model is trained to robustly produce a UV displacement map from a low-dimensional latent representation.
We introduce a novel detail-consistency loss to disentangle person-specific details and expression-dependent wrinkles.
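A rough sketch of the detail-consistency idea, under stated assumptions: `encode` and `render` are placeholders for the model's components, and the exact loss form is hypothetical. Swapping person-specific detail codes between two images of the same subject should leave each reconstruction unchanged, so the detail code cannot carry expression:

```python
# Rough sketch of a detail-consistency loss in the spirit described above;
# `encode` and `render` are placeholders, not the paper's released API.
import torch

def detail_consistency_loss(img_a: torch.Tensor, img_b: torch.Tensor,
                            encode, render) -> torch.Tensor:
    """img_a, img_b: two images of the same person, different expressions."""
    detail_a, expr_a = encode(img_a)
    detail_b, expr_b = encode(img_b)
    swapped_a = render(detail_b, expr_a)   # other image's detail, own expression
    swapped_b = render(detail_a, expr_b)
    return (swapped_a - img_a).abs().mean() + (swapped_b - img_b).abs().mean()
```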
arXiv Detail & Related papers (2020-12-07T19:30:45Z)
- Facial Expression Retargeting from Human to Avatar Made Easy [34.86394328702422]
Facial expression retargeting from humans to virtual characters is a useful technique in computer graphics and animation.
Traditional methods use markers or blendshapes to construct a mapping between the human and avatar faces.
We propose a brand-new solution to this cross-domain expression transfer problem via nonlinear expression embedding and expression domain translation.
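One way to picture the embedding-plus-translation scheme is an encoder, a latent-space translator, and a decoder; the layer sizes below (68 landmarks x 3 = 204 inputs, 51 avatar controls) are illustrative assumptions, not the paper's numbers:

```python
# Hypothetical encoder -> translator -> decoder layout for cross-domain
# expression transfer; all sizes are illustrative assumptions.
import torch.nn as nn

human_encoder  = nn.Sequential(nn.Linear(204, 64), nn.Tanh(), nn.Linear(64, 16))
translator     = nn.Sequential(nn.Linear(16, 16), nn.Tanh(), nn.Linear(16, 16))
avatar_decoder = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 51))

def retarget(human_landmarks):
    """Flattened human landmarks -> avatar control weights."""
    z_human = human_encoder(human_landmarks)   # nonlinear expression embedding
    z_avatar = translator(z_human)             # expression domain translation
    return avatar_decoder(z_avatar)
```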
arXiv Detail & Related papers (2020-08-12T04:55:54Z)