FIELDS: Face reconstruction with accurate Inference of Expression using Learning with Direct Supervision
- URL: http://arxiv.org/abs/2511.21245v2
- Date: Fri, 28 Nov 2025 10:30:52 GMT
- Title: FIELDS: Face reconstruction with accurate Inference of Expression using Learning with Direct Supervision
- Authors: Chen Ling, Henglin Shi, Hedvig Kjellström
- Abstract summary: FIELDS produces emotion-rich face models with highly realistic expressions, significantly improving in-the-wild facial expression recognition performance without sacrificing naturalness. Our encoder is guided by authentic expression parameters from spontaneous 4D facial scans, while an intensity-aware emotion loss encourages the 3D expression parameters to capture genuine emotion content without exaggeration.
- Score: 5.903595788782866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Facial expressions convey the bulk of emotional information in human communication, yet existing 3D face reconstruction methods often miss subtle affective details due to reliance on 2D supervision and lack of 3D ground truth. We propose FIELDS (Face reconstruction with accurate Inference of Expression using Learning with Direct Supervision) to address these limitations by extending self-supervised 2D image consistency cues with direct 3D expression parameter supervision and an auxiliary emotion recognition branch. Our encoder is guided by authentic expression parameters from spontaneous 4D facial scans, while an intensity-aware emotion loss encourages the 3D expression parameters to capture genuine emotion content without exaggeration. This dual-supervision strategy bridges the 2D/3D domain gap and mitigates expression-intensity bias, yielding high-fidelity 3D reconstructions that preserve subtle emotional cues. From a single image, FIELDS produces emotion-rich face models with highly realistic expressions, significantly improving in-the-wild facial expression recognition performance without sacrificing naturalness.
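The abstract describes the training signal only at a high level. The sketch below is a rough illustration of how a dual-supervision objective of this kind could be assembled. It sums three terms: a 2D photometric consistency term, a direct squared-error penalty on the 3D expression parameters against scan-derived targets, and a cross-entropy term from an auxiliary emotion branch. The function names, the loss weights, the use of mean squared error, and especially the intensity weighting of the emotion term are hypothetical choices made for illustration. They are not the formulation used in FIELDS.

```python
import math
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def dual_supervision_loss(
    rendered: Sequence[float],        # pixels of the rendered reconstruction (2D term)
    image: Sequence[float],           # corresponding input-image pixels
    pred_expr: Sequence[float],       # predicted 3D expression parameters
    scan_expr: Sequence[float],       # expression parameters fitted to a 4D scan
    emotion_logits: Sequence[float],  # output of the auxiliary emotion branch
    emotion_label: int,               # ground-truth emotion class index
    intensity: float,                 # HYPOTHETICAL annotated intensity in [0, 1]
    w_photo: float = 1.0,             # HYPOTHETICAL loss weights
    w_expr: float = 1.0,
    w_emo: float = 0.1,
) -> float:
    """Illustrative weighted sum of 2D, 3D-parameter and emotion terms (not the paper's formula)."""
    photo = mse(rendered, image)      # self-supervised 2D image consistency
    expr = mse(pred_expr, scan_expr)  # direct 3D expression parameter supervision
    # ASSUMED intensity-aware weighting: weakly expressed samples push less
    # strongly towards a confident emotion prediction.
    emo = intensity * cross_entropy(emotion_logits, emotion_label)
    return w_photo * photo + w_expr * expr + w_emo * emo
```

For example, `dual_supervision_loss([1.0, 2.0], [1.0, 4.0], [0.0], [0.0], [0.0, 0.0], 0, 0.5)` evaluates to 2 + 0 + 0.1 · 0.5 · ln 2 ≈ 2.0347. Keeping the three terms as separate weighted summands makes it easy to ablate the 3D or emotion supervision by setting its weight to zero.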
Related papers
- Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation [20.91704034858042]
We model facial animation driven by both speech and emotion as a linear additive problem.
We learn a set of blendshapes driven by speech and emotion that can be mapped to the expression and jaw pose parameters of the FLAME model.
Our approach achieves superior emotional expressivity compared to existing methods, without compromising lip-sync quality.
Date: 2025-10-29T07:29:21Z
- VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis [70.76837748695841]
We propose VisualSpeaker, a novel method that bridges the gap using photorealistic differentiable rendering, supervised by visual speech recognition, for improved 3D facial animation.
Our contribution is a perceptual lip-reading loss, derived by passing 3D Gaussian Splatting avatar renders through a pre-trained Visual Automatic Speech Recognition model during training.
Evaluation on the MEAD dataset demonstrates that VisualSpeaker improves both the standard Lip Vertex Error metric by 56.1% and the perceptual quality of the generated animations, while retaining the controllability of mesh-driven animation.
Date: 2025-07-08T15:04:17Z
- EmoDiffusion: Enhancing Emotional 3D Facial Animation with Latent Diffusion Models [66.67979602235015]
EmoDiffusion is a novel approach that disentangles different emotions in speech to generate rich 3D emotional facial expressions.
We capture facial expressions under the guidance of animation experts using LiveLinkFace on an iPhone.
Date: 2025-03-14T02:54:22Z
- TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction [29.41924691414499]
3D facial reconstruction from a single in-the-wild image is a crucial task in human-centered computer vision.
Current approaches struggle with exaggerated, irregular mouth shapes and expressions, and with asymmetrical facial movements.
We present TEASER, which addresses these challenges and enhances 3D facial geometry.
Date: 2025-02-16T04:00:06Z
- 3DFlowRenderer: One-shot Face Re-enactment via Dense 3D Facial Flow Estimation [2.048226951354646]
We propose a novel warping technology which integrates the advantages of both 2D and 3D methods to achieve robust face re-enactment.
We generate dense 3D facial flow fields in feature space to warp an input image based on target expressions without depth information.
This enables explicit 3D geometric control for re-enacting misaligned source and target faces.
Date: 2024-04-23T01:51:58Z
- Emotional Speech-Driven Animation with Content-Emotion Disentanglement [51.34635009347183]
We propose EMOTE, which generates 3D talking-head avatars that maintain lip-sync from speech while enabling explicit control over the expression of emotion.
EMOTE produces speech-driven facial animations with better lip-sync than state-of-the-art methods trained on the same data.
Date: 2023-06-15T09:31:31Z
- EMOCA: Emotion Driven Monocular Face Capture and Animation [59.15004328155593]
We introduce a novel deep perceptual emotion consistency loss during training, which helps ensure that the reconstructed 3D expression matches the expression depicted in the input image.
On the task of in-the-wild emotion recognition, our purely geometric approach is on par with the best image-based methods, highlighting the value of 3D geometry in analyzing human behavior.
Date: 2022-04-24T15:58:35Z
- MOST-GAN: 3D Morphable StyleGAN for Disentangled Face Image Manipulation [69.35523133292389]
We propose a framework that a priori models physical attributes of the face explicitly, thus providing disentanglement by design.
Our method, MOST-GAN, integrates the expressive power and photorealism of style-based GANs with the physical disentanglement and flexibility of nonlinear 3D morphable models.
It achieves photorealistic manipulation of portrait images with fully disentangled 3D control over their physical attributes, enabling extreme manipulation of lighting, facial expression, and pose variations up to full profile view.
Date: 2021-11-01T15:53:36Z
- Real-time Facial Expression Recognition "In The Wild" by Disentangling 3D Expression from Identity [6.974241731162878]
This paper proposes a novel method for human emotion recognition from a single RGB image.
We construct a large-scale dataset of facial videos, rich in facial dynamics, identities, expressions, appearance and 3D pose variations.
Our proposed framework runs at 50 frames per second and is capable of robustly estimating parameters of 3D expression variation.
Date: 2020-05-12T01:32:55Z
- Differential 3D Facial Recognition: Adding 3D to Your State-of-the-Art 2D Method [90.26041504667451]
We show that it is possible to adopt active illumination to enhance state-of-the-art 2D face recognition approaches with 3D features.
The proposed ideas can significantly boost face recognition performance and dramatically improve the robustness to spoofing attacks.
Date: 2020-04-03T20:17:14Z
This list is automatically generated from the titles and abstracts of the papers in this site.