Ig3D: Integrating 3D Face Representations in Facial Expression Inference
- URL: http://arxiv.org/abs/2408.16907v1
- Date: Thu, 29 Aug 2024 21:08:07 GMT
- Title: Ig3D: Integrating 3D Face Representations in Facial Expression Inference
- Authors: Lu Dong, Xiao Wang, Srirangaraj Setlur, Venu Govindaraju, Ifeoma Nwogu,
- Abstract summary: This study aims to investigate the impacts of integrating 3D representations into the facial expression inference task.
We first assess the performance of two 3D face representations (both based on the 3D morphable model, FLAME) for the FEI tasks.
We then explore two fusion architectures, intermediate fusion and late fusion, for integrating the 3D face representations with existing 2D inference frameworks.
Our proposed method outperforms the state-of-the-art AffectNet VA estimation and RAF-DB classification tasks.
- Score: 12.975434103690812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing 3D faces with facial geometry from single images has allowed for major advances in animation, generative models, and virtual reality. However, this ability to represent faces with their 3D features is not as fully explored by the facial expression inference (FEI) community. This study therefore aims to investigate the impacts of integrating such 3D representations into the FEI task, specifically for facial expression classification and face-based valence-arousal (VA) estimation. To accomplish this, we first assess the performance of two 3D face representations (both based on the 3D morphable model, FLAME) for the FEI tasks. We further explore two fusion architectures, intermediate fusion and late fusion, for integrating the 3D face representations with existing 2D inference frameworks. To evaluate our proposed architecture, we extract the corresponding 3D representations and perform extensive tests on the AffectNet and RAF-DB datasets. Our experimental results demonstrate that our proposed method outperforms the state-of-the-art AffectNet VA estimation and RAF-DB classification tasks. Moreover, our method can act as a complement to other existing methods to boost performance in many emotion inference tasks.
Related papers
- ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling [96.87575334960258]
ID-to-3D is a method to generate identity- and text-guided 3D human heads with disentangled expressions.
Results achieve an unprecedented level of identity-consistent and high-quality texture and geometry generation.
arXiv Detail & Related papers (2024-05-26T13:36:45Z) - SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z) - SERF: Fine-Grained Interactive 3D Segmentation and Editing with Radiance Fields [92.14328581392633]
We introduce a novel fine-grained interactive 3D segmentation and editing algorithm with radiance fields, which we refer to as SERF.
Our method entails creating a neural mesh representation by integrating multi-view algorithms with pre-trained 2D models.
Building upon this representation, we introduce a novel surface rendering technique that preserves local information and is robust to deformation.
arXiv Detail & Related papers (2023-12-26T02:50:42Z) - RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z) - A Generative Framework for Self-Supervised Facial Representation Learning [18.094262972295702]
Self-supervised representation learning has gained increasing attention for strong generalization ability without relying on paired datasets.
Self-supervised facial representation learning remains unsolved due to the coupling of facial identities, expressions, and external factors like pose and light.
We propose LatentFace, a novel generative framework for self-supervised facial representations.
arXiv Detail & Related papers (2023-09-15T09:34:05Z) - MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D
Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z) - Are 3D Face Shapes Expressive Enough for Recognising Continuous Emotions
and Action Unit Intensities? [1.2233362977312945]
This work focuses on a promising alternative based on parametric 3D face shape alignment models.
We benchmark four recent 3D face alignment models: ExpNet, 3DDFA-V2, DECA, and EMOCA.
We report that 3D face features were on par with 2D appearance features in AUs 4, 6, 10, 12, and 25, but not the entire set of AUs.
arXiv Detail & Related papers (2022-07-03T20:19:06Z) - RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method to represent the self-occlusions of foreground objects in 3D into a 2D self-occlusion map.
We show that our representation map not only allows us to enhance the image quality but also to model temporally coherent complex shadow effects.
arXiv Detail & Related papers (2022-05-14T05:35:35Z) - Implicit Neural Deformation for Multi-View Face Reconstruction [43.88676778013593]
We present a new method for 3D face reconstruction from multi-view RGB images.
Unlike previous methods which are built upon 3D morphable models, our method leverages an implicit representation to encode rich geometric features.
Our experimental results on several benchmark datasets demonstrate that our approach outperforms alternative baselines and achieves superior face reconstruction results compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-12-05T07:02:53Z) - Disentangled Face Identity Representations for joint 3D Face Recognition
and Expression Neutralisation [20.854071758664297]
Given a 3D face, our approach not only extracts a disentangled identity representation but also generates a realistic 3D face with a neutral expression while predicting its identity.
The proposed network consists of three components; (1) a Graph Convolutional Autoencoder (GCA) to encode the 3D faces into latent representations, (2) a Generative Adversarial Network (GAN) that translates the latent representations into those of neutral faces, (3) and an identity recognition sub-network taking advantage of the neutralized latent representations for 3D face recognition.
arXiv Detail & Related papers (2021-04-20T22:33:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.