DrFER: Learning Disentangled Representations for 3D Facial Expression Recognition
- URL: http://arxiv.org/abs/2403.08318v1
- Date: Wed, 13 Mar 2024 08:00:07 GMT
- Title: DrFER: Learning Disentangled Representations for 3D Facial Expression Recognition
- Authors: Hebeizi Li, Hongyu Yang, Di Huang
- Abstract summary: We introduce the innovative DrFER method, which brings the concept of disentangled representation learning to the field of 3D FER.
DrFER employs a dual-branch framework to effectively disentangle expression information from identity information.
This adaptation enhances the capability of the framework in recognizing facial expressions, even in cases involving varying head poses.
- Score: 28.318304721838096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Facial Expression Recognition (FER) has consistently been a focal point in
the field of facial analysis. In the context of existing methodologies for 3D
FER or 2D+3D FER, the extraction of expression features often gets entangled
with identity information, compromising the distinctiveness of these features.
To tackle this challenge, we introduce the innovative DrFER method, which
brings the concept of disentangled representation learning to the field of 3D
FER. DrFER employs a dual-branch framework to effectively disentangle
expression information from identity information. Diverging from prior
disentanglement endeavors in the 3D facial domain, we have carefully
reconfigured both the loss functions and network structure to make the overall
framework adaptable to point cloud data. This adaptation enhances the
capability of the framework in recognizing facial expressions, even in cases
involving varying head poses. Extensive evaluations conducted on the BU-3DFE
and Bosphorus datasets substantiate that DrFER surpasses the performance of
other 3D FER methods.
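The dual-branch idea above can be sketched in a few lines: a shared, order-invariant point-cloud encoder feeds two separate branches, one supervised for expression and one for identity, so that the two factors are represented (and penalized) separately. The sketch below is a minimal NumPy illustration under assumed dimensions and randomly initialized weights; the layer sizes, branch structure, and loss placement are hypothetical and are not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- illustrative assumptions, not from the paper.
N_POINTS, FEAT_DIM = 1024, 64
N_EXPRESSIONS = 6          # e.g. the six prototypical expressions in BU-3DFE
ID_DIM = 32                # size of the identity embedding

def shared_point_encoder(points, w):
    """PointNet-style encoder: a per-point linear map with ReLU, followed by
    max pooling over points, yielding one permutation-invariant feature
    vector per 3D face scan."""
    per_point = np.maximum(points @ w, 0.0)   # (N_POINTS, FEAT_DIM)
    return per_point.max(axis=0)              # (FEAT_DIM,) global feature

# Random weights stand in for trained parameters.
w_shared = rng.normal(size=(3, FEAT_DIM))
w_expr = rng.normal(size=(FEAT_DIM, N_EXPRESSIONS))  # expression branch
w_id = rng.normal(size=(FEAT_DIM, ID_DIM))           # identity branch

scan = rng.normal(size=(N_POINTS, 3))  # a dummy face point cloud (x, y, z)
feat = shared_point_encoder(scan, w_shared)

# Two branches: during training, each would receive its own supervision
# (an expression classification loss and an identity loss, respectively),
# encouraging the factors to disentangle.
expr_logits = feat @ w_expr   # (N_EXPRESSIONS,)
id_embed = feat @ w_id        # (ID_DIM,)

print(expr_logits.shape, id_embed.shape)
```

Because the pooled feature is invariant to point ordering, the same sketch applies regardless of how the scan is sampled; the disentanglement itself would come from the branch-specific losses during training, which are omitted here.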
Related papers
- Ig3D: Integrating 3D Face Representations in Facial Expression Inference [12.975434103690812]
This study aims to investigate the impacts of integrating 3D representations into the facial expression inference task.
We first assess the performance of two 3D face representations (both based on the 3D morphable model, FLAME) for the FEI tasks.
We then explore two fusion architectures, intermediate fusion and late fusion, for integrating the 3D face representations with existing 2D inference frameworks.
Our proposed method outperforms the state of the art on the AffectNet VA estimation and RAF-DB classification tasks.
arXiv Detail & Related papers (2024-08-29T21:08:07Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- 3D Face Modeling via Weakly-supervised Disentanglement Network joint Identity-consistency Prior [62.80458034704989]
Generative 3D face models featuring disentangled controlling factors hold immense potential for diverse applications in computer vision and computer graphics.
Previous 3D face modeling methods face a challenge as they demand specific labels to effectively disentangle these factors.
This paper introduces a Weakly-Supervised Disentanglement Framework, denoted as WSDF, to facilitate the training of controllable 3D face models without an overly stringent labeling requirement.
arXiv Detail & Related papers (2024-04-25T11:50:47Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- FitDiff: Robust monocular 3D facial shape and reflectance estimation using Diffusion Models [79.65289816077629]
We present FitDiff, a diffusion-based 3D facial avatar generative model.
Our model accurately generates relightable facial avatars, utilizing an identity embedding extracted from an "in-the-wild" 2D facial image.
Being the first 3D LDM conditioned on face recognition embeddings, FitDiff reconstructs relightable human avatars that can be used as-is in common rendering engines.
arXiv Detail & Related papers (2023-12-07T17:35:49Z)
- DEFN: Dual-Encoder Fourier Group Harmonics Network for Three-Dimensional Indistinct-Boundary Object Segmentation [6.0920148653974255]
We introduce Defect Injection (SDi) to augment the representational diversity of challenging indistinct-boundary objects within training corpora.
Consequently, we propose the Dual-Encoder Fourier Group Harmonics Network (DEFN) to tailor the incorporation of noise, amplify detailed feature recognition, and bolster representation across diverse medical imaging scenarios.
arXiv Detail & Related papers (2023-11-01T12:33:04Z)
- 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces [12.114711258010367]
We propose a self-supervised approach to train a 3D shape variational autoencoder which encourages a disentangled latent representation of identity features.
Experimental results conducted on 3D meshes show that state-of-the-art methods for latent disentanglement are not able to disentangle identity features of faces and bodies.
arXiv Detail & Related papers (2021-11-24T11:53:33Z)
- Independent Sign Language Recognition with 3D Body, Hands, and Face Reconstruction [46.70761714133466]
Independent Sign Language Recognition is a complex visual recognition problem that combines several challenging tasks of Computer Vision.
No prior work has adequately combined all three information channels (body, hands, and face) to efficiently recognize Sign Language.
We employ SMPL-X, a contemporary parametric model that enables joint extraction of 3D body shape, face and hands information from a single image.
arXiv Detail & Related papers (2020-11-24T23:50:26Z)
- 3D Dense Geometry-Guided Facial Expression Synthesis by Adversarial Learning [54.24887282693925]
We propose a novel framework to exploit 3D dense (depth and surface normals) information for expression manipulation.
We use an off-the-shelf state-of-the-art 3D reconstruction model to estimate the depth and create a large-scale RGB-Depth dataset.
Our experiments demonstrate that the proposed method outperforms the competitive baseline and existing arts by a large margin.
arXiv Detail & Related papers (2020-09-30T17:12:35Z)
- 3D Face Anti-spoofing with Factorized Bilinear Coding [35.30886962572515]
We propose a novel anti-spoofing method from the perspective of fine-grained classification.
By extracting discriminative features and fusing complementary information from the RGB and YCbCr color spaces, we develop a principled solution to 3D face spoofing detection.
arXiv Detail & Related papers (2020-05-12T03:09:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.