Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers
- URL: http://arxiv.org/abs/2207.11081v4
- Date: Thu, 10 Oct 2024 02:47:17 GMT
- Title: Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers
- Authors: Jia Li, Jiantao Nie, Dan Guo, Richang Hong, Meng Wang,
- Abstract summary: We propose a novel FER model, named Poker Face Vision Transformer or PF-ViT, to address these challenges.
PF-ViT aims to separate and recognize the disturbance-agnostic emotion from a static facial image via generating its corresponding poker face.
PF-ViT utilizes vanilla Vision Transformers, and its components are pre-trained as Masked Autoencoders on a large facial expression dataset.
- Score: 57.1091606948826
- License:
- Abstract: Representation learning and feature disentanglement have garnered significant research interest in the field of facial expression recognition (FER). The inherent ambiguity of emotion labels poses challenges for conventional supervised representation learning methods. Moreover, directly learning the mapping from a facial expression image to an emotion label lacks explicit supervision signals for capturing fine-grained facial features. In this paper, we propose a novel FER model, named Poker Face Vision Transformer or PF-ViT, to address these challenges. PF-ViT aims to separate and recognize the disturbance-agnostic emotion from a static facial image via generating its corresponding poker face, without the need for paired images. Inspired by the Facial Action Coding System, we regard an expressive face as the combined result of a set of facial muscle movements on one's poker face (i.e., an emotionless face). PF-ViT utilizes vanilla Vision Transformers, and its components are firstly pre-trained as Masked Autoencoders on a large facial expression dataset without emotion labels, yielding excellent representations. Subsequently, we train PF-ViT using a GAN framework. During training, the auxiliary task of poke face generation promotes the disentanglement between emotional and emotion-irrelevant components, guiding the FER model to holistically capture discriminative facial details. Quantitative and qualitative results demonstrate the effectiveness of our method, surpassing the state-of-the-art methods on four popular FER datasets.
Related papers
- Knowledge-Enhanced Facial Expression Recognition with Emotional-to-Neutral Transformation [66.53435569574135]
Existing facial expression recognition methods typically fine-tune a pre-trained visual encoder using discrete labels.
We observe that the rich knowledge in text embeddings, generated by vision-language models, is a promising alternative for learning discriminative facial expression representations.
We propose a novel knowledge-enhanced FER method with an emotional-to-neutral transformation.
arXiv Detail & Related papers (2024-09-13T07:28:57Z) - From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos [88.08209394979178]
Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations.
We introduce a novel Static-to-Dynamic model (S2D) that leverages existing SFER knowledge and dynamic information implicitly encoded in extracted facial landmark-aware features.
arXiv Detail & Related papers (2023-12-09T03:16:09Z) - GaFET: Learning Geometry-aware Facial Expression Translation from
In-The-Wild Images [55.431697263581626]
We introduce a novel Geometry-aware Facial Expression Translation framework, which is based on parametric 3D facial representations and can stably decoupled expression.
We achieve higher-quality and more accurate facial expression transfer results compared to state-of-the-art methods, and demonstrate applicability of various poses and complex textures.
arXiv Detail & Related papers (2023-08-07T09:03:35Z) - SimFLE: Simple Facial Landmark Encoding for Self-Supervised Facial
Expression Recognition in the Wild [3.4798852684389963]
We propose a self-supervised simple facial landmark encoding (SimFLE) method that can learn effective encoding of facial landmarks.
We introduce novel FaceMAE module for this purpose.
Experimental results on several FER-W benchmarks prove that the proposed SimFLE is superior in facial landmark localization.
arXiv Detail & Related papers (2023-03-14T06:30:55Z) - Interpretable Explainability in Facial Emotion Recognition and
Gamification for Data Collection [0.0]
Training facial emotion recognition models requires large sets of data and costly annotation processes.
We developed a gamified method of acquiring annotated facial emotion data without an explicit labeling effort by humans.
We observed significant improvements in the facial emotion perception and expression skills of the players through repeated game play.
arXiv Detail & Related papers (2022-11-09T09:53:48Z) - PERI: Part Aware Emotion Recognition In The Wild [4.206175795966693]
This paper focuses on emotion recognition using visual features.
We create part aware spatial (PAS) images by extracting key regions from the input image using a mask generated from both body pose and facial landmarks.
We provide our results on the publicly available in the wild EMOTIC dataset.
arXiv Detail & Related papers (2022-10-18T20:01:40Z) - Learning Facial Representations from the Cycle-consistency of Face [23.23272327438177]
We introduce cycle-consistency in facial characteristics as free supervisory signal to learn facial representations from unlabeled facial images.
The learning is realized by superimposing the facial motion cycle-consistency and identity cycle-consistency constraints.
Our approach is competitive with those of existing methods, demonstrating the rich and unique information embedded in the disentangled representations.
arXiv Detail & Related papers (2021-08-07T11:30:35Z) - I Only Have Eyes for You: The Impact of Masks On Convolutional-Based
Facial Expression Recognition [78.07239208222599]
We evaluate how the recently proposed FaceChannel adapts towards recognizing facial expressions from persons with masks.
We also perform specific feature-level visualization to demonstrate how the inherent capabilities of the FaceChannel to learn and combine facial features change when in a constrained social interaction scenario.
arXiv Detail & Related papers (2021-04-16T20:03:30Z) - DotFAN: A Domain-transferred Face Augmentation Network for Pose and
Illumination Invariant Face Recognition [94.96686189033869]
We propose a 3D model-assisted domain-transferred face augmentation network (DotFAN)
DotFAN can generate a series of variants of an input face based on the knowledge distilled from existing rich face datasets collected from other domains.
Experiments show that DotFAN is beneficial for augmenting small face datasets to improve their within-class diversity.
arXiv Detail & Related papers (2020-02-23T08:16:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.