Related papers: Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers

URL: http://arxiv.org/abs/2207.11081v3
Date: Fri, 9 Jun 2023 09:12:18 GMT
Title: Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers
Authors: Jia Li, Jiantao Nie, Dan Guo, Richang Hong, Meng Wang
Abstract summary: We propose a novel FER model, called Poker Face Vision Transformer or PF-ViT, to separate and recognize the disturbance-agnostic emotion from a static facial image. PF-ViT generates its corresponding poker face without the need for paired images.
Score: 57.67586172996843
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Representation learning and feature disentanglement have recently attracted much research interest in facial expression recognition (FER). The ubiquitous ambiguity of emotion labels is detrimental to methods based on conventional supervised representation learning. Meanwhile, directly learning the mapping from a facial expression image to an emotion label lacks explicit supervision signals for facial details. In this paper, we propose a novel FER model, called Poker Face Vision Transformer or PF-ViT, to separate and recognize the disturbance-agnostic emotion from a static facial image by generating its corresponding poker face without the need for paired images. Here, we regard an expressive face as the comprehensive result of a set of facial muscle movements on one's poker face (i.e., emotionless face), inspired by the Facial Action Coding System. The proposed PF-ViT leverages vanilla Vision Transformers, which are first pre-trained as Masked Autoencoders on a large facial expression dataset without emotion labels, obtaining excellent representations. It mainly consists of five components: 1) an encoder mapping the facial expression to a complete representation, 2) a separator decomposing the representation into an emotional component and an orthogonal residue, 3) a generator that can reconstruct the expressive face and synthesize the poker face, 4) a discriminator distinguishing the fake face produced by the generator, trained adversarially with the encoder and generator, 5) a classification head recognizing the emotion. Quantitative and qualitative results demonstrate the effectiveness of our method, which outperforms the state-of-the-art methods on four popular FER testing sets.
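
The separator step in the abstract (splitting a representation into an emotional component and an orthogonal residue) can be illustrated with a small sketch. This is not the paper's learned separator: it assumes, purely for illustration, that the emotional component is the projection of a feature vector onto a single fixed direction. The names `separate`, `z`, and `emotion_direction` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def separate(z, direction):
    """Split z into a part along `direction` and an orthogonal residue.

    Assumption for illustration only: the "emotional" part is a plain
    linear projection. PF-ViT learns its separator; this just shows what
    an orthogonal decomposition looks like.
    """
    norm_sq = dot(direction, direction)
    if norm_sq == 0:
        raise ValueError("direction must be non-zero")
    scale = dot(z, direction) / norm_sq
    emotional = [scale * d for d in direction]
    residue = [a - e for a, e in zip(z, emotional)]
    return emotional, residue


# Hypothetical 3-dimensional representation and emotion direction.
z = [3.0, 4.0, 0.0]
emotion_direction = [1.0, 0.0, 0.0]
emotional, residue = separate(z, emotion_direction)

# The two parts are orthogonal and add back up to the original vector.
recombined = [e + r for e, r in zip(emotional, residue)]
```

In this toy setting the two parts sum back to `z` and their inner product is zero, which is the property the abstract's "orthogonal residue" wording suggests. In the model described above, a generator would then use such parts to reconstruct the expressive face or synthesize the poker face.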

Related papers

Facial Landmark Visualization and Emotion Recognition Through Neural Networks [0.0]
Emotion recognition from facial images is a crucial task in human-computer interaction. Previous studies have shown that facial images can be used to train deep learning models. We propose facial landmark box plots, a visualization technique designed to identify outliers in facial datasets.
arXiv Detail & Related papers (2025-06-20T17:45:34Z)
Knowledge-Enhanced Facial Expression Recognition with Emotional-to-Neutral Transformation [66.53435569574135]
Existing facial expression recognition methods typically fine-tune a pre-trained visual encoder using discrete labels. We observe that the rich knowledge in text embeddings, generated by vision-language models, is a promising alternative for learning discriminative facial expression representations. We propose a novel knowledge-enhanced FER method with an emotional-to-neutral transformation.
arXiv Detail & Related papers (2024-09-13T07:28:57Z)
From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos [88.08209394979178]
Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations. We introduce a novel Static-to-Dynamic model (S2D) that leverages existing SFER knowledge and dynamic information implicitly encoded in extracted facial landmark-aware features.
arXiv Detail & Related papers (2023-12-09T03:16:09Z)
GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images [55.431697263581626]
We introduce a novel Geometry-aware Facial Expression Translation framework, which is based on parametric 3D facial representations and can stably decouple expression. We achieve higher-quality and more accurate facial expression transfer results compared to state-of-the-art methods, and demonstrate applicability to various poses and complex textures.
arXiv Detail & Related papers (2023-08-07T09:03:35Z)
SimFLE: Simple Facial Landmark Encoding for Self-Supervised Facial Expression Recognition in the Wild [3.4798852684389963]
We propose a self-supervised simple facial landmark encoding (SimFLE) method that can learn effective encoding of facial landmarks. We introduce a novel FaceMAE module for this purpose. Experimental results on several FER-W benchmarks prove that the proposed SimFLE is superior in facial landmark localization.
arXiv Detail & Related papers (2023-03-14T06:30:55Z)
Interpretable Explainability in Facial Emotion Recognition and Gamification for Data Collection [0.0]
Training facial emotion recognition models requires large sets of data and costly annotation processes. We developed a gamified method of acquiring annotated facial emotion data without an explicit labeling effort by humans. We observed significant improvements in the facial emotion perception and expression skills of the players through repeated game play.
arXiv Detail & Related papers (2022-11-09T09:53:48Z)
PERI: Part Aware Emotion Recognition In The Wild [4.206175795966693]
This paper focuses on emotion recognition using visual features. We create part aware spatial (PAS) images by extracting key regions from the input image using a mask generated from both body pose and facial landmarks. We provide our results on the publicly available in-the-wild EMOTIC dataset.
arXiv Detail & Related papers (2022-10-18T20:01:40Z)
Learning Facial Representations from the Cycle-consistency of Face [23.23272327438177]
We introduce cycle-consistency in facial characteristics as a free supervisory signal to learn facial representations from unlabeled facial images. The learning is realized by superimposing the facial motion cycle-consistency and identity cycle-consistency constraints. Our approach is competitive with existing methods, demonstrating the rich and unique information embedded in the disentangled representations.
arXiv Detail & Related papers (2021-08-07T11:30:35Z)
I Only Have Eyes for You: The Impact of Masks On Convolutional-Based Facial Expression Recognition [78.07239208222599]
We evaluate how the recently proposed FaceChannel adapts towards recognizing facial expressions from persons with masks. We also perform specific feature-level visualization to demonstrate how the inherent capabilities of the FaceChannel to learn and combine facial features change when in a constrained social interaction scenario.
arXiv Detail & Related papers (2021-04-16T20:03:30Z)
DotFAN: A Domain-transferred Face Augmentation Network for Pose and Illumination Invariant Face Recognition [94.96686189033869]
We propose a 3D model-assisted domain-transferred face augmentation network (DotFAN). DotFAN can generate a series of variants of an input face based on the knowledge distilled from existing rich face datasets collected from other domains. Experiments show that DotFAN is beneficial for augmenting small face datasets to improve their within-class diversity.
arXiv Detail & Related papers (2020-02-23T08:16:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
