Learning to Augment Expressions for Few-shot Fine-grained Facial
Expression Recognition
- URL: http://arxiv.org/abs/2001.06144v1
- Date: Fri, 17 Jan 2020 03:26:32 GMT
- Title: Learning to Augment Expressions for Few-shot Fine-grained Facial
Expression Recognition
- Authors: Wenxuan Wang, Yanwei Fu, Qiang Sun, Tao Chen, Chenjie Cao, Ziqi Zheng,
Guoqiang Xu, Han Qiu, Yu-Gang Jiang, Xiangyang Xue
- Abstract summary: We present a novel Fine-grained Facial Expression Database - F2ED.
It includes more than 200k images with 54 facial expressions from 119 persons.
Since uneven data distribution and a lack of samples are common in real-world scenarios, we evaluate several few-shot expression learning tasks.
We propose a unified task-driven framework, the Compositional Generative Adversarial Network (Comp-GAN), which learns to synthesize facial images.
- Score: 98.83578105374535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Affective computing and cognitive theory are widely used in modern
human-computer interaction scenarios. Human faces, as the most prominent and
easily accessible features, have attracted great attention from researchers.
Since humans have rich emotions and well-developed facial musculature, many
fine-grained expressions appear in real-world applications. However, it is
extremely time-consuming to collect and annotate a large number of facial
images, which may even require psychologists to categorize correctly. To the best
of our knowledge, existing expression datasets are limited to several basic
facial expressions, which are not sufficient to support our ambitions in
developing successful human-computer interaction systems. To this end, a novel
Fine-grained Facial Expression Database - F2ED is contributed in this paper,
and it includes more than 200k images with 54 facial expressions from 119
persons. Since uneven data distribution and a lack of samples are common in
real-world scenarios, we further evaluate several few-shot expression learning
tasks on F2ED, in which facial expressions must be recognized from only a few
training instances. These tasks mimic the human ability to learn robust and
general representations from few examples. To
address such few-shot tasks, we propose a unified task-driven framework, the
Compositional Generative Adversarial Network (Comp-GAN), which learns to
synthesize facial images and thus augments the instances of few-shot expression
classes.
Extensive experiments are conducted on F2ED and existing facial expression
datasets, i.e., JAFFE and FER2013, to validate the efficacy of our F2ED in
pre-training a facial expression recognition network and the effectiveness of
our proposed Comp-GAN approach in improving the performance of few-shot recognition
tasks.
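The abstract above gives no implementation details for Comp-GAN, so the sketch below (PyTorch) only illustrates the general idea of GAN-based augmentation for few-shot expression classes: a label-conditioned generator synthesizes extra face images for under-represented expressions, which are then added to the recognition network's training data. The `ConditionalGenerator` and `augment_few_shot_class` names, the layer sizes, and the image resolution are illustrative assumptions, not the authors' Comp-GAN architecture.

```python
# Illustrative sketch only: a minimal label-conditioned generator used to pad
# few-shot expression classes with synthetic images. Names, sizes, and the
# working resolution are assumptions, not the authors' Comp-GAN.
import torch
import torch.nn as nn

NUM_EXPRESSIONS = 54   # F2ED defines 54 fine-grained expressions
LATENT_DIM = 128
IMG_SIZE = 64          # assumed working resolution for this sketch

class ConditionalGenerator(nn.Module):
    """Maps (noise, expression label) to a synthetic face image."""
    def __init__(self):
        super().__init__()
        self.label_emb = nn.Embedding(NUM_EXPRESSIONS, NUM_EXPRESSIONS)
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + NUM_EXPRESSIONS, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, 3 * IMG_SIZE * IMG_SIZE),
            nn.Tanh(),  # images scaled to [-1, 1]
        )

    def forward(self, noise, labels):
        cond = torch.cat([noise, self.label_emb(labels)], dim=1)
        return self.net(cond).view(-1, 3, IMG_SIZE, IMG_SIZE)

def augment_few_shot_class(generator, label, n_synthetic):
    """Synthesize extra samples for one under-represented expression class."""
    generator.eval()
    with torch.no_grad():
        noise = torch.randn(n_synthetic, LATENT_DIM)
        labels = torch.full((n_synthetic,), label, dtype=torch.long)
        return generator(noise, labels), labels

# Usage: pad a 5-shot class up to 100 training samples with synthetic images.
gen = ConditionalGenerator()          # in practice, load trained weights here
fake_imgs, fake_labels = augment_few_shot_class(gen, label=7, n_synthetic=95)
print(fake_imgs.shape)                # torch.Size([95, 3, 64, 64])
```

In practice such a generator would first be trained adversarially against a discriminator; only the augmentation step is shown here.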
Related papers
- CLIPER: A Unified Vision-Language Framework for In-the-Wild Facial
Expression Recognition [1.8604727699812171]
We propose a unified framework for both static and dynamic facial expression recognition based on CLIP.
We introduce multiple expression text descriptors (METD) to learn fine-grained expression representations that make CLIPER more interpretable.
arXiv Detail & Related papers (2023-03-01T02:59:55Z)
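CLIPER's multiple expression text descriptors (METD) are not detailed in this summary; the snippet below is only a hedged sketch of the underlying CLIP-based idea of scoring a face image against expression text prompts, using the Hugging Face `transformers` CLIP implementation. The prompt wording, checkpoint name, and expression list are assumptions for illustration, not CLIPER's learned descriptors.

```python
# Illustrative sketch: zero-shot facial expression scoring with CLIP.
# Prompts, checkpoint, and label set are assumptions, not CLIPER's METD.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

expressions = ["happy", "sad", "surprised", "angry", "fearful", "disgusted", "neutral"]
prompts = [f"a photo of a person with a {e} facial expression" for e in expressions]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("face.jpg")  # placeholder path to any aligned face crop
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image: similarity of the image to each expression prompt
probs = outputs.logits_per_image.softmax(dim=-1).squeeze(0)
for expr, p in zip(expressions, probs.tolist()):
    print(f"{expr}: {p:.3f}")
```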
- CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptation (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets.
CIAO improves facial expression recognition performance across six different datasets, each with very distinct affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z)
- Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers [57.1091606948826]
We propose a novel FER model, named Poker Face Vision Transformer or PF-ViT, to address these challenges.
PF-ViT aims to separate and recognize the disturbance-agnostic emotion from a static facial image via generating its corresponding poker face.
PF-ViT utilizes vanilla Vision Transformers, and its components are pre-trained as Masked Autoencoders on a large facial expression dataset.
arXiv Detail & Related papers (2022-07-22T13:39:06Z)
- Towards a General Deep Feature Extractor for Facial Expression Recognition [5.012963825796511]
We propose DeepFEVER, a new deep learning-based approach that learns a visual feature extractor general enough to be applied to any other facial emotion recognition task or dataset.
DeepFEVER outperforms state-of-the-art results on the AffectNet and Google Facial Expression Comparison datasets.
arXiv Detail & Related papers (2022-01-19T18:42:23Z)
- Exploiting Emotional Dependencies with Graph Convolutional Networks for Facial Expression Recognition [31.40575057347465]
This paper proposes a novel multi-task learning framework to recognize facial expressions in-the-wild.
A shared feature representation is learned for both discrete and continuous recognition in an MTL setting.
The results of our experiments show that our method outperforms the current state-of-the-art methods on discrete FER.
arXiv Detail & Related papers (2021-06-07T10:20:05Z)
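The summary above only states that a shared representation serves both discrete and continuous recognition; a minimal sketch of such a multi-task setup (shared backbone, a classification head for discrete expressions, a regression head for valence/arousal) could look as follows. The ResNet-18 backbone, head sizes, and loss weighting are assumptions, and the paper's graph-convolutional component is not shown.

```python
# Minimal multi-task FER head: one shared backbone, two task-specific heads.
# Architecture and loss weighting are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn
from torchvision import models

class MultiTaskFER(nn.Module):
    def __init__(self, num_expressions: int = 7):
        super().__init__()
        backbone = models.resnet18(weights=None)          # shared feature extractor
        backbone.fc = nn.Identity()                       # expose 512-d features
        self.backbone = backbone
        self.cls_head = nn.Linear(512, num_expressions)   # discrete expressions
        self.reg_head = nn.Linear(512, 2)                 # continuous valence/arousal

    def forward(self, x):
        feats = self.backbone(x)
        return self.cls_head(feats), self.reg_head(feats)

model = MultiTaskFER()
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 7, (4,))
va_targets = torch.rand(4, 2) * 2 - 1    # valence/arousal in [-1, 1]

logits, va_pred = model(images)
loss = nn.functional.cross_entropy(logits, labels) \
       + 0.5 * nn.functional.mse_loss(va_pred, va_targets)  # assumed task weighting
loss.backward()
```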
- Pre-training strategies and datasets for facial representation learning [58.8289362536262]
We show how to find a universal face representation that can be adapted to several facial analysis tasks and datasets.
We systematically investigate two ways of large-scale representation learning applied to faces: supervised and unsupervised pre-training.
Among our main findings, unsupervised pre-training on completely in-the-wild, uncurated data provides consistent and, in some cases, significant accuracy improvements.
arXiv Detail & Related papers (2021-03-30T17:57:25Z)
- A Multi-resolution Approach to Expression Recognition in the Wild [9.118706387430883]
We propose a multi-resolution approach to solve the Facial Expression Recognition task.
We ground our intuition on the observation that face images are often acquired at different resolutions.
To this end, we use a ResNet-like architecture, equipped with Squeeze-and-Excitation blocks, trained on the Affect-in-the-Wild 2 dataset.
arXiv Detail & Related papers (2021-03-09T21:21:02Z)
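The Squeeze-and-Excitation blocks mentioned above are a standard channel-attention module; a minimal PyTorch version is sketched below for reference. The reduction ratio and where the block is placed inside the ResNet-like network are assumptions.

```python
# A standard Squeeze-and-Excitation (channel attention) block, as used in
# SE-ResNet-style architectures; hyperparameters here are illustrative.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # "squeeze": global context
        self.fc = nn.Sequential(                     # "excitation": channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # reweight feature channels

se = SEBlock(channels=64)
out = se(torch.randn(2, 64, 56, 56))   # shape preserved: (2, 64, 56, 56)
```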
- Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory (LSTM) units, as well as inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
arXiv Detail & Related papers (2020-11-18T13:42:05Z)
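A common way to realize the "2D-CNN plus LSTM" convolutional recurrent variant described above is to encode each frame with a 2D CNN and model the resulting feature sequence with an LSTM; a minimal sketch follows. The backbone, hidden size, and use of the last time step for regression are assumptions rather than the authors' configuration, and the inflated 3D-CNN variant is not shown.

```python
# Sketch of a CNN-LSTM for continuous emotion recognition on video clips:
# a 2D CNN encodes each frame, an LSTM models the temporal dynamics.
# All sizes and the regression target (valence/arousal) are assumptions.
import torch
import torch.nn as nn
from torchvision import models

class CNNLSTM(nn.Module):
    def __init__(self, hidden_size: int = 256):
        super().__init__()
        cnn = models.resnet18(weights=None)
        cnn.fc = nn.Identity()                 # 512-d feature per frame
        self.cnn = cnn
        self.lstm = nn.LSTM(512, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)  # continuous valence/arousal

    def forward(self, clips):                  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])           # predict from the last time step

model = CNNLSTM()
pred = model(torch.randn(2, 8, 3, 112, 112))   # -> shape (2, 2)
```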
- Real-time Facial Expression Recognition "In The Wild" by Disentangling 3D Expression from Identity [6.974241731162878]
This paper proposes a novel method for human emotion recognition from a single RGB image.
We construct a large-scale dataset of facial videos, rich in facial dynamics, identities, expressions, appearance and 3D pose variations.
Our proposed framework runs at 50 frames per second and is capable of robustly estimating parameters of 3D expression variation.
arXiv Detail & Related papers (2020-05-12T01:32:55Z)
- Joint Deep Learning of Facial Expression Synthesis and Recognition [97.19528464266824]
We propose a novel method that jointly learns facial expression synthesis and recognition for effective FER.
The proposed method involves a two-stage learning procedure. Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions.
In order to alleviate the problem of data bias between the real images and the synthetic images, we propose an intra-class loss with a novel real data-guided back-propagation (RDBP) algorithm.
arXiv Detail & Related papers (2020-02-06T10:56:00Z)
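The real data-guided back-propagation (RDBP) algorithm is not described in this summary, so the snippet below only illustrates the generic idea of an intra-class loss that pulls features of synthetic images toward the features of real images of the same expression class. The squared-distance-to-centroid formulation and the `real_class_centroids` helper are assumptions, not the paper's RDBP.

```python
# Generic intra-class alignment loss between real and synthetic features.
# Only illustrates reducing real/synthetic data bias; not the paper's RDBP.
import torch

def real_class_centroids(real_feats, real_labels, num_classes):
    """Mean real-image feature per expression class (assumed helper)."""
    dim = real_feats.size(1)
    centroids = torch.zeros(num_classes, dim)
    for c in range(num_classes):
        mask = real_labels == c
        if mask.any():
            centroids[c] = real_feats[mask].mean(dim=0)
    return centroids

def intra_class_loss(fake_feats, fake_labels, centroids):
    """Pull each synthetic feature toward the real centroid of its class."""
    return (fake_feats - centroids[fake_labels]).pow(2).sum(dim=1).mean()

real_feats = torch.randn(32, 128)
real_labels = torch.randint(0, 7, (32,))
fake_feats = torch.randn(16, 128, requires_grad=True)
fake_labels = torch.randint(0, 7, (16,))

centroids = real_class_centroids(real_feats, real_labels, num_classes=7)
loss = intra_class_loss(fake_feats, fake_labels, centroids)
loss.backward()
```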