Intensity-Aware Loss for Dynamic Facial Expression Recognition in the
Wild
- URL: http://arxiv.org/abs/2208.10335v1
- Date: Fri, 19 Aug 2022 12:48:07 GMT
- Title: Intensity-Aware Loss for Dynamic Facial Expression Recognition in the
Wild
- Authors: Hanting Li, Hongjing Niu, Zhaoqing Zhu, and Feng Zhao
- Abstract summary: Video sequences often contain frames with different expression intensities, especially for facial expressions in real-world scenarios.
We propose the global convolution-attention block (GCA) to rescale the channels of the feature maps.
In addition, we introduce the intensity-aware loss (IAL) in the training process to help the network distinguish the samples with relatively low expression intensities.
- Score: 1.8604727699812171
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compared with the image-based static facial expression recognition (SFER)
task, the dynamic facial expression recognition (DFER) task based on video
sequences is closer to the natural expression recognition scene. However, DFER
is often more challenging. One of the main reasons is that video sequences
often contain frames with different expression intensities, especially for
facial expressions in real-world scenarios, whereas the images in SFER
frequently present uniform and high expression intensities. If expressions
with different intensities are treated equally, the features learned by the
networks will have large intra-class and small inter-class differences, which
is harmful to DFER. To tackle this problem, we propose the
global convolution-attention block (GCA) to rescale the channels of the feature
maps. In addition, we introduce the intensity-aware loss (IAL) in the training
process to help the network distinguish the samples with relatively low
expression intensities. Experiments on two in-the-wild dynamic facial
expression datasets (i.e., DFEW and FERV39k) indicate that our method
outperforms the state-of-the-art DFER approaches. The source code will be made
publicly available.
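The abstract names the two components but gives no implementation details. The following PyTorch sketch shows one plausible reading: a squeeze-and-excitation-style block that rescales feature-map channels, and a loss that adds a penalty on the most confusing non-target class. The block layout, the exact loss formulation, and all identifiers are assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalConvAttention(nn.Module):
    """Sketch of a GCA-style block: pool global context, then rescale
    channels. The paper's exact layer layout is an assumption here."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global context per channel
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),  # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); rescale each channel by its learned weight
        return x * self.fc(self.pool(x))

def intensity_aware_loss(logits: torch.Tensor, target: torch.Tensor,
                         lam: float = 1.0) -> torch.Tensor:
    """Sketch of an IAL-style objective: cross-entropy plus a term that
    suppresses the most confusing non-target class, so low-intensity
    samples are pushed away from the class they are most easily mistaken
    for. The exact formulation is inferred from the abstract."""
    ce = F.cross_entropy(logits, target)
    probs = F.softmax(logits, dim=1)
    # zero out the ground-truth class, then take the largest remaining prob
    masked = probs.scatter(1, target.unsqueeze(1), 0.0)
    p_confusing = masked.max(dim=1).values
    ial = -torch.log(1.0 - p_confusing + 1e-8).mean()
    return ce + lam * ial
```

In this reading, the block would sit after stages of a CNN backbone and the loss would replace plain cross-entropy during training; the weighting factor lam is likewise a hypothetical knob.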
Related papers
- UniLearn: Enhancing Dynamic Facial Expression Recognition through Unified Pre-Training and Fine-Tuning on Images and Videos [83.48170683672427]
UniLearn is a unified learning paradigm that integrates static facial expression recognition data to enhance the DFER task.
UniLearn consistently achieves state-of-the-art performance on the FERV39K, MAFW, and DFEW benchmarks, with weighted average recall (WAR) of 53.65%, 58.44%, and 76.68%, respectively.
arXiv Detail & Related papers (2024-09-10T01:57:57Z)
- From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos [88.08209394979178]
Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations.
We introduce a novel Static-to-Dynamic model (S2D) that leverages existing SFER knowledge and dynamic information implicitly encoded in extracted facial landmark-aware features.
arXiv Detail & Related papers (2023-12-09T03:16:09Z)
- CLIPER: A Unified Vision-Language Framework for In-the-Wild Facial Expression Recognition [1.8604727699812171]
We propose a unified framework for both static and dynamic facial expression recognition based on CLIP.
We introduce multiple expression text descriptors (METD) to learn fine-grained expression representations that make CLIPER more interpretable.
arXiv Detail & Related papers (2023-03-01T02:59:55Z)
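For orientation, here is a minimal sketch of classification with multiple text descriptors per class: several phrases per expression are encoded with CLIP and their image-text similarities are averaged. The phrases, the open_clip usage, and the averaging are assumptions; CLIPER learns its descriptors rather than hard-coding them as done here.

```python
import torch
import open_clip

# Hypothetical descriptor sets: several fine-grained phrases per class.
DESCRIPTORS = {
    "happiness": ["a smiling face", "a face with raised cheeks"],
    "sadness": ["a frowning face", "a face with drooping eyelids"],
    # ... remaining expression classes omitted
}

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

@torch.no_grad()
def classify(image) -> str:
    """Score an image against every class's descriptor set."""
    img_feat = model.encode_image(preprocess(image).unsqueeze(0))
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    scores = {}
    for cls, phrases in DESCRIPTORS.items():
        txt_feat = model.encode_text(tokenizer(phrases))
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        # average the similarity over the class's descriptors
        scores[cls] = (img_feat @ txt_feat.T).mean().item()
    return max(scores, key=scores.get)
```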
- NR-DFERNet: Noise-Robust Network for Dynamic Facial Expression Recognition [1.8604727699812171]
We propose a noise-robust dynamic facial expression recognition network (NR-DFERNet) to reduce the interference of noisy frames on the DFER task.
Specifically, at the spatial stage, we devise a dynamic-static fusion module (DSF) that introduces dynamic features to static features for learning more discriminative spatial features.
To suppress the impact of target-irrelevant frames, we introduce a novel dynamic class token (DCT) for the transformer at the temporal stage.
arXiv Detail & Related papers (2022-06-10T10:17:30Z)
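The dynamic class token is described only at a high level; one way to read "dynamic" is a class token computed from the frame tokens themselves rather than a fixed learned embedding. The sketch below takes the mean of the frame tokens as that token; whether NR-DFERNet builds its DCT this way is an assumption.

```python
import torch
import torch.nn as nn

class TemporalTransformer(nn.Module):
    """Sketch: prepend an input-dependent class token (here, the mean of
    the frame tokens) and classify from it after the encoder."""
    def __init__(self, dim: int = 512, heads: int = 8, layers: int = 2,
                 num_classes: int = 7):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, frame_tokens: torch.Tensor) -> torch.Tensor:
        # frame_tokens: (B, T, dim), one token per frame
        dct = frame_tokens.mean(dim=1, keepdim=True)  # dynamic class token
        x = torch.cat([dct, frame_tokens], dim=1)     # (B, T+1, dim)
        return self.head(self.encoder(x)[:, 0])      # read out the token
```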
- Spatio-Temporal Transformer for Dynamic Facial Expression Recognition in the Wild [19.5702895176141]
We propose a method for capturing discriminative features within each frame and modeling the relationships among frames.
We utilize a CNN to translate each frame into a visual feature sequence.
Experiments indicate that our method provides an effective way to make use of spatial and temporal dependencies.
arXiv Detail & Related papers (2022-05-10T08:47:15Z)
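The CNN-to-sequence step is a standard pattern; the sketch below folds time into the batch dimension so a 2D backbone yields one feature vector per frame. The ResNet-18 backbone is an assumption, not the paper's stated choice.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class FrameEncoder(nn.Module):
    """Sketch: run a 2D CNN over every frame to produce the visual
    feature sequence that a temporal transformer consumes."""
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()  # keep the 512-d pooled features
        self.backbone = backbone

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, T, 3, H, W) -> fold time into the batch dimension
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1))  # (B*T, 512)
        return feats.view(b, t, -1)                # (B, T, 512) sequence
```

The resulting sequence is exactly the kind of per-frame token input assumed by the temporal transformer sketch above.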
- Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in the wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory units, as well as inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
arXiv Detail & Related papers (2020-11-18T13:42:05Z)
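Weight inflation itself is a well-documented technique (popularized by I3D): a pre-trained 2D kernel is copied along a new temporal axis and rescaled so the 3D filter initially reproduces the 2D activations on a static clip. The helper below sketches that recipe; treating it as this paper's exact procedure is an assumption.

```python
import torch
import torch.nn as nn

def inflate_conv2d(conv2d: nn.Conv2d, time_dim: int = 3) -> nn.Conv3d:
    """Inflate a 2D convolution into a 3D one by repeating its kernel
    along time and dividing by the temporal length."""
    conv3d = nn.Conv3d(
        conv2d.in_channels, conv2d.out_channels,
        kernel_size=(time_dim, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(time_dim // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        w = conv2d.weight.unsqueeze(2)  # (out, in, 1, kH, kW)
        conv3d.weight.copy_(w.repeat(1, 1, time_dim, 1, 1) / time_dim)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d
```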
- Video-based Facial Expression Recognition using Graph Convolutional Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely used datasets, CK+, Oulu-CASIA and MMI, and on one challenging in-the-wild dataset, AFEW 8.0.
arXiv Detail & Related papers (2020-10-26T07:31:51Z)
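How this paper wires frames into a graph is not stated in the summary; the sketch below uses a learnable adjacency over T frame nodes as a stand-in, which is an assumption.

```python
import torch
import torch.nn as nn

class FrameGCNLayer(nn.Module):
    """Sketch of a graph convolution over per-frame CNN features: frames
    are nodes, and each node is updated from its neighbors through a
    row-normalized adjacency matrix."""
    def __init__(self, dim: int, num_frames: int):
        super().__init__()
        self.adj = nn.Parameter(torch.eye(num_frames))  # learned frame graph
        self.linear = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, dim); softmax keeps each node's neighbor weights summing to 1
        a = torch.softmax(self.adj, dim=-1)
        return torch.relu(self.linear(a @ x))
```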
- The FaceChannel: A Fast & Furious Deep Neural Network for Facial Expression Recognition [71.24825724518847]
Current state-of-the-art models for automatic Facial Expression Recognition (FER) are based on very deep neural networks that are effective but rather expensive to train.
We formalize the FaceChannel, a lightweight neural network that has far fewer parameters than common deep neural networks.
We demonstrate how our model achieves comparable, if not better, performance to the current state-of-the-art in FER.
arXiv Detail & Related papers (2020-09-15T09:25:37Z)
- Joint Deep Learning of Facial Expression Synthesis and Recognition [97.19528464266824]
We propose a novel method that jointly learns facial expression synthesis and recognition for effective FER.
The proposed method involves a two-stage learning procedure. Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions.
In order to alleviate the problem of data bias between the real images and the synthetic images, we propose an intra-class loss with a novel real data-guided back-propagation (RDBP) algorithm.
arXiv Detail & Related papers (2020-02-06T10:56:00Z)
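The RDBP algorithm is specific to that last paper and is not reproduced here; the sketch below shows only the plain intra-class idea, with all details assumed: synthetic-image features are pulled toward the centroid of real features of the same class, and the real centroid is detached so gradients update only the synthetic branch.

```python
import torch
import torch.nn.functional as F

def intra_class_loss(real_feats: torch.Tensor, fake_feats: torch.Tensor,
                     real_labels: torch.Tensor, fake_labels: torch.Tensor):
    """Pull each synthetic feature toward the mean real feature of its
    class, shrinking the real/synthetic domain gap."""
    loss = real_feats.new_zeros(())
    for c in real_labels.unique():
        real_c = real_feats[real_labels == c]
        fake_c = fake_feats[fake_labels == c]
        if len(real_c) and len(fake_c):
            # detach the real centroid: only synthetic features get gradients
            center = real_c.mean(dim=0).detach()
            loss = loss + F.mse_loss(fake_c, center.expand_as(fake_c))
    return loss
```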