Spatial and Temporal Networks for Facial Expression Recognition in the
Wild Videos
- URL: http://arxiv.org/abs/2107.05160v1
- Date: Mon, 12 Jul 2021 01:41:23 GMT
- Title: Spatial and Temporal Networks for Facial Expression Recognition in the
Wild Videos
- Authors: Shuyi Mao, Xinqi Fan, Xiaojiang Peng
- Abstract summary: The paper describes our proposed methodology for the seven basic expression classification track of Affective Behavior Analysis in-the-wild (ABAW) Competition 2021.
Our ensemble model achieved an F1 score of 0.4133, an accuracy of 0.6216, and a final metric of 0.4821 on the validation set.
- Score: 14.760435737320744
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The paper describes our proposed methodology for the seven basic
expression classification track of the Affective Behavior Analysis in-the-wild
(ABAW) Competition 2021. In this task, facial expression recognition (FER)
methods aim to classify the correct expression category against diverse
backgrounds, which poses several challenges. First, to adapt the model to
in-the-wild scenarios, we transfer knowledge from models pre-trained on
large-scale face recognition data. Second, we propose an ensemble of a
convolutional neural network (CNN), a CNN-recurrent neural network (CNN-RNN),
and a CNN-Transformer, to incorporate both spatial and temporal information.
Our ensemble model achieved an F1 score of 0.4133, an accuracy of 0.6216, and
a final metric of 0.4821 on the validation set.
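The abstract does not state how the three streams are fused or how the final metric is derived; below is a minimal sketch, assuming late fusion by averaging softmax scores and the ABAW 2021 expression-track metric 0.67 * F1 + 0.33 * accuracy, which reproduces the reported 0.4821 from the stated F1 and accuracy up to rounding.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(logits_cnn, logits_cnn_rnn, logits_cnn_trf):
    """Late fusion by averaging softmax scores -- an assumption; the paper
    does not state how the three streams are combined."""
    probs = (softmax(logits_cnn) + softmax(logits_cnn_rnn)
             + softmax(logits_cnn_trf)) / 3.0
    return probs.argmax(axis=-1)

# ABAW 2021 expression-track metric: final = 0.67 * F1 + 0.33 * accuracy.
f1, acc = 0.4133, 0.6216
print(round(0.67 * f1 + 0.33 * acc, 4))  # 0.482, matching the reported 0.4821
```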
Related papers
- Alleviating Catastrophic Forgetting in Facial Expression Recognition with Emotion-Centered Models [49.3179290313959]
The proposed method, emotion-centered generative replay (ECgr), tackles this challenge by integrating synthetic images from generative adversarial networks.
ECgr incorporates a quality assurance algorithm to ensure the fidelity of generated images.
The experimental results on four diverse facial expression datasets demonstrate that incorporating images generated by our pseudo-rehearsal method improves training on both the target dataset and the source dataset.
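As a hedged illustration of the pseudo-rehearsal idea above (not the paper's actual algorithm), the sketch below mixes GAN-generated images of previously learned expressions into each training batch, with a confidence-based filter standing in for the paper's quality-assurance algorithm; `gan.sample` and `classifier.predict_proba` are hypothetical interfaces.

```python
import random

def replay_batch(new_batch, gan, classifier, old_classes, k=16, conf_thresh=0.9):
    """Hypothetical generative-replay sketch: augment a batch from the new
    dataset with synthetic images of previously learned expression classes.
    The confidence filter stands in for ECgr's quality-assurance algorithm."""
    replayed = []
    while len(replayed) < k:
        label = random.choice(old_classes)
        image = gan.sample(label)                # hypothetical GAN interface
        probs = classifier.predict_proba(image)  # hypothetical classifier interface
        if probs[label] >= conf_thresh:          # keep only convincing samples
            replayed.append((image, label))
    return list(new_batch) + replayed
```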
arXiv Detail & Related papers (2024-04-18T15:28:34Z)
- Decoupled Mixup for Generalized Visual Recognition [71.13734761715472]
We propose a novel "Decoupled-Mixup" method to train CNN models for visual recognition.
Our method decouples each image into discriminative and noise-prone regions, and then heterogeneously combines these regions to train CNN models.
Experimental results show that our method generalizes well to test data composed of unseen contexts.
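A minimal sketch of the decoupling idea, assuming the discriminative region is given by a binary saliency mask (the paper's actual region decomposition and mixing rule may differ):

```python
import numpy as np

def decoupled_mixup(x1, y1, x2, y2, mask1, lam=0.7):
    """Keep the discriminative region of x1 (binary mask, an assumption) and
    mix only the noise-prone remainder with a second image x2."""
    mixed = mask1 * x1 + (1 - mask1) * (lam * x1 + (1 - lam) * x2)
    # Mix labels in proportion to how much of the image actually came from x2.
    w2 = (1 - mask1).mean() * (1 - lam)
    y = (1 - w2) * np.asarray(y1) + w2 * np.asarray(y2)
    return mixed, y
```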
arXiv Detail & Related papers (2022-10-26T15:21:39Z)
- Text Classification in Memristor-based Spiking Neural Networks [0.0]
We develop a simulation framework with a virtual memristor array to demonstrate a sentiment analysis task in the IMDB movie reviews dataset.
We achieve a classification accuracy of 85.88% by converting a pre-trained ANN to a memristor-based SNN, and 84.86% by training the memristor-based SNN directly.
We also investigate how global parameters such as spike-train length, read noise, and the weight-update stop conditions affect the neural networks in both approaches.
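The summary suggests two ingredients that are easy to sketch: rate-coded spike trains and noisy weight reads. Below is a simplified illustration, assuming Gaussian read noise and a fire-per-step layer without membrane persistence; real memristor device models and the paper's simulation framework are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_read(weights, noise_std=0.05):
    """Each read of the memristor conductances is perturbed; Gaussian read
    noise is an assumption, real device models vary."""
    return weights + rng.normal(0.0, noise_std, size=weights.shape)

def rate_encode(x, n_steps=100):
    """Rate coding: features in [0, 1] become Bernoulli spike trains."""
    return rng.random((n_steps,) + x.shape) < x

def snn_layer_rates(x, weights, n_steps=100):
    """One converted layer, simplified: firing rates over the time window
    approximate the ANN activations (no membrane persistence modeled)."""
    spikes = rate_encode(x, n_steps)                  # (n_steps, d_in)
    rates = np.zeros(weights.shape[0])
    for t in range(n_steps):
        potential = noisy_read(weights) @ spikes[t]   # fresh noisy read per step
        rates += potential > 0                        # does each unit fire?
    return rates / n_steps

print(snn_layer_rates(np.array([0.9, 0.1]), np.array([[1.0, -1.0]])))
```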
arXiv Detail & Related papers (2022-07-27T18:08:31Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
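A hedged sketch of the idea of using language-model embeddings of class names as classification targets (CLIP-style cosine matching is an assumption; `text_encoder` is a placeholder for whatever pre-trained model the paper uses):

```python
import torch.nn.functional as F

def classify_with_text_targets(video_feats, class_names, text_encoder):
    """Score video features against language-model embeddings of the class
    names; `text_encoder` is a placeholder, and cosine-similarity matching
    is an assumption about the paper's transfer scheme."""
    text_feats = F.normalize(text_encoder(class_names), dim=-1)  # (n_classes, d)
    video_feats = F.normalize(video_feats, dim=-1)               # (n_videos, d)
    logits = video_feats @ text_feats.T                          # cosine scores
    return logits.argmax(dim=-1)
```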
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
- Facial expression and attributes recognition based on multi-task learning of lightweight neural networks [9.162936410696409]
We examine the multi-task training of lightweight convolutional neural networks for face identification and classification of facial attributes.
It is shown that these networks still need to be fine-tuned in order to predict facial expressions.
Several models are presented based on MobileNet, EfficientNet and RexNet architectures.
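A minimal sketch of the multi-task setup: one lightweight shared backbone with separate heads per task, plus an expression head that is fine-tuned later. The torchvision MobileNetV2 backbone and the head sizes are illustrative assumptions.

```python
import torch.nn as nn
import torchvision.models as models

class MultiTaskFaceNet(nn.Module):
    """Shared lightweight backbone, one linear head per task. Head sizes and
    the MobileNetV2 choice are assumptions for illustration."""
    def __init__(self, n_identities=10000, n_attributes=40, n_expressions=7):
        super().__init__()
        backbone = models.mobilenet_v2(weights=None)
        self.features = backbone.features
        self.pool = nn.AdaptiveAvgPool2d(1)
        d = 1280  # MobileNetV2 final feature width
        self.identity_head = nn.Linear(d, n_identities)
        self.attribute_head = nn.Linear(d, n_attributes)
        # Added and fine-tuned later for expression recognition.
        self.expression_head = nn.Linear(d, n_expressions)

    def forward(self, x):
        z = self.pool(self.features(x)).flatten(1)
        return (self.identity_head(z), self.attribute_head(z),
                self.expression_head(z))
```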
arXiv Detail & Related papers (2021-03-31T14:21:04Z)
- Video-based Facial Expression Recognition using Graph Convolutional Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely used datasets, CK+, Oulu-CASIA and MMI, as well as the challenging in-the-wild dataset AFEW 8.0.
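A hedged sketch of inserting a GCN layer between the frame-level CNN and the RNN, treating frames as graph nodes; the similarity-based adjacency below is an assumption, as the paper may construct or learn the graph differently.

```python
import torch
import torch.nn as nn

class FrameGCN(nn.Module):
    """One graph-convolution step over per-frame CNN features: frames are
    nodes, and a softmax-normalized similarity matrix serves as the
    adjacency (an assumption)."""
    def __init__(self, d):
        super().__init__()
        self.w = nn.Linear(d, d)

    def forward(self, frame_feats):                   # (batch, T, d)
        sim = torch.softmax(frame_feats @ frame_feats.transpose(1, 2), dim=-1)
        return torch.relu(sim @ self.w(frame_feats))  # propagate across frames

# Typical placement in the pipeline: CNN -> FrameGCN -> GRU/LSTM -> classifier.
```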
arXiv Detail & Related papers (2020-10-26T07:31:51Z)
- Deep Convolutional Neural Network Based Facial Expression Recognition in the Wild [0.0]
We use our proposed deep convolutional neural network (CNN) model to perform automatic facial expression recognition (AFER) on the given dataset.
Our proposed model has achieved an accuracy of 50.77% and an F1 score of 29.16% on the validation set.
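For reference, metrics like these are typically computed as follows (macro averaging of F1 over the seven classes is an assumption; the summary does not say how F1 is averaged):

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy labels over the seven expression classes, for illustration only.
y_true = [0, 1, 2, 3, 4, 5, 6, 0, 1, 2]
y_pred = [0, 1, 2, 3, 3, 5, 6, 0, 2, 2]
print(accuracy_score(y_true, y_pred))            # fraction of correct frames
print(f1_score(y_true, y_pred, average="macro")) # unweighted mean of per-class F1
```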
arXiv Detail & Related papers (2020-10-03T08:17:00Z)
- The FaceChannel: A Fast & Furious Deep Neural Network for Facial Expression Recognition [71.24825724518847]
Current state-of-the-art models for automatic Facial Expression Recognition (FER) are based on very deep neural networks that are effective but rather expensive to train.
We formalize the FaceChannel, a lightweight neural network with far fewer parameters than common deep neural networks.
We demonstrate that our model achieves performance comparable to, if not better than, the current state of the art in FER.
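The parameter count behind "far fewer parameters" claims can be checked directly; a small helper (the model below is illustrative, not FaceChannel itself):

```python
import torch.nn as nn

def n_params(model: nn.Module) -> int:
    """Count trainable parameters -- the quantity behind lightweight-network
    claims like FaceChannel's."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Illustrative only -- not the FaceChannel architecture.
toy = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
print(n_params(toy))  # compare against n_params(deep_baseline)
```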
arXiv Detail & Related papers (2020-09-15T09:25:37Z)
- Deep Learning based, end-to-end metaphor detection in Greek language with Recurrent and Convolutional Neural Networks [0.0]
This paper presents and benchmarks a number of end-to-end Deep Learning based models for metaphor detection in Greek.
We bring Convolutional Neural Networks and Recurrent Neural Networks, combined with representation learning, to bear on the metaphor detection problem for the Greek language.
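A minimal sketch of one way to combine a CNN and an RNN for per-token metaphor detection (the embedding sizes, the conv-then-BiLSTM ordering, and the per-token output are illustrative assumptions):

```python
import torch
import torch.nn as nn

class MetaphorDetector(nn.Module):
    """A 1-D convolution over token embeddings feeds a bidirectional LSTM,
    producing per-token metaphor/literal logits. Dimensions are assumptions."""
    def __init__(self, vocab_size=30000, d_emb=128, d_hid=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_emb)
        self.conv = nn.Conv1d(d_emb, d_hid, kernel_size=3, padding=1)
        self.rnn = nn.LSTM(d_hid, d_hid, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * d_hid, 2)            # metaphor vs. literal

    def forward(self, token_ids):                     # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)       # (batch, d_emb, seq_len)
        x = torch.relu(self.conv(x)).transpose(1, 2)  # (batch, seq_len, d_hid)
        h, _ = self.rnn(x)
        return self.out(h)                            # per-token logits
```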
arXiv Detail & Related papers (2020-07-23T12:02:40Z)
- Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these downstream tasks, across four datasets.
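The summary does not specify PF-HIN's pre-training objective; as a loose illustration of the self-supervised pre-train/fine-tune pattern only, here is a generic masked-feature reconstruction step on node features:

```python
import torch
import torch.nn.functional as F

def masked_feature_pretrain_step(encoder, decoder, x, mask_rate=0.15):
    """Generic self-supervised step: hide some nodes' features and train the
    encoder/decoder to reconstruct them. This is NOT PF-HIN's stated
    objective -- the summary does not specify one. `x` is (n_nodes, d)."""
    mask = torch.rand(x.shape[0]) < mask_rate
    x_in = x.clone()
    x_in[mask] = 0.0                # zero out the masked nodes' features
    loss = F.mse_loss(decoder(encoder(x_in))[mask], x[mask])
    return loss
```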
arXiv Detail & Related papers (2020-07-07T03:36:28Z)