Learning Vision Transformer with Squeeze and Excitation for Facial
Expression Recognition
- URL: http://arxiv.org/abs/2107.03107v2
- Date: Thu, 8 Jul 2021 10:37:00 GMT
- Title: Learning Vision Transformer with Squeeze and Excitation for Facial
Expression Recognition
- Authors: Mouath Aouayeb, Wassim Hamidouche, Catherine Soladie, Kidiyo Kpalma,
Renaud Seguier
- Abstract summary: We propose to learn a vision Transformer jointly with a Squeeze and Excitation (SE) block for the FER task.
The proposed method is evaluated on several publicly available FER databases, including CK+, JAFFE, RAF-DB and SFEW.
Experiments demonstrate that our model outperforms state-of-the-art methods on CK+ and SFEW.
- Score: 10.256620178727884
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: As various databases of facial expressions have been made accessible over the
last few decades, the Facial Expression Recognition (FER) task has attracted
considerable interest. The multiple sources of the available databases raise several
challenges for the facial expression recognition task. These challenges are usually addressed
by Convolutional Neural Network (CNN) architectures. Unlike CNN models, a
Transformer model based on the attention mechanism has recently been proposed to
address vision tasks. One of the major issues with Transformers is the need for
large amounts of training data, while most FER databases are limited compared to other
vision applications. Therefore, we propose in this paper to learn a vision
Transformer jointly with a Squeeze and Excitation (SE) block for the FER task. The
proposed method is evaluated on several publicly available FER databases,
including CK+, JAFFE, RAF-DB and SFEW. Experiments demonstrate that our model
outperforms state-of-the-art methods on CK+ and SFEW and achieves competitive
results on JAFFE and RAF-DB.
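The Squeeze-and-Excitation mechanism named in the abstract reweights feature channels in three steps: squeeze (global average pooling to one descriptor per channel), excitation (a bottleneck MLP with a sigmoid gate), and scaling (multiplying each channel by its gate). As a rough illustration only — not the authors' implementation, and with random untrained weights in place of learned ones — a minimal NumPy sketch:

```python
import numpy as np

def se_block(features, reduction=4, rng=None):
    """Squeeze-and-Excitation gating over the channel dimension.

    features: array of shape (N, C) — N tokens/positions, C channels.
    reduction: bottleneck ratio of the excitation MLP.
    Weights are randomly initialized here purely for illustration.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, c = features.shape
    # Squeeze: global average over positions -> one descriptor per channel
    z = features.mean(axis=0)                        # shape (C,)
    # Excitation: bottleneck MLP (ReLU then sigmoid) produces per-channel gates
    w1 = rng.standard_normal((c, c // reduction)) * 0.1
    w2 = rng.standard_normal((c // reduction, c)) * 0.1
    s = np.maximum(z @ w1, 0.0)                      # ReLU
    s = 1.0 / (1.0 + np.exp(-(s @ w2)))              # sigmoid gate in (0, 1)
    # Scale: reweight every channel by its gate (broadcast over positions)
    return features * s
```

In the paper the SE weights are trained jointly with the vision Transformer; this sketch only shows the data flow of squeeze, excitation, and channel-wise scaling.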
Related papers
- Data Augmentation and Transfer Learning Approaches Applied to Facial
Expressions Recognition [0.3481985817302898]
We propose a novel data augmentation technique that improves performance on the recognition task.
We build GAN models from scratch that generate new synthetic images for each emotion type.
On the augmented datasets, we fine-tune pretrained convolutional neural networks with different architectures.
arXiv Detail & Related papers (2024-02-15T14:46:03Z) - POSTER: A Pyramid Cross-Fusion Transformer Network for Facial Expression
Recognition [11.525573321175925]
Facial expression recognition (FER) is an important task in computer vision, having practical applications in areas such as human-computer interaction, education, healthcare, and online monitoring.
Three key issues are especially prevalent: inter-class similarity, intra-class discrepancy, and scale sensitivity.
We propose a two-stream Pyramid crOss-fuSion TransformER network (POSTER) that aims to holistically solve all three issues.
arXiv Detail & Related papers (2022-04-08T14:01:41Z) - Self-supervised Contrastive Learning of Multi-view Facial Expressions [9.949781365631557]
Facial expression recognition (FER) has emerged as an important component of human-computer interaction systems.
We propose Contrastive Learning of Multi-view facial Expressions (CL-MEx), which exploits facial images captured simultaneously from different angles for FER.
arXiv Detail & Related papers (2021-08-15T11:23:34Z) - Robust Facial Expression Recognition with Convolutional Visual
Transformers [23.05378099875569]
We propose Convolutional Visual Transformers to tackle Facial Expression Recognition in the wild in two main steps.
First, we propose an attentional selective fusion (ASF) for leveraging the feature maps generated by two-branch CNNs.
Second, inspired by the success of Transformers in natural language processing, we propose to model relationships between these visual words with global self-attention.
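The global self-attention described in this entry lets every visual word attend to every other, weighting them by pairwise relevance. As an illustrative sketch only (single head, random untrained projections — not the paper's implementation), the mechanism in NumPy:

```python
import numpy as np

def self_attention(tokens, rng=None):
    """Single-head global self-attention over a set of visual-word tokens.

    tokens: array of shape (N, D) — N tokens of dimension D.
    Projection weights are random here purely for illustration.
    Returns (output, attention) where attention rows sum to 1.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = tokens.shape
    wq, wk, wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    # Pairwise relevance between all tokens, scaled by sqrt(D)
    scores = q @ k.T / np.sqrt(d)                     # shape (N, N)
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)           # softmax: rows sum to 1
    # Each output token is a relevance-weighted mixture of all value vectors
    return attn @ v, attn
```

The key property is that the attention matrix is global: unlike a convolution's fixed local window, every token can draw on every other token in one step.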
arXiv Detail & Related papers (2021-03-31T07:07:56Z) - Face Transformer for Recognition [67.02323570055894]
We investigate the performance of Transformer models in face recognition.
The models are trained on a large scale face recognition database MS-Celeb-1M.
We demonstrate that Transformer models achieve performance comparable to CNNs with a similar number of parameters and MACs.
arXiv Detail & Related papers (2021-03-27T03:53:29Z) - Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling long-range dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases for their design and are naturally suited as set-functions.
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z) - VisBERT: Hidden-State Visualizations for Transformers [66.86452388524886]
We present VisBERT, a tool for visualizing the contextual token representations within BERT for the task of (multi-hop) Question Answering.
VisBERT enables users to get insights about the model's internal state and to explore its inference steps or potential shortcomings.
arXiv Detail & Related papers (2020-11-09T15:37:43Z) - Video-based Facial Expression Recognition using Graph Convolutional
Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely used datasets, CK+, Oulu-CASIA and MMI, as well as one challenging in-the-wild dataset, AFEW8.0.
arXiv Detail & Related papers (2020-10-26T07:31:51Z) - The FaceChannel: A Fast & Furious Deep Neural Network for Facial
Expression Recognition [71.24825724518847]
Current state-of-the-art models for automatic Facial Expression Recognition (FER) are based on very deep neural networks that are effective but rather expensive to train.
We formalize the FaceChannel, a lightweight neural network with far fewer parameters than common deep neural networks.
We demonstrate that our model achieves comparable, if not better, performance than the current state-of-the-art in FER.
arXiv Detail & Related papers (2020-09-15T09:25:37Z) - DFEW: A Large-Scale Database for Recognizing Dynamic Facial Expressions
in the Wild [22.305429904593126]
First, we present a new large-scale 'in-the-wild' dynamic facial expression database, DFEW, consisting of over 16,000 video clips from thousands of movies.
Second, we propose a novel method called Expression-Clustered Spatiotemporal Feature Learning framework to deal with dynamic FER in the wild.
Third, we conduct extensive benchmark experiments on DFEW using many deep feature-learning methods as well as our proposed EC-STFL.
arXiv Detail & Related papers (2020-08-13T14:10:05Z) - The FaceChannel: A Light-weight Deep Neural Network for Facial
Expression Recognition [71.24825724518847]
Current state-of-the-art models for automatic FER are based on very deep neural networks that are difficult to train.
We formalize the FaceChannel, a lightweight neural network with far fewer parameters than common deep neural networks.
We demonstrate that the FaceChannel achieves comparable, if not better, performance than the current state-of-the-art in FER.
arXiv Detail & Related papers (2020-04-17T12:03:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.