Video-based Facial Expression Recognition using Graph Convolutional
Networks
- URL: http://arxiv.org/abs/2010.13386v1
- Date: Mon, 26 Oct 2020 07:31:51 GMT
- Title: Video-based Facial Expression Recognition using Graph Convolutional
Networks
- Authors: Daizong Liu, Hongting Zhang, Pan Zhou
- Abstract summary: We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely-used datasets, CK+, Oulu-CASIA and MMI, and also one challenging wild dataset AFEW8.0.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial expression recognition (FER), which aims to classify the
expression present in a facial image or video, has attracted considerable
research interest in the fields of artificial intelligence and multimedia. For
the video-based FER task, it is sensible to capture the dynamic expression
variation across frames to recognize facial expressions. However, existing
methods directly apply CNN-RNN or 3D CNN models to extract spatio-temporal
features from different facial units, rather than concentrating on specific
regions while capturing expression variation, which limits FER performance. In
this paper, we introduce a Graph Convolutional Network (GCN) layer into a common
CNN-RNN based model for video-based FER. First, the GCN layer learns more
discriminative facial expression features that concentrate on certain regions
after sharing information among the extracted CNN features of the nodes. Then,
an LSTM layer learns long-term dependencies among the GCN-learned features to
model the variation. In addition, a weight assignment mechanism weights the
outputs of different nodes for final classification by characterizing the
expression intensity in each frame. To the best of our knowledge, this is the
first application of GCNs to the FER task. We evaluate our method on three
widely used datasets, CK+, Oulu-CASIA and MMI, as well as the challenging
in-the-wild dataset AFEW8.0, and the experimental results demonstrate that our
method outperforms existing methods.
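The pipeline the abstract describes (per-frame CNN features over facial-region nodes, a GCN layer to share information across nodes, an LSTM over time, and a node-weighting mechanism before classification) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the module name `GCNLSTMSketch`, the learnable adjacency, and all dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GCNLSTMSketch(nn.Module):
    """Hypothetical sketch of the described pipeline: per-frame CNN node
    features -> GCN layer (information sharing across facial regions)
    -> LSTM over time -> node-weighted classification."""

    def __init__(self, num_nodes=4, feat_dim=128, hidden_dim=64, num_classes=7):
        super().__init__()
        self.num_nodes = num_nodes
        # Assumed learnable adjacency over facial-region nodes (the paper's
        # actual graph construction may differ).
        self.adj = nn.Parameter(torch.ones(num_nodes, num_nodes) / num_nodes)
        self.gcn_proj = nn.Linear(feat_dim, hidden_dim)   # one GCN layer: A X W
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.node_scorer = nn.Linear(hidden_dim, 1)       # per-node weights
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, time, nodes, feat_dim) -- CNN features per frame and node
        b, t, n, d = x.shape
        # GCN step: propagate features across nodes via adjacency, then project.
        x = torch.relu(self.gcn_proj(torch.einsum('ij,btjd->btid', self.adj, x)))
        # Run the LSTM over time for each node independently.
        x = x.permute(0, 2, 1, 3).reshape(b * n, t, -1)
        x, _ = self.lstm(x)
        x = x.reshape(b, n, t, -1)[:, :, -1]              # last step: (b, n, h)
        # Weight node outputs (softmax over nodes) before classification,
        # loosely mirroring the intensity-based weight assignment mechanism.
        w = torch.softmax(self.node_scorer(x), dim=1)     # (b, n, 1)
        return self.classifier((w * x).sum(dim=1))        # (b, num_classes)
```

For example, `GCNLSTMSketch()(torch.randn(2, 16, 4, 128))` returns logits of shape `(2, 7)` for a batch of two 16-frame clips with four facial-region nodes.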
Related papers
- MSSTNet: A Multi-Scale Spatio-Temporal CNN-Transformer Network for Dynamic Facial Expression Recognition [4.512502015606517]
We propose a Multi-Scale Spatio-Temporal CNN-Transformer network (MSSTNet).
Our approach takes spatial features at different scales extracted by a CNN and feeds them into a Multi-scale Embedding Layer (MELayer).
The MELayer extracts multi-scale spatial information and encodes these features before sending them into a Transformer (T-Former).
arXiv Detail & Related papers (2024-04-12T12:30:48Z) - Spatio-Temporal Transformer for Dynamic Facial Expression Recognition in
the Wild [19.5702895176141]
We propose a method for capturing discriminative features within each frame.
We utilize a CNN to translate each frame into a visual feature sequence.
Experiments indicate that our method provides an effective way to make use of the spatial and temporal dependencies.
arXiv Detail & Related papers (2022-05-10T08:47:15Z) - Multi-Branch Deep Radial Basis Function Networks for Facial Emotion
Recognition [80.35852245488043]
We propose a CNN based architecture enhanced with multiple branches formed by radial basis function (RBF) units.
RBF units capture local patterns shared by similar instances using an intermediate representation.
We show that it is the incorporation of local information that makes the proposed model competitive.
arXiv Detail & Related papers (2021-09-07T21:05:56Z) - Leveraging Semantic Scene Characteristics and Multi-Stream Convolutional
Architectures in a Contextual Approach for Video-Based Visual Emotion
Recognition in the Wild [31.40575057347465]
We tackle the task of video-based visual emotion recognition in the wild.
Standard methodologies that rely solely on the extraction of bodily and facial features often fall short of accurate emotion prediction.
We aspire to alleviate this problem by leveraging visual context in the form of scene characteristics and attributes.
arXiv Detail & Related papers (2021-05-16T17:31:59Z) - Facial expression and attributes recognition based on multi-task
learning of lightweight neural networks [9.162936410696409]
We examine the multi-task training of lightweight convolutional neural networks for face identification and classification of facial attributes.
It is shown that it is still necessary to fine-tune these networks in order to predict facial expressions.
Several models are presented based on MobileNet, EfficientNet and RexNet architectures.
arXiv Detail & Related papers (2021-03-31T14:21:04Z) - Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z) - The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network nor modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z) - Continuous Emotion Recognition with Spatiotemporal Convolutional Neural
Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory units, as well as inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
arXiv Detail & Related papers (2020-11-18T13:42:05Z) - The FaceChannel: A Fast & Furious Deep Neural Network for Facial
Expression Recognition [71.24825724518847]
Current state-of-the-art models for automatic Facial Expression Recognition (FER) are based on very deep neural networks that are effective but rather expensive to train.
We formalize the FaceChannel, a lightweight neural network that has far fewer parameters than common deep neural networks.
We demonstrate how our model achieves a comparable, if not better, performance to the current state-of-the-art in FER.
arXiv Detail & Related papers (2020-09-15T09:25:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.