Robust Facial Expression Recognition with Convolutional Visual
Transformers
- URL: http://arxiv.org/abs/2103.16854v1
- Date: Wed, 31 Mar 2021 07:07:56 GMT
- Title: Robust Facial Expression Recognition with Convolutional Visual
Transformers
- Authors: Fuyan Ma, Bin Sun and Shutao Li
- Abstract summary: We propose Convolutional Visual Transformers to tackle Facial Expression Recognition in the wild in two main steps.
First, we propose an attentional selective fusion (ASF) for leveraging the feature maps generated by two-branch CNNs.
Second, inspired by the success of Transformers in natural language processing, we propose to model relationships between these visual words with global self-attention.
- Score: 23.05378099875569
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial Expression Recognition (FER) in the wild is extremely challenging due
to occlusions, varying head poses, face deformation and motion blur under
unconstrained conditions. Although substantial progress has been made in
automatic FER over the past few decades, previous studies were mainly designed
for lab-controlled settings. Real-world occlusions, varying head poses and
other issues substantially increase the difficulty of FER because of the
resulting information-deficient regions and complex backgrounds. In contrast
to previous purely CNN-based methods, we argue that it is feasible and
practical to translate facial images into sequences of visual words and
perform expression recognition from a global perspective. We therefore propose
Convolutional Visual Transformers to tackle FER in the wild in two main steps.
First, we
propose an attentional selective fusion (ASF) for leveraging the feature maps
generated by two-branch CNNs. The ASF captures discriminative information by
fusing multiple features with global-local attention. The fused feature maps
are then flattened and projected into sequences of visual words. Second,
inspired by the success of Transformers in natural language processing, we
propose to model relationships between these visual words with global
self-attention. The proposed method is evaluated on three public in-the-wild
facial expression datasets (RAF-DB, FERPlus and AffectNet). Under the same
settings, extensive experiments demonstrate that our method outperforms
prior approaches, setting a new state of the art on RAF-DB with 88.14%,
FERPlus with 88.81% and AffectNet with 61.85%. We also conduct cross-dataset
evaluation on CK+ to show the generalization capability of the proposed
method.
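
The first step lends itself to a short illustration. The PyTorch sketch below is a hypothetical rendering of an attentional-selective-fusion module as the abstract describes it: the module name, channel sizes, reduction ratio, and the exact form of the global-local attention are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class AttentionalSelectiveFusion(nn.Module):
    """Minimal sketch of an ASF-style module: fuses two CNN feature maps
    with a mix of global (pooled, per-channel) and local (pointwise,
    per-position) attention. All sizes here are illustrative assumptions."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = channels // reduction
        # Local attention: per-position weights from 1x1 convolutions.
        self.local_att = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
        )
        # Global attention: per-channel weights from pooled global context.
        self.global_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x, y: feature maps from the two CNN branches, shape (B, C, H, W).
        u = x + y                                    # initial integration
        w = self.sigmoid(self.local_att(u) + self.global_att(u))
        return w * x + (1.0 - w) * y                 # selective weighting
```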
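For the second step, the sketch below flattens a fused feature map into a sequence of visual words, prepends a class token, and applies a standard Transformer encoder for global self-attention. The embedding dimension, depth, head count, the seven-class output, and the omission of positional embeddings are all simplifying assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class VisualWordTransformer(nn.Module):
    """Sketch: project a fused feature map into a sequence of 'visual
    words', add a class token, and model global relationships among the
    words with self-attention. Hyperparameters are illustrative only."""

    def __init__(self, in_channels: int = 512, dim: int = 256,
                 depth: int = 4, heads: int = 8, num_classes: int = 7):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, dim, kernel_size=1)  # word embedding
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)  # expression classifier

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # fused: (B, C, H, W) feature map produced by the fusion step.
        # Positional embeddings are omitted for brevity; a faithful
        # ViT-style model would add them to the token sequence.
        tokens = self.proj(fused).flatten(2).transpose(1, 2)  # (B, H*W, dim)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1)
        encoded = self.encoder(tokens)            # global self-attention
        return self.head(encoded[:, 0])           # predict from class token

# Usage sketch: two 7x7 branch features fused, then classified.
# asf = AttentionalSelectiveFusion(512)
# model = VisualWordTransformer(512)
# logits = model(asf(torch.randn(2, 512, 7, 7), torch.randn(2, 512, 7, 7)))
```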
Related papers
- Bridging the Gaps: Utilizing Unlabeled Face Recognition Datasets to Boost Semi-Supervised Facial Expression Recognition [5.750927184237346]
We focus on utilizing large unlabeled Face Recognition (FR) datasets to boost semi-supervised FER.
Specifically, we first perform face reconstruction pre-training on large-scale facial images without annotations.
To further alleviate the scarcity of labeled and diverse images, we propose a Mixup-based data augmentation strategy.
arXiv Detail & Related papers (2024-10-23T07:26:19Z)
- MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection [64.29452783056253]
The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia.
Although existing approaches mainly capture face forgery patterns using the image modality, other modalities such as fine-grained noise and text are not fully explored.
We propose a novel multi-modal fine-grained CLIP (MFCLIP) model, which mines comprehensive and fine-grained forgery traces across image-noise modalities.
arXiv Detail & Related papers (2024-09-15T13:08:59Z)
- DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image [98.29284902879652]
We present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image.
It disentangles the regression of local deformation fields and global mesh locations into two network branches.
It achieves state-of-the-art performance on a standard benchmark and in-the-wild data in terms of accuracy and physical plausibility.
arXiv Detail & Related papers (2024-06-26T00:08:29Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as annotations.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- More comprehensive facial inversion for more effective expression recognition [8.102564078640274]
We propose a novel generative method based on the image inversion mechanism for the FER task, termed Inversion FER (IFER).
ASIT is equipped with an image inversion discriminator that measures the cosine similarity of semantic features between source and generated images, constrained by a distribution alignment loss.
We extensively evaluate ASIT on facial datasets such as FFHQ and CelebA-HQ, showing that our approach achieves state-of-the-art facial inversion performance.
arXiv Detail & Related papers (2022-11-24T12:31:46Z)
- AU-Aware Vision Transformers for Biased Facial Expression Recognition [17.00557858587472]
We experimentally show that the naive joint training of multiple FER datasets is harmful to the FER performance of individual datasets.
We propose a simple yet conceptually new framework, the AU-aware Vision Transformer (AU-ViT).
Our AU-ViT achieves state-of-the-art performance on three popular datasets, namely 91.10% on RAF-DB, 65.59% on AffectNet, and 90.15% on FERPlus.
arXiv Detail & Related papers (2022-11-12T08:58:54Z)
- Learning Diversified Feature Representations for Facial Expression Recognition in the Wild [97.14064057840089]
We propose a mechanism to diversify the features extracted by CNN layers of state-of-the-art facial expression recognition architectures.
Experimental results on three well-known facial expression recognition in-the-wild datasets, AffectNet, FER+, and RAF-DB, show the effectiveness of our method.
arXiv Detail & Related papers (2022-10-17T19:25:28Z)
- Self-supervised Contrastive Learning of Multi-view Facial Expressions [9.949781365631557]
Facial expression recognition (FER) has emerged as an important component of human-computer interaction systems.
We propose Contrastive Learning of Multi-view facial Expressions (CL-MEx) to exploit facial images captured simultaneously from different angles towards FER.
arXiv Detail & Related papers (2021-08-15T11:23:34Z)
- Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition [10.256620178727884]
We propose to learn a vision Transformer jointly with a Squeeze and Excitation (SE) block for the FER task.
The proposed method is evaluated on different publicly available FER databases including CK+, JAFFE, RAF-DB and SFEW.
Experiments demonstrate that our model outperforms state-of-the-art methods on CK+ and SFEW.
arXiv Detail & Related papers (2021-07-07T09:49:01Z)
- MViT: Mask Vision Transformer for Facial Expression Recognition in the wild [77.44854719772702]
Facial Expression Recognition (FER) in the wild is an extremely challenging task in computer vision.
In this work, we first propose a novel pure transformer-based mask vision transformer (MViT) for FER in the wild.
Our MViT outperforms state-of-the-art methods on RAF-DB with 88.62%, FERPlus with 89.22%, and AffectNet-7 with 64.57%, and achieves a comparable result on AffectNet-8 with 61.40%.
arXiv Detail & Related papers (2021-06-08T16:58:10Z)
- Video-based Facial Expression Recognition using Graph Convolutional Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely used datasets, CK+, Oulu-CASIA and MMI, as well as the challenging in-the-wild dataset AFEW 8.0.
arXiv Detail & Related papers (2020-10-26T07:31:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.