Distract Your Attention: Multi-head Cross Attention Network for Facial
Expression Recognition
- URL: http://arxiv.org/abs/2109.07270v6
- Date: Mon, 22 May 2023 03:19:49 GMT
- Title: Distract Your Attention: Multi-head Cross Attention Network for Facial
Expression Recognition
- Authors: Zhengyao Wen, Wenzhong Lin, Tao Wang, Ge Xu
- Abstract summary: We present a novel facial expression recognition network, called Distract your Attention Network (DAN).
Our method is based on two key observations: multiple classes share inherently similar underlying facial appearance, and their differences can be subtle.
We propose DAN with three key components: a Feature Clustering Network (FCN), a Multi-head cross Attention Network (MAN), and an Attention Fusion Network (AFN).
- Score: 4.500212131331687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel facial expression recognition network, called Distract
your Attention Network (DAN). Our method is based on two key observations.
Firstly, multiple classes share inherently similar underlying facial
appearance, and their differences could be subtle. Secondly, facial expressions
exhibit themselves through multiple facial regions simultaneously, and the
recognition requires a holistic approach by encoding high-order interactions
among local features. To address these issues, we propose our DAN with three
key components: Feature Clustering Network (FCN), Multi-head cross Attention
Network (MAN), and Attention Fusion Network (AFN). The FCN extracts robust
features by adopting a large-margin learning objective to maximize class
separability. In addition, the MAN instantiates a number of attention heads to
simultaneously attend to multiple facial areas and build attention maps on
these regions. Further, the AFN distracts these attentions to multiple
locations before fusing the attention maps into a comprehensive one. Extensive
experiments on three public datasets (including AffectNet, RAF-DB, and SFEW
2.0) verified that the proposed method consistently achieves state-of-the-art
facial expression recognition performance. Code will be made available at
https://github.com/yaoing/DAN.
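To make the three-component design concrete, the following is a minimal PyTorch sketch of one MAN-style cross-attention head and a simplified fusion step. It is reconstructed from the abstract alone: the backbone choice, layer sizes, the spatial/channel split inside a head, and the averaging fusion are all assumptions, and the FCN's large-margin objective is omitted; see the linked repository for the authors' actual implementation.

```python
# Illustrative sketch of a DAN-style multi-head attention pipeline.
# NOTE: reconstructed from the abstract only; the backbone, layer sizes,
# and the spatial/channel decomposition are assumptions, not the authors' code.
import torch
import torch.nn as nn


class CrossAttentionHead(nn.Module):
    """One MAN-style head: spatial then channel attention over backbone features."""

    def __init__(self, channels: int = 512):
        super().__init__()
        # Spatial attention: where on the face this head looks.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.BatchNorm2d(channels // 4),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        # Channel attention: which feature channels this head emphasizes.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.spatial(x)                   # (B, 1, H, W) spatial attention map
        c = self.channel(x * s)               # (B, C) channel weights
        return (x * s).mean(dim=(2, 3)) * c   # (B, C) attended feature vector


class DANSketch(nn.Module):
    def __init__(self, backbone: nn.Module, channels: int = 512,
                 num_heads: int = 4, num_classes: int = 7):
        super().__init__()
        self.backbone = backbone              # plays the FCN role: feature extractor
        self.heads = nn.ModuleList(
            CrossAttentionHead(channels) for _ in range(num_heads)
        )
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(x)               # (B, C, H, W) feature map
        per_head = torch.stack([h(feat) for h in self.heads], dim=1)
        # AFN role (simplified here): average the heads into one representation.
        fused = per_head.mean(dim=1)
        return self.classifier(fused)
```

Each head predicts a spatial map over facial regions and then reweights the channels of the spatially attended feature; running several heads and fusing their outputs approximates the "attend to multiple facial areas, then fuse" pipeline the abstract describes. The actual AFN additionally pushes heads toward different locations, which this sketch does not enforce.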
Related papers
- MGRR-Net: Multi-level Graph Relational Reasoning Network for Facial Action Units Detection [16.261362598190807]
The Facial Action Coding System (FACS) encodes the action units (AUs) in facial images.
We argue that encoding AU features just from one perspective may not capture the rich contextual information between regional and global face features.
We propose a novel Multi-level Graph Relational Reasoning Network (termed MGRR-Net) for facial AU detection.
arXiv Detail & Related papers (2022-04-04T09:47:22Z)
- Your "Attention" Deserves Attention: A Self-Diversified Multi-Channel Attention for Facial Action Analysis [12.544285462327839]
We propose a compact model to enhance the representational and focusing power of neural attention maps.
The proposed method is evaluated on two benchmark databases (BP4D and DISFA) for AU detection and four databases (CK+, MMI, BU-3DFE, and BP4D+) for facial expression recognition.
It achieves superior performance compared to the state-of-the-art methods.
arXiv Detail & Related papers (2022-03-23T17:29:51Z)
- Visual Attention Network [90.0753726786985]
We propose a novel large kernel attention (LKA) module to enable self-adaptive and long-range correlations in self-attention.
We also introduce a novel neural network based on LKA, namely Visual Attention Network (VAN); a sketch of the LKA decomposition follows this list.
VAN outperforms state-of-the-art vision transformers and convolutional neural networks by a large margin in extensive experiments.
arXiv Detail & Related papers (2022-02-20T06:35:18Z)
- Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z)
- Multi-attentional Deepfake Detection [79.80308897734491]
Face forgery by deepfake is widely spread over the internet and has raised severe societal concerns.
We propose a new multi-attentional deepfake detection network. Specifically, it consists of three key components: 1) multiple spatial attention heads that make the network attend to different local parts; 2) a textural feature enhancement block that zooms in on the subtle artifacts in shallow features; and 3) an aggregation of low-level textural features and high-level semantic features guided by the attention maps.
arXiv Detail & Related papers (2021-03-03T13:56:14Z)
- Regional Attention Network (RAN) for Head Pose and Fine-grained Gesture Recognition [9.131161856493486]
We propose a novel end-to-end Regional Attention Network (RAN), which is a fully convolutional neural network (CNN).
Our regions consist of one or more consecutive cells and are adapted from the strategies used in computing the HOG (Histogram of Oriented Gradient) descriptor.
The proposed approach outperforms the state-of-the-art by a considerable margin in different metrics.
arXiv Detail & Related papers (2021-01-17T10:14:28Z)
- Robust Facial Landmark Detection by Cross-order Cross-semantic Deep Network [58.843211405385205]
We propose a cross-order cross-semantic deep network (CCDN) to boost the semantic features learning for robust facial landmark detection.
Specifically, a cross-order two-squeeze multi-excitation (CTM) module is proposed to introduce the cross-order channel correlations for more discriminative representations learning.
A novel cross-order cross-semantic (COCS) regularizer is designed to drive the network to learn cross-order cross-semantic features from different activations for facial landmark detection.
arXiv Detail & Related papers (2020-11-16T08:19:26Z)
- Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and 2) optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
arXiv Detail & Related papers (2020-08-21T10:45:09Z)
- Deep Attention Aware Feature Learning for Person Re-Identification [22.107332426681072]
We propose to incorporate attention learning as additional objectives in a person ReID network without changing the original structure.
We have tested its performance on two typical networks (TriNet and Bag of Tricks) and observed significant performance improvement on five widely used datasets.
arXiv Detail & Related papers (2020-03-01T16:27:14Z)
- Deep Multi-task Multi-label CNN for Effective Facial Attribute Classification [53.58763562421771]
We propose a novel deep multi-task multi-label CNN, termed DMM-CNN, for effective Facial Attribute Classification (FAC).
Specifically, DMM-CNN jointly optimizes two closely related tasks (i.e., facial landmark detection and FAC) to improve the performance of FAC by taking advantage of multi-task learning.
Two different network architectures are respectively designed to extract features for two groups of attributes, and a novel dynamic weighting scheme is proposed to automatically assign the loss weight to each facial attribute during training.
arXiv Detail & Related papers (2020-02-10T12:34:16Z)
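As referenced in the Visual Attention Network entry above, here is a minimal PyTorch sketch of a large-kernel-attention-style module. It uses the commonly cited decomposition of a large kernel into a depthwise convolution, a depthwise dilated convolution, and a pointwise convolution whose output gates the input; the specific kernel sizes and dilation below are illustrative assumptions, not necessarily the paper's exact configuration.

```python
# Illustrative LKA-style module: a large kernel is decomposed into a
# depthwise conv, a depthwise dilated conv, and a pointwise (1x1) conv,
# and the result multiplicatively gates the input features.
# Kernel sizes and dilation below are assumptions for illustration.
import torch
import torch.nn as nn


class LargeKernelAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Local spatial context (depthwise 5x5).
        self.dw = nn.Conv2d(channels, channels, kernel_size=5,
                            padding=2, groups=channels)
        # Long-range context (depthwise 7x7 with dilation 3,
        # giving an effective 19x19 receptive field).
        self.dw_dilated = nn.Conv2d(channels, channels, kernel_size=7,
                                    padding=9, dilation=3, groups=channels)
        # Channel mixing (pointwise 1x1).
        self.pw = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw(self.dw_dilated(self.dw(x)))  # attention map, same shape as x
        return x * attn                              # gate the input features


# Quick shape check.
if __name__ == "__main__":
    x = torch.randn(1, 64, 56, 56)
    print(LargeKernelAttention(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```

The depthwise/dilated/pointwise split is what makes a very large effective kernel affordable: each stage is cheap on its own, yet their composition captures long-range correlations without the quadratic cost of full self-attention.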