Your "Attention" Deserves Attention: A Self-Diversified Multi-Channel
Attention for Facial Action Analysis
- URL: http://arxiv.org/abs/2203.12570v1
- Date: Wed, 23 Mar 2022 17:29:51 GMT
- Authors: Xiaotian Li, Zhihua Li, Huiyuan Yang, Geran Zhao and Lijun Yin
- Abstract summary: We propose a compact model to enhance the representational and focusing power of neural attention maps.
The proposed method is evaluated on two benchmark databases (BP4D and DISFA) for AU detection and four databases (CK+, MMI, BU-3DFE, and BP4D+) for facial expression recognition.
It achieves superior performance compared to the state-of-the-art methods.
- Score: 12.544285462327839
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual attention has been extensively studied for learning fine-grained
features in both facial expression recognition (FER) and Action Unit (AU)
detection. A broad range of previous research has explored how to use attention
modules to localize detailed facial parts (e.g., facial action units), learn
discriminative features, and learn inter-class correlation. However, few
related works pay attention to the robustness of the attention module itself.
Through experiments, we found that neural attention maps initialized with
different feature maps yield diverse representations when learning to attend
to the same Region of Interest (ROI). In other words, as in general feature
learning, the representational quality of attention maps greatly affects the
performance of a model, which means unconstrained attention learning involves
substantial randomness. This uncertainty causes conventional attention
learning to settle into sub-optimal solutions. In this paper, we propose a
compact model to enhance the
representational and focusing power of neural attention maps and learn the
"inter-attention" correlation for refined attention maps, which we term the
"Self-Diversified Multi-Channel Attention Network (SMA-Net)". The proposed
method is evaluated on two benchmark databases (BP4D and DISFA) for AU
detection and four databases (CK+, MMI, BU-3DFE, and BP4D+) for facial
expression recognition. It achieves superior performance compared to the
state-of-the-art methods.
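As one concrete (unofficial) reading of the abstract, the sketch below shows a multi-channel attention module in which several heads attend the same feature map, plus a pairwise-overlap penalty that pushes the channels toward diverse maps. The class and function names, the 1x1-conv head design, and the Gram-matrix form of the diversity term are all assumptions for illustration; the paper's actual SMA-Net architecture and "inter-attention" learning are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiChannelAttention(nn.Module):
    """K parallel attention heads over the same feature map (hypothetical)."""

    def __init__(self, in_ch: int, k: int = 4):
        super().__init__()
        # one 1x1-conv scoring head per attention channel (assumed design)
        self.heads = nn.ModuleList([nn.Conv2d(in_ch, 1, 1) for _ in range(k)])

    def forward(self, feats: torch.Tensor):
        # feats: (B, C, H, W) -> K spatial maps, softmax-normalized over H*W
        b, _, h, w = feats.shape
        maps = [F.softmax(head(feats).flatten(2), dim=-1).view(b, 1, h, w)
                for head in self.heads]
        attn = torch.cat(maps, dim=1)                       # (B, K, H, W)
        pooled = (attn.unsqueeze(2) * feats.unsqueeze(1)).sum(dim=(-2, -1))
        return attn, pooled                                 # pooled: (B, K, C)

def diversity_loss(attn: torch.Tensor) -> torch.Tensor:
    """Penalize pairwise overlap between channels so the maps stay diverse."""
    flat = attn.flatten(2)                                  # (B, K, H*W)
    gram = flat @ flat.transpose(1, 2)                      # (B, K, K) overlaps
    off_diag = gram - torch.diag_embed(torch.diagonal(gram, dim1=1, dim2=2))
    return off_diag.mean()
```

In training, the diversity term would be added to the task loss with a small weight; that weighting scheme is likewise an assumption rather than the paper's recipe.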
Related papers
- ResMatch: Residual Attention Learning for Local Feature Matching [51.07496081296863]
We rethink cross- and self-attention from the viewpoint of traditional feature matching and filtering.
We inject the similarity of descriptors and relative positions into cross- and self-attention scores.
We mine intra- and inter-neighbors according to the similarity of descriptors and relative positions.
arXiv Detail & Related papers (2023-07-11T11:32:12Z)
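A minimal sketch of the stated idea: cross-attention logits are biased additively by descriptor and positional similarity before the softmax. The function name, the additive form, and the alpha/beta weights are assumptions, not ResMatch's actual formulation.

```python
import torch
import torch.nn.functional as F

def biased_cross_attention(q, k, v, desc_sim, pos_sim, alpha=1.0, beta=1.0):
    """Cross-attention whose scores are biased by descriptor and positional
    similarity priors (illustrative form).
    q: (B, N, D); k, v: (B, M, D); desc_sim, pos_sim: (B, N, M)."""
    logits = q @ k.transpose(1, 2) / q.shape[-1] ** 0.5
    logits = logits + alpha * desc_sim + beta * pos_sim  # inject priors
    return F.softmax(logits, dim=-1) @ v
```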
- Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification [19.957957963417414]
We propose a dual cross-attention learning (DCAL) algorithm to coordinate with self-attention learning.
First, we propose global-local cross-attention (GLCA) to enhance the interactions between global images and local high-response regions.
Second, we propose pair-wise cross-attention (PWCA) to establish the interactions between image pairs.
arXiv Detail & Related papers (2022-05-04T16:14:26Z)
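A toy version of GLCA as summarized above: queries from the whole image cross-attend only to the top-responding local tokens. Selecting tokens by a class-attention vector and the ratio r are illustrative assumptions, and PWCA (attention across image pairs) is omitted.

```python
import torch
import torch.nn.functional as F

def global_local_cross_attention(q, k, v, cls_attn, r=0.1):
    """Toy GLCA: keep the top r fraction of tokens by attention response and
    let all queries cross-attend only to them (not the authors' exact code).
    q, k, v: (B, N, D); cls_attn: (B, N) per-token response."""
    b, n, d = k.shape
    top = max(1, int(r * n))
    idx = cls_attn.topk(top, dim=-1).indices                 # (B, top)
    gather = idx.unsqueeze(-1).expand(-1, -1, d)
    k_loc, v_loc = k.gather(1, gather), v.gather(1, gather)  # local tokens
    logits = q @ k_loc.transpose(1, 2) / d ** 0.5
    return F.softmax(logits, dim=-1) @ v_loc
```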
- Distract Your Attention: Multi-head Cross Attention Network for Facial Expression Recognition [4.500212131331687]
We present a novel facial expression recognition network, called Distract Your Attention Network (DAN).
Our method is based on two key observations: multiple classes share inherently similar underlying facial appearance, and their differences can be subtle.
We propose our DAN with three key components: Feature Clustering Network (FCN), Multi-head cross Attention Network (MAN), and Attention Fusion Network (AFN).
arXiv Detail & Related papers (2021-09-15T13:15:54Z)
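A loose sketch of the MAN/AFN stages named above: several spatial attention heads produce part features that are concatenated and fused into a prediction. The sigmoid 1x1-conv heads, the linear fusion, and all names are assumptions; the FCN backbone and the paper's loss terms are omitted.

```python
import torch
import torch.nn as nn

class MultiHeadSpatialAttention(nn.Module):
    """Hypothetical MAN/AFN-style module: attend, pool, fuse."""

    def __init__(self, in_ch: int, heads: int = 4, num_classes: int = 7):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch, 1, 1), nn.Sigmoid())
            for _ in range(heads)])
        self.fuse = nn.Linear(heads * in_ch, num_classes)

    def forward(self, feats):                      # feats: (B, C, H, W)
        # each head gates the features spatially, then pools to a part vector
        parts = [(feats * head(feats)).mean(dim=(-2, -1)) for head in self.heads]
        return self.fuse(torch.cat(parts, dim=1))  # fused class logits
```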
- Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification [101.49122450005869]
We present a counterfactual attention learning method to learn more effective attention based on causal inference.
Specifically, we analyze the effect of the learned visual attention on network prediction.
We evaluate our method on a wide range of fine-grained recognition tasks.
arXiv Detail & Related papers (2021-08-19T14:53:40Z)
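The counterfactual idea lends itself to a short sketch: compare the prediction under the learned attention with the prediction under a random (counterfactual) attention, and train on the difference. Shapes, names, and the uniform-random intervention are assumptions about one common instantiation, not the paper's exact method.

```python
import torch

def counterfactual_effect(classifier, feats, attn):
    """Effect of attention = prediction with the learned map minus prediction
    under a random counterfactual map; training maximizes this effect on the
    true class. feats: (B, C, H, W); attn: (B, 1, H, W)."""
    y_fact = classifier((feats * attn).mean(dim=(-2, -1)))
    attn_cf = torch.rand_like(attn)                          # intervention
    attn_cf = attn_cf / attn_cf.sum(dim=(-2, -1), keepdim=True)
    y_cf = classifier((feats * attn_cf).mean(dim=(-2, -1)))
    return y_fact - y_cf   # feed to cross-entropy as the 'effect' logits
```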
- Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z)
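A deterministic stand-in for the idea of jointly estimated spatial and channel attention; the variational inference machinery that is the paper's actual contribution is omitted, and the gating design below is an assumption.

```python
import torch
import torch.nn as nn

class JointSpatialChannelAttention(nn.Module):
    """Hypothetical joint gate: a spatial map and a channel vector are
    estimated from the same features and both modulate the output."""

    def __init__(self, in_ch: int):
        super().__init__()
        self.spatial = nn.Conv2d(in_ch, 1, 1)
        self.channel = nn.Linear(in_ch, in_ch)

    def forward(self, feats):                               # (B, C, H, W)
        s = torch.sigmoid(self.spatial(feats))              # (B, 1, H, W)
        c = torch.sigmoid(self.channel(feats.mean(dim=(-2, -1))))  # (B, C)
        return feats * s * c[:, :, None, None]
```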
- SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity.
Attention map visualization of a pre-trained model is one direct method for understanding the self-attention mechanism.
We propose a Differentiable Attention Mask (DAM) algorithm, which can be also applied in guidance of SparseBERT design.
arXiv Detail & Related papers (2021-02-25T14:13:44Z)
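One plausible shape of a differentiable attention mask: a learnable soft-binary mask multiplied into the attention distribution, trained end-to-end with an L1-style sparsity penalty. The parameterization and penalty are assumptions; the paper's DAM may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentiableAttentionMask(nn.Module):
    """Hypothetical DAM-style mask over attention positions."""

    def __init__(self, seq_len: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(seq_len, seq_len))

    def forward(self, attn_scores):              # (B, H, L, L) raw scores
        mask = torch.sigmoid(self.logits)        # soft mask in (0, 1)
        probs = F.softmax(attn_scores, dim=-1) * mask
        probs = probs / probs.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        return probs, mask.abs().mean()          # probs + sparsity penalty
```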
- A Simple and Effective Self-Supervised Contrastive Learning Framework for Aspect Detection [15.36713547251997]
We propose a self-supervised contrastive learning framework and an attention-based model equipped with a novel smooth self-attention (SSA) module for the UAD task.
Our methods outperform several recent unsupervised and weakly supervised approaches on publicly available benchmark user review datasets.
arXiv Detail & Related papers (2020-09-18T22:13:49Z)
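A guess at what a "smooth" self-attention pooling could look like for aspect detection: self-attentive pooling whose weights are blended with a uniform distribution so no single token dominates. The scoring function and the smoothing constant are assumptions; the paper's SSA module is not reproduced here.

```python
import torch
import torch.nn.functional as F

def smooth_self_attention(h, smooth=0.1):
    """Self-attentive pooling with uniform smoothing (illustrative).
    h: (B, L, D) token representations -> (B, D) sentence vector."""
    # score each token against the mean token representation
    scores = (h @ h.mean(dim=1, keepdim=True).transpose(1, 2)).squeeze(-1)
    w = F.softmax(scores, dim=-1)                  # (B, L)
    w = (1 - smooth) * w + smooth / h.shape[1]     # blend with uniform weights
    return (w.unsqueeze(-1) * h).sum(dim=1)
```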
- Collaborative Attention Mechanism for Multi-View Action Recognition [75.33062629093054]
We propose a collaborative attention mechanism (CAM) for solving the multi-view action recognition problem.
The proposed CAM detects attention differences among views and adaptively integrates frame-level information so that the views benefit each other.
Experiments on four action datasets show that the proposed CAM achieves better results for each view and also boosts multi-view performance.
arXiv Detail & Related papers (2020-09-14T17:33:10Z)
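An illustrative reduction of the collaborative idea: compute frame-level attention per view and share it across views so each view is reweighted by the consensus. The saliency-by-norm scoring and simple averaging are assumptions; the actual CAM is more elaborate.

```python
import torch
import torch.nn.functional as F

def collaborative_attention(view_a, view_b):
    """Share frame-level attention between two views (illustrative).
    view_a, view_b: (B, T, D) frame features -> pooled (B, D) per view."""
    w_a = F.softmax(view_a.norm(dim=-1), dim=-1)   # (B, T) frame saliency
    w_b = F.softmax(view_b.norm(dim=-1), dim=-1)
    w = (w_a + w_b) / 2                            # consensus attention
    return (w.unsqueeze(-1) * view_a).sum(1), (w.unsqueeze(-1) * view_b).sum(1)
```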
- Deep Reinforced Attention Learning for Quality-Aware Visual Recognition [73.15276998621582]
We build upon the weakly-supervised generation mechanism of intermediate attention maps in convolutional neural networks.
We introduce a meta critic network to evaluate the quality of attention maps in the main network.
arXiv Detail & Related papers (2020-07-13T02:44:38Z)
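A sketch of the meta-critic component named above: a small network that scores an intermediate attention map from the main network. The MLP design and input flattening are assumptions, and the reinforcement-learning loop that turns the critic's score into a training signal is omitted.

```python
import torch
import torch.nn as nn

class AttentionCritic(nn.Module):
    """Hypothetical meta-critic scoring attention-map quality."""

    def __init__(self, hw: int):                  # hw = H * W of the map
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(hw, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, attn):                      # attn: (B, 1, H, W)
        return self.score(attn.flatten(1))        # scalar quality per map
```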
- Deep Attention Aware Feature Learning for Person Re-Identification [22.107332426681072]
We propose to incorporate attention learning as an additional objective in a person ReID network without changing the original structure.
We have tested its performance on two typical networks (TriNet and Bag of Tricks) and observed significant performance improvement on five widely used datasets.
arXiv Detail & Related papers (2020-03-01T16:27:14Z)
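A minimal sketch of attention learning as an auxiliary objective: derive a saliency map from an existing feature map and regress it toward a target mask, leaving the backbone unchanged. The squared-activation saliency and the MSE target are assumptions; the paper's actual objectives for TriNet and Bag of Tricks may differ.

```python
import torch
import torch.nn.functional as F

def attention_aux_loss(feat_map, part_mask):
    """Auxiliary attention objective on an existing backbone (illustrative).
    feat_map: (B, C, H, W); part_mask: (B, H, W) target in [0, 1]."""
    saliency = feat_map.pow(2).mean(dim=1)        # (B, H, W) activation energy
    saliency = saliency / saliency.amax(dim=(-2, -1), keepdim=True).clamp_min(1e-9)
    return F.mse_loss(saliency, part_mask)        # added to the ReID loss
```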