Landmark Guidance Independent Spatio-channel Attention and Complementary
Context Information based Facial Expression Recognition
- URL: http://arxiv.org/abs/2007.10298v2
- Date: Sat, 25 Jul 2020 14:50:25 GMT
- Title: Landmark Guidance Independent Spatio-channel Attention and Complementary
Context Information based Facial Expression Recognition
- Authors: Darshan Gera and S Balasubramanian
- Abstract summary: Modern facial expression recognition (FER) architectures rely on external sources like landmark detectors for defining attention.
In this work, an end-to-end architecture for FER is proposed that obtains both local and global attention per channel per spatial location.
Robustness and superior performance of the proposed model are demonstrated on both in-lab and in-the-wild datasets.
- Score: 5.076419064097734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recent trend in recognizing facial expressions in real-world scenarios is
to deploy attention-based convolutional neural networks (CNNs) locally to
signify the importance of facial regions, and to combine them with global facial
features and/or other complementary context information for performance gain.
However, in the presence of occlusions and pose variations, different channels
respond differently, and the response intensity of a channel
differs across spatial locations. Also, modern facial expression
recognition (FER) architectures rely on external sources like landmark detectors
for defining attention. Failure of the landmark detector will have a cascading
effect on FER. Additionally, no emphasis is laid on the relevance of
features that are input to compute complementary context information.
Leveraging the aforementioned observations, an end-to-end architecture for
FER is proposed in this work that obtains both local and global attention per
channel per spatial location through a novel spatio-channel attention net
(SCAN), without seeking any information from the landmark detectors. SCAN is
complemented by a complementary context information (CCI) branch. Further,
using efficient channel attention (ECA), the relevance of features input to CCI
is also attended to. The representation learnt by the proposed architecture is
robust to occlusions and pose variations. Robustness and superior performance
of the proposed model are demonstrated on both in-lab and in-the-wild datasets
(AffectNet, FERPlus, RAF-DB, FED-RO, SFEW, CK+, Oulu-CASIA and JAFFE) along
with a couple of constructed face mask datasets resembling masked faces in
the COVID-19 scenario. Code is publicly available at
https://github.com/1980x/SCAN-CCI-FER
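The abstract mentions efficient channel attention (ECA) for weighting the features fed to the CCI branch. As a rough illustration only, the sketch below shows the general ECA idea in plain Python: global average pooling per channel, a small 1D convolution across neighbouring channels, a sigmoid gate, and channel-wise rescaling. It is not the authors' implementation; the function name `eca`, the nested-list tensor layout, and the fixed kernel weights (which would be learned in a real network) are assumptions made for this example.

```python
import math


def eca(feature_map, kernel):
    """ECA-style channel gating (illustrative sketch, hypothetical helper).

    feature_map: list of C channels, each an H x W list of lists of floats.
    kernel: odd-length list of 1D convolution weights shared across channels
            (assumed fixed here; learned in a real network).
    Returns (weights, scaled_feature_map).
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    num_channels = len(feature_map)

    # 1. Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]

    # 2. 1D convolution along the channel axis with zero padding,
    #    so each channel interacts only with its k nearest neighbours.
    pad = k // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(kernel[j] * padded[c + j] for j in range(k))
        for c in range(num_channels)
    ]

    # 3. Sigmoid gate turns each mixed descriptor into a weight in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-m)) for m in mixed]

    # 4. Rescale every spatial location of a channel by that channel's weight.
    scaled = [
        [[w * v for v in row] for row in ch]
        for w, ch in zip(weights, feature_map)
    ]
    return weights, scaled


# Toy input: three 2x2 channels with means 1, 0 and 2.
fmap = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
weights, scaled = eca(fmap, kernel=[1.0, 1.0, 1.0])
```

With the all-ones kernel, each channel's gate depends on its own pooled value plus those of its immediate neighbours, which is the local cross-channel interaction that distinguishes ECA from a fully connected squeeze-and-excitation gate.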
Related papers
- Spatial Action Unit Cues for Interpretable Deep Facial Expression Recognition [55.97779732051921]
State-of-the-art classifiers for facial expression recognition (FER) lack interpretability, an important feature for end-users.
A new learning strategy is proposed to explicitly incorporate AU cues into classifier training, allowing deep interpretable models to be trained.
Our new strategy is generic, and can be applied to any deep CNN- or transformer-based classifier without requiring any architectural change or significant additional training time.
arXiv Detail & Related papers (2024-10-01T10:42:55Z)
- LOGO-Former: Local-Global Spatio-Temporal Transformer for Dynamic Facial Expression Recognition [19.5702895176141]
Previous methods for dynamic facial expression recognition (DFER) in the wild are mainly based on Convolutional Neural Networks (CNNs), whose local operations ignore the long-range dependencies in videos.
Transformer-based methods for DFER can achieve better performance but result in higher FLOPs and computational costs.
Experiments on two in-the-wild dynamic facial expression datasets (i.e., DFEW and FERV39K) indicate that our method provides an effective way to make use of the spatial and temporal dependencies for DFER.
arXiv Detail & Related papers (2023-05-05T07:53:13Z)
- PS-ARM: An End-to-End Attention-aware Relation Mixer Network for Person Search [56.02761592710612]
We propose a novel attention-aware relation mixer (ARM) module for person search.
Our ARM module is native and does not rely on fine-grained supervision or topological assumptions.
Our PS-ARM achieves state-of-the-art performance on both datasets.
arXiv Detail & Related papers (2022-10-07T10:04:12Z)
- Local-Aware Global Attention Network for Person Re-Identification Based on Body and Hand Images [0.0]
We propose a compound approach for end-to-end discriminative deep feature learning for person Re-Id based on both body and hand images.
The proposed method consistently outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2022-09-11T09:43:42Z)
- MGRR-Net: Multi-level Graph Relational Reasoning Network for Facial Action Units Detection [16.261362598190807]
The Facial Action Coding System (FACS) encodes the action units (AUs) in facial images.
We argue that encoding AU features just from one perspective may not capture the rich contextual information between regional and global face features.
We propose a novel Multi-level Graph Reasoning Network (termed MGRR-Net) for facial AU detection.
arXiv Detail & Related papers (2022-04-04T09:47:22Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Hierarchical Deep CNN Feature Set-Based Representation Learning for Robust Cross-Resolution Face Recognition [59.29808528182607]
Cross-resolution face recognition (CRFR) is important in intelligent surveillance and biometric forensics.
Existing shallow learning-based and deep learning-based methods focus on mapping the HR-LR face pairs into a joint feature space.
In this study, we desire to fully exploit the multi-level deep convolutional neural network (CNN) feature set for robust CRFR.
arXiv Detail & Related papers (2021-03-25T14:03:42Z)
- Affect Expression Behaviour Analysis in the Wild using Spatio-Channel Attention and Complementary Context Information [5.076419064097734]
Facial expression recognition in the wild is crucial for building reliable human-computer interactive systems.
Current FER systems fail to perform well under various natural and uncontrolled conditions.
This report presents the attention-based framework used in our submission to the expression recognition track of the Affective Behaviour Analysis in-the-wild (ABAW) 2020 competition.
arXiv Detail & Related papers (2020-09-29T12:26:15Z)
- Hierarchical Context Embedding for Region-based Object Detection [40.9463003508027]
Hierarchical Context Embedding (HCE) framework can be applied as a plug-and-play component.
To advance the recognition of context-dependent object categories, we propose an image-level categorical embedding module.
Novel RoI features are generated by exploiting hierarchically embedded context information beneath both whole images and interested regions.
arXiv Detail & Related papers (2020-08-04T05:33:22Z)
- Ventral-Dorsal Neural Networks: Object Detection via Selective Attention [51.79577908317031]
We propose a new framework called Ventral-Dorsal Networks (VDNets).
Inspired by the structure of the human visual system, we propose the integration of a "Ventral Network" and a "Dorsal Network".
Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches.
arXiv Detail & Related papers (2020-05-15T23:57:36Z)
- Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos [85.6430597108455]
We propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos.
It captures the common salient foreground regions among video frames and explores the spatial-temporal long-range context interdependency from such regions.
Multiple spatial-temporal interaction modules within CSTNet are proposed, which exploit the spatial and temporal long-range context interdependencies on such features and spatial-temporal information correlation.
arXiv Detail & Related papers (2020-04-10T10:23:58Z)