MERANet: Facial Micro-Expression Recognition using 3D Residual Attention
Network
- URL: http://arxiv.org/abs/2012.04581v1
- Date: Mon, 7 Dec 2020 16:41:42 GMT
- Title: MERANet: Facial Micro-Expression Recognition using 3D Residual Attention
Network
- Authors: Viswanatha Reddy Gajjala, Sai Prasanna Teja Reddy, Snehasis Mukherjee,
Shiv Ram Dubey
- Abstract summary: We propose a facial micro-expression recognition model, called MERANet, that uses a 3D residual attention network.
The proposed model also encompasses both spatial and temporal information.
A superior performance is observed as compared to the state-of-the-art for facial micro-expression recognition.
- Score: 14.285700243381537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a facial micro-expression recognition model using a 3D residual
attention network, called MERANet. The proposed model takes advantage of
spatial-temporal attention and channel attention together, to learn deeper
fine-grained subtle features for classification of emotions. The proposed model
also encompasses both spatial and temporal information simultaneously using the
3D kernels and residual connections. Moreover, the channel features and
spatio-temporal features are re-calibrated using the channel and
spatio-temporal attentions, respectively, in each residual module. The
experiments are conducted on benchmark facial micro-expression datasets. A
superior performance is observed as compared to the state-of-the-art for facial
micro-expression recognition.
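The re-calibration step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes parameter-free sigmoid gates computed from plain averages, applies both gates multiplicatively, and omits the 3D convolutions and learned weights entirely. The function names and the toy input are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def channel_gates(x):
    """One gate per channel, from the global average over (T, H, W)."""
    gates = []
    for channel in x:
        values = [v for frame in channel for row in frame for v in row]
        gates.append(sigmoid(sum(values) / len(values)))
    return gates

def spatio_temporal_gates(x):
    """One gate per (t, h, w) position, from the average over channels."""
    C, T, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    return [[[sigmoid(sum(x[c][t][h][w] for c in range(C)) / C)
              for w in range(W)] for h in range(H)] for t in range(T)]

def residual_attention(x):
    """Re-calibrate x with both gates, then add the identity shortcut.

    Assumption: the two gates are combined by multiplication; the
    abstract does not specify how they are combined.
    """
    cg = channel_gates(x)
    sg = spatio_temporal_gates(x)
    return [[[[x[c][t][h][w] + x[c][t][h][w] * cg[c] * sg[t][h][w]
               for w in range(len(x[c][t][h]))]
              for h in range(len(x[c][t]))]
             for t in range(len(x[c]))]
            for c in range(len(x))]

# Toy feature volume indexed [channel][frame][row][column]:
# 2 channels, 2 frames, 1x2 pixels (hypothetical values).
x = [[[[1.0, -1.0]], [[0.5, 0.0]]],
     [[[0.0, 2.0]], [[-0.5, 1.0]]]]
y = residual_attention(x)
```

Because every gate lies strictly between 0 and 1, each output value keeps the sign of its input and is scaled by a factor between 1 and 2; the identity shortcut ensures the attended features can only add to, never erase, the original signal.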
Related papers
- Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition [21.675660978188617]
Micro-expression recognition is crucial in many fields, including criminal analysis and psychotherapy.
A three-stream temporal-shift attention network based on self-knowledge distillation called SKD-TSTSAN is proposed in this paper.
arXiv Detail & Related papers (2024-06-25T13:22:22Z)
- Micro-Expression Recognition Based on Attribute Information Embedding and Cross-modal Contrastive Learning [22.525295392858293]
We propose a micro-expression recognition method based on attribute information embedding and cross-modal contrastive learning.
We conduct extensive experiments on the CASME II and MMEW databases, reaching accuracies of 77.82% and 71.04%, respectively.
arXiv Detail & Related papers (2022-05-29T12:28:10Z)
- Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
Most previous works focus on learning frame-level features of each whole frame in the entire video, and directly match them with the textual information.
We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
arXiv Detail & Related papers (2022-03-06T13:57:09Z)
- Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition [61.374467942519374]
We propose a novel spatio-temporal transformer architecture -- to the best of our knowledge, the first purely transformer based approach for micro-expression recognition.
The architecture comprises a spatial encoder which learns spatial patterns, a temporal aggregator for temporal dimension analysis, and a classification head.
A comprehensive evaluation on three widely used spontaneous micro-expression data sets shows that the proposed approach consistently outperforms the state of the art.
arXiv Detail & Related papers (2021-12-10T22:10:31Z)
- Spatio-Temporal Self-Attention Network for Video Saliency Prediction [13.873682190242365]
3D convolutional neural networks have achieved promising results for video tasks in computer vision.
We propose a novel Spatio-Temporal Self-Attention 3D Network (STSANet) for video saliency prediction.
arXiv Detail & Related papers (2021-08-24T12:52:47Z)
- Progressive Spatio-Temporal Bilinear Network with Monte Carlo Dropout for Landmark-based Facial Expression Recognition with Uncertainty Estimation [93.73198973454944]
The performance of our method is evaluated on three widely used datasets.
It is comparable to that of video-based state-of-the-art methods while it has much less complexity.
arXiv Detail & Related papers (2021-06-08T13:40:30Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- GTA: Global Temporal Attention for Video Action Understanding [51.476605514802806]
We introduce Global Temporal Attention (GTA), which performs global temporal attention on top of spatial attention in a decoupled manner.
Tests on 2D and 3D networks demonstrate that our approach consistently enhances temporal modeling and provides state-of-the-art performance on three video action recognition datasets.
arXiv Detail & Related papers (2020-12-15T18:58:21Z)
- Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel spatio-temporal convolution block that is capable of extracting spatio-temporal patterns at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.