Multi-scale multi-modal micro-expression recognition algorithm based on
transformer
- URL: http://arxiv.org/abs/2301.02969v2
- Date: Wed, 11 Jan 2023 03:04:42 GMT
- Title: Multi-scale multi-modal micro-expression recognition algorithm based on
transformer
- Authors: Fengping Wang, Jie Li, Chun Qi, Lin Wang, Pan Wang
- Abstract summary: A micro-expression is a spontaneous unconscious facial muscle movement that can reveal the true emotions people attempt to hide.
We propose a multi-modal multi-scale algorithm based on transformer network to learn local multi-grained features of micro-expressions.
- The results show the accuracy of the proposed algorithm on the SMIC database under single-database evaluation is up to 78.73%, and the F1 value on CASME II under composite-database evaluation is up to 0.9071.
- Score: 17.980579727286518
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A micro-expression is a spontaneous, unconscious facial muscle movement that
can reveal the true emotions people attempt to hide. Manual methods have made
good progress and deep learning is gaining prominence, but because
micro-expressions are short in duration and appear at different scales across
facial regions, existing algorithms cannot extract multi-modal multi-scale
facial region features while taking contextual information into account to
learn the underlying features. Therefore, to solve these problems, a
multi-modal multi-scale algorithm based on transformer network is proposed in
this paper, aiming to fully learn local multi-grained features of
micro-expressions through two modal features of micro-expressions - motion
features and texture features. To obtain local facial area features at
different scales, we learn patch features at different scales for both
modalities, then fuse multi-layer multi-head attention weights to weight the
patch features into effective features, and combine cross-modal contrastive
learning for model optimization. We conducted comprehensive
experiments on three spontaneous datasets. The results show that the proposed
algorithm achieves an accuracy of up to 78.73% on the SMIC database under
single-database evaluation and an F1 value of up to 0.9071 on CASME II under
composite-database evaluation, which is at the leading level.
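As a rough illustration of the pipeline described in the abstract, the sketch below extracts patches at two scales from a motion map and a texture map, pools each scale with softmax attention weights, and computes a symmetric InfoNCE-style cross-modal contrastive loss between the two modalities. The shapes, the two patch scales, the mean-vector attention query, and the exact loss form are illustrative assumptions, not the authors' actual architecture.

```python
# Hedged sketch: multi-scale patch features from two modalities with
# attention-weighted pooling and a cross-modal contrastive (InfoNCE) loss.
# Everything here (scales, query, loss form) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

def extract_patches(feat_map, patch):
    """Split an HxWxC feature map into non-overlapping patch vectors."""
    h, w, c = feat_map.shape
    rows, cols = h // patch, w // patch
    return (feat_map[:rows * patch, :cols * patch]
            .reshape(rows, patch, cols, patch, c)
            .transpose(0, 2, 1, 3, 4)
            .reshape(rows * cols, patch * patch * c))

def attention_weighted_pool(patches, query):
    """Weight patch features by softmax attention scores and sum them."""
    scores = patches @ query / np.sqrt(patches.shape[1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ patches

def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss between paired embeddings of two modalities."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature           # (N, N) similarity matrix
    labels = np.arange(len(z_a))                 # matched pairs on the diagonal
    def ce(lg):
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()
    return 0.5 * (ce(logits) + ce(logits.T))

# Toy batch: motion (optical-flow-like) and texture maps for N samples.
N, H, W, C = 4, 16, 16, 3
motion = rng.standard_normal((N, H, W, C))
texture = rng.standard_normal((N, H, W, C))

embeds = {"motion": [], "texture": []}
for name, maps in (("motion", motion), ("texture", texture)):
    for m in maps:
        feats = []
        for patch in (4, 8):                     # two illustrative patch scales
            p = extract_patches(m, patch)
            q = p.mean(axis=0)                   # stand-in for a learned query
            feats.append(attention_weighted_pool(p, q))
        # pad the shorter scale's vector before concatenating across scales
        dim = max(f.shape[0] for f in feats)
        feats = [np.pad(f, (0, dim - f.shape[0])) for f in feats]
        embeds[name].append(np.concatenate(feats))

loss = info_nce(np.stack(embeds["motion"]), np.stack(embeds["texture"]))
print(round(float(loss), 4))
```

In a real model the attention query and patch projections would be learned, and the contrastive loss would be minimized jointly with the recognition objective; here the loss is only evaluated once on random data.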
Related papers
- Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178]
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization.
The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM)
arXiv Detail & Related papers (2024-08-05T08:35:59Z)
- SpotFormer: Multi-Scale Spatio-Temporal Transformer for Facial Expression Spotting [11.978551396144532]
In this paper, we propose an efficient framework for facial expression spotting.
First, we propose a Sliding Window-based Multi-Resolution Optical flow (SW-MRO) feature, which calculates multi-resolution optical flow of the input sequence within compact sliding windows.
Second, we propose SpotFormer, a multi-scale spatio-temporal Transformer that simultaneously encodes facial spatio-temporal relationships of the SW-MRO features for accurate frame-level probability estimation.
Third, we introduce supervised contrastive learning into SpotFormer to enhance the discriminability between different types of expressions.
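The sliding-window multi-resolution idea summarized above can be sketched roughly as follows. Simple frame differencing stands in for real optical flow, and the window size and resolution scales are assumptions for illustration, not the values used in the paper.

```python
# Hedged sketch of a sliding-window multi-resolution motion feature:
# within each compact sliding window, summarize motion at several spatial
# resolutions. Frame differencing is a stand-in for optical flow.
import numpy as np

def downsample(frame, factor):
    """Average-pool an HxW frame by an integer factor."""
    h, w = frame.shape
    return frame[:h // factor * factor, :w // factor * factor] \
        .reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def sw_multires_motion(frames, window=3, scales=(1, 2, 4)):
    """For each sliding window, stack per-scale mean absolute frame
    differences into one multi-resolution motion descriptor."""
    descriptors = []
    for start in range(len(frames) - window + 1):
        win = frames[start:start + window]
        feats = []
        for s in scales:
            small = [downsample(f, s) for f in win]
            diffs = [np.abs(b - a).mean() for a, b in zip(small, small[1:])]
            feats.append(np.mean(diffs))
        descriptors.append(np.array(feats))
    return np.stack(descriptors)

rng = np.random.default_rng(1)
frames = rng.random((6, 32, 32))    # toy 6-frame grayscale clip
desc = sw_multires_motion(frames)
print(desc.shape)                   # → (4, 3): one row per window, one column per scale
```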
arXiv Detail & Related papers (2024-07-30T13:02:08Z)
- Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition [21.675660978188617]
Micro-expression recognition is crucial in many fields, including criminal analysis and psychotherapy.
A three-stream temporal-shift attention network based on self-knowledge distillation called SKD-TSTSAN is proposed in this paper.
arXiv Detail & Related papers (2024-06-25T13:22:22Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Micro-Expression Recognition Based on Attribute Information Embedding and Cross-modal Contrastive Learning [22.525295392858293]
We propose a micro-expression recognition method based on attribute information embedding and cross-modal contrastive learning.
We conduct extensive experiments in CASME II and MMEW databases, and the accuracy is 77.82% and 71.04%, respectively.
arXiv Detail & Related papers (2022-05-29T12:28:10Z)
- Video-based Facial Micro-Expression Analysis: A Survey of Datasets, Features and Algorithms [52.58031087639394]
Micro-expressions are involuntary and transient facial expressions.
They can provide important information in a broad range of applications such as lie detection, criminal detection, etc.
Since micro-expressions are transient and of low intensity, their detection and recognition is difficult and relies heavily on expert experiences.
arXiv Detail & Related papers (2022-01-30T05:14:13Z)
- LMR-CBT: Learning Modality-fused Representations with CB-Transformer for Multimodal Emotion Recognition from Unaligned Multimodal Sequences [5.570499497432848]
We propose an efficient neural network to learn modality-fused representations with CB-Transformer (LMR-CBT) for multimodal emotion recognition.
We conduct word-aligned and unaligned experiments on three challenging datasets.
arXiv Detail & Related papers (2021-12-03T03:43:18Z)
- Progressive Spatio-Temporal Bilinear Network with Monte Carlo Dropout for Landmark-based Facial Expression Recognition with Uncertainty Estimation [93.73198973454944]
The performance of our method is evaluated on three widely used datasets.
It is comparable to that of video-based state-of-the-art methods while it has much less complexity.
arXiv Detail & Related papers (2021-06-08T13:40:30Z)
- Micro-Facial Expression Recognition Based on Deep-Rooted Learning Algorithm [0.0]
An effective Micro-Facial Expression Based Deep-Rooted Learning (MFEDRL) classifier is proposed in this paper.
The performance of the algorithm will be evaluated using recognition rate and false measures.
arXiv Detail & Related papers (2020-09-12T12:23:27Z)
- Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family; and 2) optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
arXiv Detail & Related papers (2020-08-21T10:45:09Z)
- M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients [151.4352001822956]
Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients.
Existing prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume.
We propose an end-to-end OS time prediction model, namely the Multi-modal Multi-channel Network (M2Net).
arXiv Detail & Related papers (2020-06-01T05:21:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.