Feature Representation Learning with Adaptive Displacement Generation
and Transformer Fusion for Micro-Expression Recognition
- URL: http://arxiv.org/abs/2304.04420v1
- Date: Mon, 10 Apr 2023 07:03:36 GMT
- Title: Feature Representation Learning with Adaptive Displacement Generation
and Transformer Fusion for Micro-Expression Recognition
- Authors: Zhijun Zhai, Jianhui Zhao, Chengjiang Long, Wenju Xu, Shuangjiang He,
Huijuan Zhao
- Abstract summary: Micro-expressions are spontaneous, rapid and subtle facial movements that can neither be forged nor suppressed.
We propose a novel framework Feature Representation Learning with adaptive Displacement Generation and Transformer fusion (FRL-DGT).
Experiments with solid leave-one-subject-out (LOSO) evaluation have demonstrated the superiority of our proposed FRL-DGT over state-of-the-art methods.
- Score: 18.6490971645882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Micro-expressions are spontaneous, rapid and subtle facial movements that can
neither be forged nor suppressed. They are very important nonverbal
communication clues, but are transient and of low intensity thus difficult to
recognize. Recently, deep learning based methods have been developed for
micro-expression (ME) recognition using feature extraction and fusion
techniques; however, targeted feature learning and efficient feature fusion
tailored to ME characteristics still lack further study. To address these
issues, we propose a novel framework Feature Representation Learning with
adaptive Displacement Generation and Transformer fusion (FRL-DGT), in which a
convolutional Displacement Generation Module (DGM) with self-supervised
learning is used to extract dynamic features from onset/apex frames targeted to
the subsequent ME recognition task, and a well-designed Transformer Fusion
mechanism composed of three Transformer-based fusion modules (local and global
fusion based on AU regions, and full-face fusion) is applied to extract
multi-level informative features after the DGM for the final ME prediction.
Extensive experiments with solid leave-one-subject-out (LOSO) evaluation have
demonstrated the superiority of our proposed FRL-DGT over state-of-the-art
methods.
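To make the described pipeline concrete, below is a minimal PyTorch sketch of the data flow in the abstract: a convolutional DGM predicts a displacement field from the onset/apex frame pair, the displacement is embedded into AU-region and full-face tokens, and three Transformer-based fusion modules (local, global, full-face) produce the feature used for ME classification. All module and parameter names, layer sizes, the number and layout of AU regions, and the classifier dimensions are illustrative assumptions, not the authors' implementation; the self-supervised training objective of the DGM is omitted.

```python
# Minimal sketch of an FRL-DGT-style pipeline (assumptions only, not the official code).
import torch
import torch.nn as nn


class DGM(nn.Module):
    """Convolutional Displacement Generation Module (sketch).

    Takes the onset and apex frames stacked along channels and predicts a
    2-channel per-pixel displacement field used as the dynamic feature.
    """

    def __init__(self, in_channels: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, 3, padding=1),  # (dx, dy) displacement per pixel
        )

    def forward(self, onset: torch.Tensor, apex: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([onset, apex], dim=1))


class FusionBlock(nn.Module):
    """One Transformer-based fusion module over a set of feature tokens."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:  # (B, N, dim)
        return self.encoder(tokens).mean(dim=1)               # pooled fusion feature


class FRLDGTSketch(nn.Module):
    """End-to-end sketch: DGM -> local / global / full-face fusion -> classifier."""

    def __init__(self, dim: int = 128, num_classes: int = 3):
        super().__init__()
        self.dgm = DGM()
        # Embed each cropped displacement patch (AU region or full face) into a token.
        self.embed = nn.Sequential(
            nn.AdaptiveAvgPool2d(8), nn.Flatten(), nn.Linear(2 * 8 * 8, dim)
        )
        self.local_fusion = FusionBlock(dim)   # over AU-region tokens
        self.global_fusion = FusionBlock(dim)  # over AU-region + full-face tokens
        self.full_fusion = FusionBlock(dim)    # over the multi-level fused features
        self.head = nn.Linear(dim, num_classes)

    def forward(self, onset, apex, au_boxes):
        disp = self.dgm(onset, apex)                                     # (B, 2, H, W)
        region_tokens = torch.stack(
            [self.embed(disp[:, :, y0:y1, x0:x1]) for (y0, y1, x0, x1) in au_boxes],
            dim=1,
        )                                                                # (B, R, dim)
        face_token = self.embed(disp).unsqueeze(1)                       # (B, 1, dim)
        local = self.local_fusion(region_tokens)
        global_ = self.global_fusion(torch.cat([region_tokens, face_token], dim=1))
        fused = self.full_fusion(
            torch.stack([local, global_, face_token.squeeze(1)], dim=1)
        )
        return self.head(fused)


if __name__ == "__main__":
    model = FRLDGTSketch()
    onset = torch.randn(2, 3, 128, 128)
    apex = torch.randn(2, 3, 128, 128)
    # Hypothetical AU-region crops given as (y0, y1, x0, x1) pixel boxes.
    au_boxes = [(0, 64, 0, 64), (0, 64, 64, 128), (64, 128, 0, 64), (64, 128, 64, 128)]
    print(model(onset, apex, au_boxes).shape)  # torch.Size([2, 3])
```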
Related papers
- LoFLAT: Local Feature Matching using Focused Linear Attention Transformer [36.53651224633837]
We propose LoFLAT, a novel local feature matching method using a Focused Linear Attention Transformer.
Our LoFLAT consists of three main modules: the Feature Extraction Module, the Feature Transformer Module, and the Matching Module.
The proposed LoFLAT outperforms the LoFTR method in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2024-10-30T05:38:07Z)
- SeaDATE: Remedy Dual-Attention Transformer with Semantic Alignment via Contrast Learning for Multimodal Object Detection [18.090706979440334]
Multimodal object detection leverages diverse modal information to enhance the accuracy and robustness of detectors.
Current methods merely stack Transformer-guided fusion techniques without exploring their capability to extract features at various depth layers of the network.
In this paper, we introduce an accurate and efficient object detection method named SeaDATE.
arXiv Detail & Related papers (2024-10-15T07:26:39Z)
- Micro-Expression Recognition by Motion Feature Extraction based on Pre-training [6.015288149235598]
We propose a novel motion extraction strategy (MoExt) for the micro-expression recognition task.
In MoExt, shape features and texture features are first extracted separately from onset and apex frames, and then motion features related to MEs are extracted based on shape features of both frames.
The effectiveness of the proposed method is validated on three commonly used datasets.
arXiv Detail & Related papers (2024-07-10T03:51:34Z)
- Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD).
It aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) is proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z)
- Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
arXiv Detail & Related papers (2024-03-15T17:23:38Z)
- Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition [48.84506301960988]
Cued Speech (CS) is a pure visual coding method used by hearing-impaired people.
Automatic CS recognition (ACSR) seeks to transcribe visual cues of speech into text.
arXiv Detail & Related papers (2024-01-31T05:20:29Z)
- X Modality Assisting RGBT Object Tracking [36.614908357546035]
We propose a novel X Modality Assisting Network (X-Net) to shed light on the impact of the fusion paradigm.
To tackle the feature learning hurdles stemming from significant differences between RGB and thermal modalities, a plug-and-play pixel-level generation module (PGM) is proposed.
We also propose a feature-level interaction module (FIM) that incorporates a mixed feature interaction transformer and a spatial-dimensional feature translation strategy.
arXiv Detail & Related papers (2023-12-27T05:38:54Z)
- Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
arXiv Detail & Related papers (2023-05-19T05:50:24Z)
- MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations.
Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived.
We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training.
arXiv Detail & Related papers (2022-01-24T17:48:04Z)
- Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition [61.374467942519374]
We propose a novel spatio-temporal transformer architecture -- to the best of our knowledge, the first purely transformer-based approach for micro-expression recognition.
The architecture comprises a spatial encoder which learns spatial patterns, a temporal aggregator for temporal analysis, and a classification head.
A comprehensive evaluation on three widely used spontaneous micro-expression data sets shows that the proposed approach consistently outperforms the state of the art.
arXiv Detail & Related papers (2021-12-10T22:10:31Z)