MESTI-MEGANet: Micro-expression Spatio-Temporal Image and Micro-expression Gradient Attention Networks for Micro-expression Recognition
- URL: http://arxiv.org/abs/2509.00056v2
- Date: Sun, 07 Sep 2025 09:26:23 GMT
- Authors: Luu Tu Nguyen, Vu Tram Anh Khuong, Thanh Ha Le, Thi Duyen Ngo
- Abstract summary: Micro-expression recognition (MER) is a challenging task due to the subtle and fleeting nature of micro-expressions. Traditional input modalities, such as Apex Frame, Optical Flow, and Dynamic Image, often fail to adequately capture these brief facial movements. We introduce the Micro-expression Spatio-Temporal Image (MESTI), a novel dynamic input modality that transforms a video sequence into a single image. We also present the Micro-expression Gradient Attention Network (MEGANet), which incorporates a novel Gradient Attention block to enhance the extraction of fine-grained motion features.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Micro-expression recognition (MER) is a challenging task due to the subtle and fleeting nature of micro-expressions. Traditional input modalities, such as Apex Frame, Optical Flow, and Dynamic Image, often fail to adequately capture these brief facial movements, resulting in suboptimal performance. In this study, we introduce the Micro-expression Spatio-Temporal Image (MESTI), a novel dynamic input modality that transforms a video sequence into a single image while preserving the essential characteristics of micro-movements. Additionally, we present the Micro-expression Gradient Attention Network (MEGANet), which incorporates a novel Gradient Attention block to enhance the extraction of fine-grained motion features from micro-expressions. By combining MESTI and MEGANet, we aim to establish a more effective approach to MER. Extensive experiments were conducted to evaluate the effectiveness of MESTI, comparing it with existing input modalities across three CNN architectures (VGG19, ResNet50, and EfficientNetB0). Moreover, we demonstrate that replacing the input of previously published MER networks with MESTI leads to consistent performance improvements. The performance of MEGANet, both with MESTI and Dynamic Image, is also evaluated, showing that our proposed network achieves state-of-the-art results on the CASME II and SAMM datasets. The combination of MEGANet and MESTI achieves the highest accuracy reported to date, setting a new benchmark for micro-expression recognition. These findings underscore the potential of MESTI as a superior input modality and MEGANet as an advanced recognition network, paving the way for more effective MER systems in a variety of applications.
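The abstract describes MESTI (collapsing a clip into a single image) and the Gradient Attention block only at a high level. As an illustration of the general ideas only, not the paper's actual method, a minimal numpy sketch might look like this; the weighting scheme and the gradient-based attention below are assumptions for demonstration:

```python
import numpy as np

def mesti_like_image(frames):
    """Illustrative only: collapse a clip of shape (T, H, W) into one
    image using temporally increasing weights, so motion nearer the
    apex dominates. The actual MESTI construction may differ."""
    t = frames.shape[0]
    weights = np.arange(1, t + 1, dtype=np.float64)
    weights /= weights.sum()
    # Weighted sum over the temporal axis -> single (H, W) image.
    return np.tensordot(weights, frames, axes=(0, 0))

def gradient_attention(features):
    """Illustrative gradient-style attention: reweight an (H, W, C)
    feature map by its normalized spatial-gradient magnitude,
    emphasizing the fine edges where subtle motion concentrates."""
    gy, gx = np.gradient(features, axis=(0, 1))
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    # Normalize per channel to [0, 1] to form the attention map.
    attn = magnitude / (magnitude.max(axis=(0, 1), keepdims=True) + 1e-8)
    return features * attn
```

In practice the paper learns such a block end-to-end inside a CNN; this fixed-filter sketch only conveys why gradient information highlights brief, localized facial movements.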
Related papers
- MICON-Bench: Benchmarking and Enhancing Multi-Image Context Image Generation in Unified Multimodal Models [89.89575486159795]
We introduce MICON-Bench, a benchmark for multi-image context generation. We propose an MLLM-driven Evaluation-by-Checkpoint framework for automatic verification of semantic and visual consistency. We also present Dynamic Attention Rebalancing (DAR), a training-free, plug-and-play mechanism that dynamically adjusts attention during inference to enhance coherence and reduce hallucinations.
arXiv Detail & Related papers (2026-02-23T04:32:52Z) - FMANet: A Novel Dual-Phase Optical Flow Approach with Fusion Motion Attention Network for Robust Micro-expression Recognition [0.0]
Micro-expression recognition is challenging due to the difficulty of capturing subtle facial movements. We introduce a comprehensive motion representation, which integrates motion dynamics from both micro-expression phases into a unified descriptor. We then propose FMANet, a novel end-to-end neural network architecture that internalizes the dual-phase analysis and magnitude modulation into learnable modules.
arXiv Detail & Related papers (2025-10-09T05:36:40Z) - MPT: Motion Prompt Tuning for Micro-Expression Recognition [47.62949098749473]
This paper introduces Motion Prompt Tuning (MPT) as a novel approach to adapting pre-trained models for micro-expression recognition (MER). MPT represents a pioneering method for subtle motion prompt tuning. In particular, we introduce motion prompt generation, including motion magnification and Gaussian tokenization, to extract subtle motions as prompts for LMs. Extensive experiments conducted on three widely used MER datasets demonstrate that our proposed MPT consistently surpasses state-of-the-art approaches.
arXiv Detail & Related papers (2025-08-13T02:57:43Z) - Temporal and Spatial Feature Fusion Framework for Dynamic Micro Expression Recognition [5.444324424467006]
Transient and highly localised micro-expressions pose a significant challenge to their accurate recognition. The accuracy of micro-expression recognition is as low as 50%, even for professionals. We propose a novel Temporal and Spatial feature Fusion framework for DMER (TSFmicro).
arXiv Detail & Related papers (2025-05-22T08:26:19Z) - MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception [47.80768014770871]
We propose a novel Micro-Expression Large Language Model (MELLM). It incorporates a subtle facial motion perception strategy with the strong inference capabilities of MLLMs. Our model exhibits superior robustness and generalization capabilities in micro-expression understanding (MEU).
arXiv Detail & Related papers (2025-05-11T15:08:23Z) - AMMSM: Adaptive Motion Magnification and Sparse Mamba for Micro-Expression Recognition [7.084377962617903]
We propose a multi-task learning framework named the Adaptive Motion Magnification and Sparse Mamba (AMMSM). This framework aims to enhance the accurate capture of micro-expressions through self-supervised subtle motion magnification. We employ evolutionary search to optimize the magnification factor and the sparsity ratios of spatial selection, followed by fine-tuning to further improve performance.
arXiv Detail & Related papers (2025-03-31T13:17:43Z) - AHMSA-Net: Adaptive Hierarchical Multi-Scale Attention Network for Micro-Expression Recognition [15.008358563986825]
We design an Adaptive Hierarchical Multi-Scale Attention Network (AHMSA-Net) for micro-expression recognition. AHMSA-Net consists of two parts: an adaptive hierarchical framework and a multi-scale attention mechanism. Experiments demonstrate that AHMSA-Net achieves recognition accuracy of up to 78.21% on composite databases.
arXiv Detail & Related papers (2025-01-05T13:40:12Z) - Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition [48.21696443824074]
We propose a novel framework for micro-expression recognition, named the Adaptive Temporal Motion Guided Graph Convolution Network (ATM-GCN)
Our framework excels at capturing temporal dependencies between frames across the entire clip, thereby enhancing micro-expression recognition at the clip level.
arXiv Detail & Related papers (2024-06-13T10:57:24Z) - Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training [87.69394953339238]
Masked image modeling (MIM) has recently been introduced for fine-grained cross-modal alignment.
We propose a semantics-enhanced cross-modal MIM framework (SemMIM) for vision-language representation learning.
arXiv Detail & Related papers (2024-03-01T03:25:58Z) - Micro-Expression Recognition Based on Attribute Information Embedding and Cross-modal Contrastive Learning [22.525295392858293]
We propose a micro-expression recognition method based on attribute information embedding and cross-modal contrastive learning.
We conduct extensive experiments in CASME II and MMEW databases, and the accuracy is 77.82% and 71.04%, respectively.
arXiv Detail & Related papers (2022-05-29T12:28:10Z) - Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition [61.374467942519374]
We propose a novel spatio-temporal transformer architecture -- to the best of our knowledge, the first purely transformer-based approach for micro-expression recognition.
The architecture comprises a spatial encoder that learns spatial patterns, a temporal aggregator for temporal analysis, and a classification head.
A comprehensive evaluation on three widely used spontaneous micro-expression datasets shows that the proposed approach consistently outperforms the state of the art.
arXiv Detail & Related papers (2021-12-10T22:10:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.