FMANet: A Novel Dual-Phase Optical Flow Approach with Fusion Motion Attention Network for Robust Micro-expression Recognition
- URL: http://arxiv.org/abs/2510.07810v3
- Date: Wed, 15 Oct 2025 14:28:10 GMT
- Title: FMANet: A Novel Dual-Phase Optical Flow Approach with Fusion Motion Attention Network for Robust Micro-expression Recognition
- Authors: Luu Tu Nguyen, Vu Tram Anh Khuong, Thi Bich Phuong Man, Thi Duyen Ngo, Thanh Ha Le
- Abstract summary: Micro-expression recognition is challenging due to the difficulty of capturing subtle facial movements. We introduce a comprehensive motion representation, which integrates motion dynamics from both micro-expression phases into a unified descriptor. We then propose FMANet, a novel end-to-end neural network architecture that internalizes the dual-phase analysis and magnitude modulation into learnable modules.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Facial micro-expressions, characterized by their subtle and brief nature, are valuable indicators of genuine emotions. Despite their significance in psychology, security, and behavioral analysis, micro-expression recognition remains challenging due to the difficulty of capturing subtle facial movements. Optical flow has been widely employed as an input modality for this task due to its effectiveness. However, most existing methods compute optical flow only between the onset and apex frames, thereby overlooking essential motion information in the apex-to-offset phase. To address this limitation, we first introduce a comprehensive motion representation, termed Magnitude-Modulated Combined Optical Flow (MM-COF), which integrates motion dynamics from both micro-expression phases into a unified descriptor suitable for direct use in recognition networks. Building upon this principle, we then propose FMANet, a novel end-to-end neural network architecture that internalizes the dual-phase analysis and magnitude modulation into learnable modules. This allows the network to adaptively fuse motion cues and focus on salient facial regions for classification. Experimental evaluations on the MMEW, SMIC, CASME-II, and SAMM datasets, widely recognized as standard benchmarks, demonstrate that our proposed MM-COF representation and FMANet outperform existing methods, underscoring the potential of a learnable, dual-phase framework in advancing micro-expression recognition.
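The dual-phase fusion idea behind MM-COF can be illustrated with a minimal NumPy sketch. The function name, the negation of the apex-to-offset flow, and the max-normalized magnitude weighting below are all assumptions made for illustration; the paper's exact formulation may differ, and the flow fields themselves would normally come from an optical flow estimator applied to the onset, apex, and offset frames.

```python
import numpy as np

def mm_cof(flow_onset_apex, flow_apex_offset, eps=1e-8):
    """Fuse dual-phase optical flow into one magnitude-modulated map.

    flow_onset_apex:  (H, W, 2) flow from the onset to the apex frame.
    flow_apex_offset: (H, W, 2) flow from the apex to the offset frame.

    Hypothetical sketch: the apex-to-offset flow is negated so the
    relaxation phase points in the same (activation) direction as the
    onset phase, the two fields are averaged, and the result is
    re-weighted by its per-pixel magnitude to emphasize regions with
    strong motion.
    """
    combined = 0.5 * (flow_onset_apex - flow_apex_offset)
    mag = np.linalg.norm(combined, axis=-1, keepdims=True)
    weight = mag / (mag.max() + eps)  # modulation weights in [0, 1]
    return combined * weight

# Toy 4x4 flow fields with (dx, dy) channels.
rng = np.random.default_rng(0)
on2apex = rng.normal(size=(4, 4, 2))
apex2off = rng.normal(size=(4, 4, 2))
descriptor = mm_cof(on2apex, apex2off)
print(descriptor.shape)  # (4, 4, 2)
```

The resulting descriptor has the same shape as a single flow field, so it can be fed to a recognition network in place of a single-phase flow input.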
Related papers
- A Novel Combined Optical Flow Approach for Comprehensive Micro-Expression Recognition [0.0]
This study introduces a Combined Optical Flow (COF), integrating both phases to enhance feature representation. Experimental results on the CASME II and SAMM datasets show that COF outperforms single optical flow-based methods.
arXiv Detail & Related papers (2025-10-17T09:29:17Z) - Improving Micro-Expression Recognition with Phase-Aware Temporal Augmentation [0.0]
Micro-expressions (MEs) are brief, involuntary facial movements that reveal genuine emotions, typically lasting less than half a second. Deep learning has enabled significant advances in micro-expression recognition (MER), but its effectiveness is limited by the scarcity of annotated ME datasets. This paper proposes a phase-aware temporal augmentation method based on dynamic image.
arXiv Detail & Related papers (2025-10-17T09:20:51Z) - DIANet: A Phase-Aware Dual-Stream Network for Micro-Expression Recognition via Dynamic Images [0.0]
Micro-expressions are brief, involuntary facial movements that typically last less than half a second and often reveal genuine emotions. This paper proposes a novel dual-stream framework, DIANet, which leverages phase-aware dynamic images. Experiments conducted on three benchmark MER datasets demonstrate that the proposed method consistently outperforms conventional single-phase DI-based approaches.
arXiv Detail & Related papers (2025-10-14T07:15:29Z) - MOL: Joint Estimation of Micro-Expression, Optical Flow, and Landmark via Transformer-Graph-Style Convolution [46.600316142855334]
Facial micro-expression recognition (MER) is a challenging problem, due to transient and subtle micro-expression (ME) actions. We propose an end-to-end micro-action-aware deep learning framework with advantages from transformer, graph convolution, and vanilla convolution. Our framework outperforms the state-of-the-art MER methods on CASME II, SAMM, and SMIC benchmarks.
arXiv Detail & Related papers (2025-06-17T13:35:06Z) - MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception [53.00485107136624]
Micro-expressions (MEs) are brief and low-intensity facial movements revealing concealed emotions. We propose a ME Large Language Model (MELLM) that integrates optical flow-based sensitivity to subtle facial motions. MELLM achieves state-of-the-art accuracy and generalization across multiple ME benchmarks.
arXiv Detail & Related papers (2025-05-11T15:08:23Z) - Technical Approach for the EMI Challenge in the 8th Affective Behavior Analysis in-the-Wild Competition [10.741278852581646]
Emotional Mimicry Intensity (EMI) estimation plays a pivotal role in understanding human social behavior and advancing human-computer interaction. This paper proposes a dual-stage cross-modal alignment framework to address the limitations of existing methods. Experiments on the Hume-Vidmimic2 dataset demonstrate superior performance with an average Pearson correlation coefficient of 0.51 across six emotion dimensions.
arXiv Detail & Related papers (2025-03-13T17:46:16Z) - AHMSA-Net: Adaptive Hierarchical Multi-Scale Attention Network for Micro-Expression Recognition [15.008358563986825]
We design an Adaptive Hierarchical Multi-Scale Attention Network (AHMSA-Net) for micro-expression recognition. AHMSA-Net consists of two parts: an adaptive hierarchical framework and a multi-scale attention mechanism. Experiments demonstrate that AHMSA-Net achieves recognition accuracy of up to 78.21% on composite databases.
arXiv Detail & Related papers (2025-01-05T13:40:12Z) - Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition [21.675660978188617]
Micro-expression recognition is crucial in many fields, including criminal analysis and psychotherapy. A three-stream temporal-shift attention network based on self-knowledge distillation is proposed in this paper.
arXiv Detail & Related papers (2024-06-25T13:22:22Z) - Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD).
It aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) is proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z) - Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework, the Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z) - MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations.
Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived.
We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training.
arXiv Detail & Related papers (2022-01-24T17:48:04Z) - Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition [61.374467942519374]
We propose a novel spatio-temporal transformer architecture -- to the best of our knowledge, the first purely transformer-based approach for micro-expression recognition.
The architecture comprises a spatial encoder which learns spatial patterns, a temporal aggregator for temporal-dimension analysis, and a classification head.
A comprehensive evaluation on three widely used spontaneous micro-expression datasets shows that the proposed approach consistently outperforms the state of the art.
arXiv Detail & Related papers (2021-12-10T22:10:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.