Adaptive Fusion Network with Temporal-Ranked and Motion-Intensity Dynamic Images for Micro-expression Recognition
- URL: http://arxiv.org/abs/2510.09730v1
- Date: Fri, 10 Oct 2025 11:03:20 GMT
- Title: Adaptive Fusion Network with Temporal-Ranked and Motion-Intensity Dynamic Images for Micro-expression Recognition
- Authors: Thi Bich Phuong Man, Luu Tu Nguyen, Vu Tram Anh Khuong, Thanh Ha Le, Thi Duyen Ngo
- Abstract summary: Micro-expressions (MEs) are subtle, transient facial changes with very low intensity, almost imperceptible to the naked eye, yet they reveal a person's genuine emotion. This paper proposes a novel MER method with two main contributions. First, we propose two complementary representations: the Temporal-ranked dynamic image, which emphasizes temporal progression, and the Motion-intensity dynamic image, which highlights subtle motions through a frame-reordering mechanism incorporating motion intensity. Second, we propose an Adaptive fusion network, which automatically learns to optimally integrate these two representations, thereby enhancing discriminative ME features while suppressing noise.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Micro-expressions (MEs) are subtle, transient facial changes with very low intensity, almost imperceptible to the naked eye, yet they reveal a person's genuine emotion. They are of great value in lie detection, behavioral analysis, and psychological assessment. This paper proposes a novel MER method with two main contributions. First, we propose two complementary representations: the Temporal-ranked dynamic image, which emphasizes temporal progression, and the Motion-intensity dynamic image, which highlights subtle motions through a frame-reordering mechanism incorporating motion intensity. Second, we propose an Adaptive fusion network, which automatically learns to optimally integrate these two representations, thereby enhancing discriminative ME features while suppressing noise. Experiments on three benchmark datasets (CASME-II, SAMM, and MMEW) demonstrate the superiority of the proposed method. Specifically, AFN achieves 93.95% accuracy and 0.897 UF1 on CASME-II, setting a new state-of-the-art benchmark. On SAMM, the method attains 82.47% accuracy and 0.665 UF1, demonstrating more balanced recognition across classes. On MMEW, the model achieves 76.00% accuracy, further confirming its generalization ability. These results show that both the input representations and the proposed architecture play important roles in improving MER performance. Moreover, they provide a solid foundation for further research and practical applications in affective computing, lie detection, and human-computer interaction.
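The abstract does not spell out how its dynamic images are computed, but dynamic images in the MER literature are commonly built with approximate rank pooling (in the style of Bilen et al.), which collapses a frame sequence into one image using linear temporal weights. The sketch below illustrates that standard approximation only; the paper's Temporal-ranked and Motion-intensity variants additionally reorder frames, which is not reproduced here.

```python
import numpy as np

def dynamic_image(frames):
    """Approximate rank-pooled dynamic image (standard approximation,
    not this paper's exact construction).

    frames: array of shape (T, H, W). Frame t (1-indexed) is weighted by
    alpha_t = 2t - T - 1, so later frames contribute positively and
    earlier ones negatively, encoding temporal progression in one image.
    """
    T = frames.shape[0]
    t = np.arange(1, T + 1, dtype=np.float64)
    alpha = 2.0 * t - T - 1.0                 # rank-pooling coefficients
    di = np.tensordot(alpha, frames.astype(np.float64), axes=(0, 0))
    lo, hi = di.min(), di.max()
    di = np.rint((di - lo) / (hi - lo + 1e-8) * 255.0)  # scale for display
    return di.astype(np.uint8)

# Toy example: a 2x2 patch whose intensity grows across 5 frames,
# mimicking a subtle facial motion of increasing intensity.
frames = np.zeros((5, 4, 4))
for t in range(5):
    frames[t, 1:3, 1:3] = t
di = dynamic_image(frames)
```

The moving patch, whose intensity increases over time, is pushed to the bright end of the output, while the static background stays dark: exactly the property that makes a single dynamic image a compact motion summary for a 2D CNN.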
Related papers
- Improving Micro-Expression Recognition with Phase-Aware Temporal Augmentation [0.0]
Micro-expressions (MEs) are brief, involuntary facial movements that reveal genuine emotions, typically lasting less than half a second. Deep learning has enabled significant advances in micro-expression recognition (MER), but its effectiveness is limited by the scarcity of annotated ME datasets. This paper proposes a phase-aware temporal augmentation method based on dynamic images.
arXiv Detail & Related papers (2025-10-17T09:20:51Z) - DIANet: A Phase-Aware Dual-Stream Network for Micro-Expression Recognition via Dynamic Images [0.0]
Micro-expressions are brief, involuntary facial movements that typically last less than half a second and often reveal genuine emotions. This paper proposes a novel dual-stream framework, DIANet, which leverages phase-aware dynamic images. Experiments conducted on three benchmark MER datasets demonstrate that the proposed method consistently outperforms conventional single-phase DI-based approaches.
arXiv Detail & Related papers (2025-10-14T07:15:29Z) - FMANet: A Novel Dual-Phase Optical Flow Approach with Fusion Motion Attention Network for Robust Micro-expression Recognition [0.0]
Micro-expression recognition is challenging due to the difficulty of capturing subtle facial movements. We introduce a comprehensive motion representation, which integrates motion dynamics from both micro-expression phases into a unified descriptor. We then propose FMANet, a novel end-to-end neural network architecture that internalizes the dual-phase analysis and magnitude modulation into learnable modules.
arXiv Detail & Related papers (2025-10-09T05:36:40Z) - MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning [75.76032840813828]
We propose MILR, a test-time method that jointly reasons over image and text in a unified latent vector space. We instantiate MILR within the unified multimodal understanding and generation framework. We evaluate MILR on GenEval, T2I-CompBench, and WISE, achieving state-of-the-art results on all benchmarks.
arXiv Detail & Related papers (2025-09-26T14:06:10Z) - Micro-Expression Recognition via Fine-Grained Dynamic Perception [64.26947471761916]
We develop a novel fine-grained dynamic perception (FDP) framework for facial micro-expression recognition (MER). We rank frame-level features of a sequence of raw frames in chronological order, where the ranking process encodes the dynamic information of both ME appearances and motions. Our method significantly outperforms state-of-the-art MER methods and works well for dynamic image construction.
arXiv Detail & Related papers (2025-09-07T11:13:50Z) - ME-TST+: Micro-expression Analysis via Temporal State Transition with ROI Relationship Awareness [12.584801819076425]
Micro-expressions (MEs) are regarded as important indicators of an individual's intrinsic emotions, preferences, and tendencies. Previous deep learning approaches commonly employ sliding-window classification networks. This paper proposes two state space model-based architectures, namely ME-TST and ME-TST+.
arXiv Detail & Related papers (2025-08-11T15:28:32Z) - Predicting Satisfied User and Machine Ratio for Compressed Images: A Unified Approach [58.71009078356928]
We create a deep learning-based model to predict the Satisfied User Ratio (SUR) and Satisfied Machine Ratio (SMR) of compressed images simultaneously. Experimental results indicate that the proposed model significantly outperforms state-of-the-art SUR and SMR prediction methods.
arXiv Detail & Related papers (2024-12-23T11:09:30Z) - Static for Dynamic: Towards a Deeper Understanding of Dynamic Facial Expressions Using Static Expression Data [83.48170683672427]
We propose a unified dual-modal learning framework, S4D, that integrates static facial expression recognition (SFER) data as a complementary resource for dynamic facial expression recognition (DFER). S4D employs dual-modal self-supervised pre-training on facial images and videos using a shared Vision Transformer (ViT) encoder-decoder architecture. Experiments demonstrate that S4D achieves a deeper understanding of DFER, setting new state-of-the-art performance.
arXiv Detail & Related papers (2024-09-10T01:57:57Z) - Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition [21.675660978188617]
Micro-expression recognition is crucial in many fields, including criminal analysis and psychotherapy. A three-stream temporal-shift attention network based on self-knowledge distillation is proposed in this paper.
arXiv Detail & Related papers (2024-06-25T13:22:22Z) - From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos [88.08209394979178]
Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations.
We introduce a novel Static-to-Dynamic model (S2D) that leverages existing SFER knowledge and dynamic information implicitly encoded in extracted facial landmark-aware features.
arXiv Detail & Related papers (2023-12-09T03:16:09Z) - RED-PSM: Regularization by Denoising of Factorized Low Rank Models for Dynamic Imaging [6.527016551650136]
In dynamic tomography, only a single projection at a single view angle may be available at a time.
We propose an approach, RED-PSM, which combines for the first time two powerful techniques to address this challenging imaging problem.
arXiv Detail & Related papers (2023-04-07T05:29:59Z) - DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement.
The architecture incorporates LSTM units to propagate information through each refinement step.
DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
arXiv Detail & Related papers (2022-05-28T16:18:08Z) - MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations.
Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived.
We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training.
arXiv Detail & Related papers (2022-01-24T17:48:04Z)
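The MMLatch entry above describes top-down fusion, where a high-level fused representation feeds back to modulate how the raw modality features are perceived on the next pass. A minimal numpy sketch of that idea follows; the function names, shapes, and the sigmoid-gating choice are illustrative assumptions, not MMLatch's actual architecture or API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def top_down_step(text_feat, audio_feat, W_gate, prev_fused):
    """One hypothetical feedback step: the previous high-level fused
    state produces per-dimension gates that mask each modality's
    features before bottom-up fusion (illustrative sketch only)."""
    gate = sigmoid(prev_fused @ W_gate)       # top-down attention mask
    fused = np.tanh(text_feat * gate + audio_feat * gate)
    return fused

rng = np.random.default_rng(0)
d = 8                                          # toy feature dimension
text = rng.normal(size=d)
audio = rng.normal(size=d)
W_gate = rng.normal(size=(d, d)) * 0.1
# First pass has no high-level state yet, so the gates start neutral.
fused = top_down_step(text, audio, W_gate, prev_fused=np.zeros(d))
```

In a trained model the gate projection would be learned end-to-end and the fused state from one forward pass (or time step) would feed the gates of the next, which is what distinguishes top-down from purely bottom-up fusion.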
This list is automatically generated from the titles and abstracts of the papers in this site.