DIANet: A Phase-Aware Dual-Stream Network for Micro-Expression Recognition via Dynamic Images
- URL: http://arxiv.org/abs/2510.12219v1
- Date: Tue, 14 Oct 2025 07:15:29 GMT
- Title: DIANet: A Phase-Aware Dual-Stream Network for Micro-Expression Recognition via Dynamic Images
- Authors: Vu Tram Anh Khuong, Luu Tu Nguyen, Thi Bich Phuong Man, Thanh Ha Le, Thi Duyen Ngo,
- Abstract summary: Micro-expressions are brief, involuntary facial movements that typically last less than half a second and often reveal genuine emotions.<n>This paper proposes a novel dual-stream framework, DIANet, which leverages phase-aware dynamic images.<n>Experiments conducted on three benchmark MER datasets demonstrate that the proposed method consistently outperforms conventional single-phase DI-based approaches.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Micro-expressions are brief, involuntary facial movements that typically last less than half a second and often reveal genuine emotions. Accurately recognizing these subtle expressions is critical for applications in psychology, security, and behavioral analysis. However, micro-expression recognition (MER) remains a challenging task due to the subtle and transient nature of facial cues and the limited availability of annotated data. While dynamic image (DI) representations have been introduced to summarize temporal motion into a single frame, conventional DI-based methods often overlook the distinct characteristics of different temporal phases within a micro-expression. To address this issue, this paper proposes a novel dual-stream framework, DIANet, which leverages phase-aware dynamic images - one encoding the onset-to-apex phase and the other capturing the apex-to-offset phase. Each stream is processed by a dedicated convolutional neural network, and a cross-attention fusion module is employed to adaptively integrate features from both streams based on their contextual relevance. Extensive experiments conducted on three benchmark MER datasets (CASME-II, SAMM, and MMEW) demonstrate that the proposed method consistently outperforms conventional single-phase DI-based approaches. The results highlight the importance of modeling temporal phase information explicitly and suggest a promising direction for advancing MER.
Related papers
- Improving Micro-Expression Recognition with Phase-Aware Temporal Augmentation [0.0]
Micro-expressions (MEs) are brief, involuntary facial movements that reveal genuine emotions, typically lasting less than half a second.<n>Deep learning has enabled significant advances in micro-expression recognition (MER), but its effectiveness is limited by the scarcity of annotated ME datasets.<n>This paper proposes a phase-aware temporal augmentation method based on dynamic image.
arXiv Detail & Related papers (2025-10-17T09:20:51Z) - FMANet: A Novel Dual-Phase Optical Flow Approach with Fusion Motion Attention Network for Robust Micro-expression Recognition [0.0]
micro-expression recognition is challenging due to the difficulty of capturing subtle facial movements.<n>We introduce a comprehensive motion representation, which integrates motion dynamics from both micro-expression phases into a unified descriptor.<n>We then propose FMANet, a novel end-to-end neural network architecture that internalizes the dual-phase analysis and magnitude modulation into learnable modules.
arXiv Detail & Related papers (2025-10-09T05:36:40Z) - Temporal and Spatial Feature Fusion Framework for Dynamic Micro Expression Recognition [5.444324424467006]
Transient and highly localised micro-expressions pose a significant challenge to their accurate recognition.<n>The accuracy rate of micro-expression recognition is as low as 50%, even for professionals.<n>We propose a novel Temporal and Spatial feature Fusion framework for DMER (TSFmicro)
arXiv Detail & Related papers (2025-05-22T08:26:19Z) - Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition [21.675660978188617]
Micro-expression recognition is crucial in many fields, including criminal analysis and psychotherapy.<n>A three-stream temporal-shift attention network based on self-knowledge distillation is proposed in this paper.
arXiv Detail & Related papers (2024-06-25T13:22:22Z) - Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition [48.21696443824074]
We propose a novel framework for micro-expression recognition, named the Adaptive Temporal Motion Guided Graph Convolution Network (ATM-GCN)
Our framework excels at capturing temporal dependencies between frames across the entire clip, thereby enhancing micro-expression recognition at the clip level.
arXiv Detail & Related papers (2024-06-13T10:57:24Z) - Learning to Exploit Temporal Structure for Biomedical Vision-Language
Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z) - COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for
Cross-Modal Retrieval [59.15034487974549]
We propose a novel COllaborative Two-Stream vision-language pretraining model termed COTS for image-text retrieval.
Our COTS achieves the highest performance among all two-stream methods and comparable performance with 10,800X faster in inference.
Importantly, our COTS is also applicable to text-to-video retrieval, yielding new state-ofthe-art on the widely-used MSR-VTT dataset.
arXiv Detail & Related papers (2022-04-15T12:34:47Z) - Slow-Fast Visual Tempo Learning for Video-based Action Recognition [78.3820439082979]
Action visual tempo characterizes the dynamics and the temporal scale of an action.
Previous methods capture the visual tempo either by sampling raw videos with multiple rates, or by hierarchically sampling backbone features.
We propose a Temporal Correlation Module (TCM) to extract action visual tempo from low-level backbone features at single-layer remarkably.
arXiv Detail & Related papers (2022-02-24T14:20:04Z) - Group Gated Fusion on Attention-based Bidirectional Alignment for
Multimodal Emotion Recognition [63.07844685982738]
This paper presents a new model named as Gated Bidirectional Alignment Network (GBAN), which consists of an attention-based bidirectional alignment network over LSTM hidden states.
We empirically show that the attention-aligned representations outperform the last-hidden-states of LSTM significantly.
The proposed GBAN model outperforms existing state-of-the-art multimodal approaches on the IEMOCAP dataset.
arXiv Detail & Related papers (2022-01-17T09:46:59Z) - Short and Long Range Relation Based Spatio-Temporal Transformer for
Micro-Expression Recognition [61.374467942519374]
We propose a novel a-temporal transformer architecture -- to the best of our knowledge, the first purely transformer based approach for micro-expression recognition.
The architecture comprises a spatial encoder which learns spatial patterns, a temporal dimension classification for temporal analysis, and a head.
A comprehensive evaluation on three widely used spontaneous micro-expression data sets, shows that the proposed approach consistently outperforms the state of the art.
arXiv Detail & Related papers (2021-12-10T22:10:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.