Improving Micro-Expression Recognition with Phase-Aware Temporal Augmentation
- URL: http://arxiv.org/abs/2510.15466v1
- Date: Fri, 17 Oct 2025 09:20:51 GMT
- Title: Improving Micro-Expression Recognition with Phase-Aware Temporal Augmentation
- Authors: Vu Tram Anh Khuong, Luu Tu Nguyen, Thanh Ha Le, Thi Duyen Ngo
- Abstract summary: Micro-expressions (MEs) are brief, involuntary facial movements that reveal genuine emotions, typically lasting less than half a second. Deep learning has enabled significant advances in micro-expression recognition (MER), but its effectiveness is limited by the scarcity of annotated ME datasets. This paper proposes a phase-aware temporal augmentation method based on dynamic images.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Micro-expressions (MEs) are brief, involuntary facial movements that reveal genuine emotions, typically lasting less than half a second. Recognizing these subtle expressions is critical for applications in psychology, security, and behavioral analysis. Although deep learning has enabled significant advances in micro-expression recognition (MER), its effectiveness is limited by the scarcity of annotated ME datasets. This data limitation not only hinders generalization but also restricts the diversity of motion patterns captured during training. Existing MER studies predominantly rely on simple spatial augmentations (e.g., flipping, rotation) and overlook temporal augmentation strategies that can better exploit motion characteristics. To address this gap, this paper proposes a phase-aware temporal augmentation method based on dynamic images. Rather than encoding the entire expression as a single onset-to-offset dynamic image (DI), our approach decomposes each expression sequence into two motion phases: onset-to-apex and apex-to-offset. A separate DI is generated for each phase, forming a Dual-phase DI augmentation strategy. These phase-specific representations enrich motion diversity and introduce complementary temporal cues that are crucial for recognizing subtle facial transitions. Extensive experiments on the CASME-II and SAMM datasets using six deep architectures, including CNNs, Vision Transformer, and the lightweight LEARNet, demonstrate consistent improvements in recognition accuracy, unweighted F1-score, and unweighted average recall, metrics that are crucial for addressing class imbalance in MER. When combined with spatial augmentations, our method achieves up to a 10% relative improvement. The proposed augmentation is simple, model-agnostic, and effective in low-resource settings, offering a promising direction for robust and generalizable MER.
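The dual-phase idea in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the dynamic image is computed, so the sketch assumes the simplified approximate rank pooling weights `alpha_t = 2t - T - 1` that are commonly used for dynamic images, and it assumes the onset, apex, and offset frame indices are available from the dataset annotations. The function names `dynamic_image` and `dual_phase_dynamic_images` are hypothetical.

```python
import numpy as np


def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a clip of shape (T, H, W) or (T, H, W, C) into one image.

    Assumption: simplified approximate rank pooling, i.e. a weighted sum of
    frames with weights alpha_t = 2t - T - 1 for t = 1..T. Later frames get
    positive weights and earlier frames negative ones, so the result encodes
    the direction of motion over time.
    """
    frames = np.asarray(frames, dtype=np.float64)
    num_frames = frames.shape[0]
    if num_frames < 2:
        raise ValueError("a dynamic image needs at least two frames")
    t = np.arange(1, num_frames + 1, dtype=np.float64)
    alpha = 2.0 * t - num_frames - 1.0
    # Weighted sum over the time axis; output shape is frames.shape[1:].
    return np.tensordot(alpha, frames, axes=([0], [0]))


def dual_phase_dynamic_images(frames, onset, apex, offset):
    """Return (onset-to-apex DI, apex-to-offset DI) for one expression clip.

    onset, apex, and offset are inclusive frame indices into `frames`
    (hypothetical argument names; real datasets store them as annotations).
    The apex frame is shared by both phases.
    """
    if not (0 <= onset < apex < offset < len(frames)):
        raise ValueError("expected 0 <= onset < apex < offset < len(frames)")
    rising = dynamic_image(frames[onset:apex + 1])
    falling = dynamic_image(frames[apex:offset + 1])
    return rising, falling
```

Each annotated clip then yields two training images instead of one. Whether the single onset-to-offset DI is also kept as a training sample, and how the resulting images are normalized before being fed to a network, are not stated in the abstract and are left open here.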
Related papers
- A Novel Combined Optical Flow Approach for Comprehensive Micro-Expression Recognition [0.0]
This study introduces a Combined Optical Flow (COF), integrating both phases to enhance feature representation. Experimental results on the CASME-II and SAMM datasets show that COF outperforms single optical flow-based methods.
arXiv Detail & Related papers (2025-10-17T09:29:17Z)
- DIANet: A Phase-Aware Dual-Stream Network for Micro-Expression Recognition via Dynamic Images [0.0]
Micro-expressions are brief, involuntary facial movements that typically last less than half a second and often reveal genuine emotions. This paper proposes a novel dual-stream framework, DIANet, which leverages phase-aware dynamic images. Experiments conducted on three benchmark MER datasets demonstrate that the proposed method consistently outperforms conventional single-phase DI-based approaches.
arXiv Detail & Related papers (2025-10-14T07:15:29Z)
- Adaptive Fusion Network with Temporal-Ranked and Motion-Intensity Dynamic Images for Micro-expression Recognition [0.0]
Micro-expressions (MEs) are subtle, transient facial changes with very low intensity, almost imperceptible to the naked eye, yet they reveal a person's genuine emotion. This paper proposes a novel MER method with two main contributions. First, we propose two complementary representations: the Temporal-ranked dynamic image, which emphasizes temporal progression, and the Motion-intensity dynamic image, which highlights subtle motions through a frame reordering mechanism incorporating motion intensity. Second, we propose an Adaptive fusion network, which automatically learns to optimally integrate these two representations, thereby enhancing discriminative ME features while suppressing noise.
arXiv Detail & Related papers (2025-10-10T11:03:20Z)
- FMANet: A Novel Dual-Phase Optical Flow Approach with Fusion Motion Attention Network for Robust Micro-expression Recognition [0.0]
Micro-expression recognition is challenging due to the difficulty of capturing subtle facial movements. We introduce a comprehensive motion representation, which integrates motion dynamics from both micro-expression phases into a unified descriptor. We then propose FMANet, a novel end-to-end neural network architecture that internalizes the dual-phase analysis and magnitude modulation into learnable modules.
arXiv Detail & Related papers (2025-10-09T05:36:40Z)
- MPT: Motion Prompt Tuning for Micro-Expression Recognition [47.62949098749473]
This paper introduces Motion Prompt Tuning (MPT) as a novel approach to adapting pre-training models for micro-expression recognition (MER). MPT represents a pioneering method for subtle motion prompt tuning. Particularly, we introduce motion prompt generation, including motion magnification and Gaussian tokenization, to extract subtle motions as prompts for LMs. Extensive experiments conducted on three widely used MER datasets demonstrate that our proposed MPT consistently surpasses state-of-the-art approaches.
arXiv Detail & Related papers (2025-08-13T02:57:43Z)
- Decomposing the Entropy-Performance Exchange: The Missing Keys to Unlocking Effective Reinforcement Learning [106.68304931854038]
Reinforcement learning with verifiable rewards (RLVR) has been widely used for enhancing the reasoning abilities of large language models (LLMs). We conduct a systematic empirical analysis of the entropy-performance exchange mechanism of RLVR across different levels of granularity. Our analysis reveals that, in the rising stage, entropy reduction in negative samples facilitates the learning of effective reasoning patterns. In the plateau stage, learning efficiency strongly correlates with high-entropy tokens present in low-perplexity samples and those located at the end of sequences.
arXiv Detail & Related papers (2025-08-04T10:08:10Z)
- ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech. The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture. To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z)
- IPSeg: Image Posterior Mitigates Semantic Drift in Class-Incremental Segmentation [77.06177202334398]
We identify two critical challenges in CISS that contribute to semantic drift and degrade performance. First, we highlight the issue of separate optimization, where different parts of the model are optimized in distinct incremental stages. Second, we identify noisy semantics arising from inappropriate pseudo-labeling, which leads to sub-optimal results.
arXiv Detail & Related papers (2025-02-07T12:19:37Z)
- Static for Dynamic: Towards a Deeper Understanding of Dynamic Facial Expressions Using Static Expression Data [83.48170683672427]
We propose a unified dual-modal learning framework that integrates SFER data as a complementary resource for DFER. S4D employs dual-modal self-supervised pre-training on facial images and videos using a shared Transformer (ViT) encoder-decoder architecture. Experiments demonstrate that S4D achieves a deeper understanding of DFER, setting new state-of-the-art performance.
arXiv Detail & Related papers (2024-09-10T01:57:57Z)
- MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition [94.56755080185732]
We propose a Motion-Aware masked autoencoder with Semantic Alignment (MASA) that integrates rich motion cues and global semantic information.
Our framework can simultaneously learn local motion cues and global semantic features for comprehensive sign language representation.
arXiv Detail & Related papers (2024-05-31T08:06:05Z)
- Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition [61.374467942519374]
We propose a novel spatio-temporal transformer architecture, to the best of our knowledge the first purely transformer-based approach for micro-expression recognition.
The architecture comprises a spatial encoder which learns spatial patterns, a temporal aggregator for temporal dimension analysis, and a classification head.
A comprehensive evaluation on three widely used spontaneous micro-expression data sets shows that the proposed approach consistently outperforms the state of the art.
arXiv Detail & Related papers (2021-12-10T22:10:31Z)
- SMA-STN: Segmented Movement-Attending Spatiotemporal Network for Micro-Expression Recognition [20.166205708651194]
This paper proposes a segmented movement-attending spatiotemporal network (SMA-STN) to reveal subtle movement changes visually in an efficient way.
Extensive experiments on three widely used benchmarks, i.e., CASME II, SAMM, and SMIC, show that the proposed SMA-STN achieves better MER performance than other state-of-the-art methods.
arXiv Detail & Related papers (2020-10-19T09:23:24Z)