ME-TST+: Micro-expression Analysis via Temporal State Transition with ROI Relationship Awareness
- URL: http://arxiv.org/abs/2508.08082v1
- Date: Mon, 11 Aug 2025 15:28:32 GMT
- Title: ME-TST+: Micro-expression Analysis via Temporal State Transition with ROI Relationship Awareness
- Authors: Zizheng Guo, Bochao Zou, Junbao Zhuo, Huimin Ma
- Abstract summary: Micro-expressions (MEs) are regarded as important indicators of an individual's intrinsic emotions, preferences, and tendencies. Previous deep learning approaches commonly employ sliding-window classification networks. This paper proposes two state space model-based architectures, namely ME-TST and ME-TST+.
- Score: 12.584801819076425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Micro-expressions (MEs) are regarded as important indicators of an individual's intrinsic emotions, preferences, and tendencies. ME analysis requires spotting of ME intervals within long video sequences and recognition of their corresponding emotional categories. Previous deep learning approaches commonly employ sliding-window classification networks. However, the use of fixed window lengths and hard classification presents notable limitations in practice. Furthermore, these methods typically treat ME spotting and recognition as two separate tasks, overlooking the essential relationship between them. To address these challenges, this paper proposes two state space model-based architectures, namely ME-TST and ME-TST+, which utilize temporal state transition mechanisms to replace conventional window-level classification with video-level regression. This enables a more precise characterization of the temporal dynamics of MEs and supports the modeling of MEs with varying durations. In ME-TST+, we further introduce multi-granularity ROI modeling and the slowfast Mamba framework to alleviate information loss associated with treating ME analysis as a time-series task. Additionally, we propose a synergy strategy for spotting and recognition at both the feature and result levels, leveraging their intrinsic connection to enhance overall analysis performance. Extensive experiments demonstrate that the proposed methods achieve state-of-the-art performance. The codes are available at https://github.com/zizheng-guo/ME-TST.
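The core shift the abstract describes, from fixed-window classification to video-level regression, can be sketched in a few lines. The per-frame relevance scores, the fixed threshold, and the interval-extraction loop below are illustrative assumptions for exposition, not the authors' ME-TST implementation:

```python
import numpy as np

def spot_intervals(frame_scores: np.ndarray, threshold: float = 0.5):
    """Turn per-frame regression scores into (onset, offset) ME intervals.

    Instead of classifying fixed-length sliding windows, a video-level
    regressor emits one relevance score per frame; contiguous runs of
    frames above `threshold` become candidate intervals, so predicted
    MEs can have any duration.
    """
    above = frame_scores > threshold
    intervals, start = [], None
    for t, flag in enumerate(above):
        if flag and start is None:
            start = t                          # interval onset
        elif not flag and start is not None:
            intervals.append((start, t - 1))   # interval offset
            start = None
    if start is not None:                      # interval runs to the last frame
        intervals.append((start, len(frame_scores) - 1))
    return intervals

scores = np.array([0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.1, 0.6, 0.9, 0.2])
print(spot_intervals(scores))  # → [(2, 4), (7, 8)]
```

Because intervals are read off the regression curve rather than from window hits, a 0.2 s and a 0.5 s expression are handled by the same mechanism, which is the motivation the abstract gives for replacing hard window-level decisions.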
Related papers
- MEMTS: Internalizing Domain Knowledge via Parameterized Memory for Retrieval-Free Domain Adaptation of Time Series Foundation Models [51.506429027626005]
Memory for Time Series (MEMTS) is a lightweight and plug-and-play method for retrieval-free domain adaptation in time series forecasting. A key component of MEMTS is a Knowledge Persistence Module (KPM), which internalizes domain-specific temporal dynamics. This paradigm shift enables MEMTS to achieve accurate domain adaptation with constant-time inference and near-zero latency.
arXiv Detail & Related papers (2026-02-14T14:00:06Z)
- FAIM: Frequency-Aware Interactive Mamba for Time Series Classification [87.84511960413715]
Time series classification (TSC) is crucial in numerous real-world applications, such as environmental monitoring, medical diagnosis, and posture recognition. We propose FAIM, a lightweight Frequency-Aware Interactive Mamba model. We show that FAIM consistently outperforms existing state-of-the-art (SOTA) methods, achieving a superior trade-off between accuracy and efficiency.
arXiv Detail & Related papers (2025-11-26T08:36:33Z)
- Improving Micro-Expression Recognition with Phase-Aware Temporal Augmentation [0.0]
Micro-expressions (MEs) are brief, involuntary facial movements that reveal genuine emotions, typically lasting less than half a second. Deep learning has enabled significant advances in micro-expression recognition (MER), but its effectiveness is limited by the scarcity of annotated ME datasets. This paper proposes a phase-aware temporal augmentation method based on dynamic image.
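Dynamic images are commonly built with approximate rank pooling, which collapses a clip into a single image using weights that grow linearly with frame index. The paper's exact augmentation pipeline is not given here, so the following is a generic dynamic-image sketch under that standard weighting, not the proposed method:

```python
import numpy as np

def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a clip of shape (T, H, W) into one dynamic image via
    approximate rank pooling: frame t (1-indexed) gets weight 2t - T - 1,
    so later frames contribute positively and earlier ones negatively,
    encoding the clip's temporal evolution in a single image."""
    T = frames.shape[0]
    weights = 2 * np.arange(1, T + 1) - T - 1   # e.g. T=4 -> [-3, -1, 1, 3]
    return np.tensordot(weights, frames, axes=1)
```

Note that a static clip (identical frames) maps to an all-zero dynamic image, since the weights sum to zero; only motion survives the pooling.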
arXiv Detail & Related papers (2025-10-17T09:20:51Z)
- Boosting Micro-Expression Analysis via Prior-Guided Video-Level Regression [15.099304324307434]
Micro-expressions (MEs) are involuntary, low-intensity, and short-duration facial expressions. Most existing ME analysis methods rely on window-level classification with fixed window sizes and hard decisions. We propose a prior-guided video-level regression method for ME analysis.
arXiv Detail & Related papers (2025-08-26T09:13:36Z)
- Spatio-Temporal Fuzzy-oriented Multi-Modal Meta-Learning for Fine-grained Emotion Recognition [26.73957526115721]
Fine-grained emotion recognition (FER) plays a vital role in various fields, such as disease diagnosis, personalized recommendations, and multimedia mining. Existing FER methods face three key challenges in real-world applications: (i) they rely on large amounts of continuously annotated data to ensure accuracy, since emotions are complex and ambiguous in reality, which is costly and time-consuming; (ii) they cannot capture the temporal heterogeneity caused by changing emotion patterns, because they usually assume that the temporal correlation within sampling periods is the same; (iii) they do not consider the spatial heterogeneity of different FER scenarios, that is, the distribution of emotion
arXiv Detail & Related papers (2024-12-18T06:40:53Z)
- Synergistic Spotting and Recognition of Micro-Expression via Temporal State Transition [12.087992699513213]
The analysis of micro-expressions generally involves two main tasks: spotting micro-expression intervals in long videos and recognizing the emotions associated with these intervals.
Previous deep learning methods have primarily relied on classification networks utilizing sliding windows.
We present a novel temporal state transition architecture grounded in the state space model, which replaces conventional window-level classification with video-level regression.
arXiv Detail & Related papers (2024-09-15T12:14:19Z)
- PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z)
- EEG-SCMM: Soft Contrastive Masked Modeling for Cross-Corpus EEG-Based Emotion Recognition [0.862468061241377]
We propose a novel framework termed Soft Contrastive Masked Modeling (SCMM) to tackle the challenge of cross-corpus EEG-based emotion recognition. SCMM integrates soft contrastive learning with a hybrid masking strategy to effectively capture emotion dynamics. In experiments, SCMM achieves an average accuracy improvement of 4.26% under both same-class and different-class cross-corpus settings.
arXiv Detail & Related papers (2024-08-17T12:35:13Z)
- Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition [21.675660978188617]
Micro-expression recognition is crucial in many fields, including criminal analysis and psychotherapy. A three-stream temporal-shift attention network based on self-knowledge distillation is proposed in this paper.
arXiv Detail & Related papers (2024-06-25T13:22:22Z)
- Graph-Aware Contrasting for Multivariate Time-Series Classification [50.84488941336865]
Existing contrastive learning methods mainly focus on achieving temporal consistency with temporal augmentation and contrasting techniques.
We propose Graph-Aware Contrasting for spatial consistency across MTS data.
Our proposed method achieves state-of-the-art performance on various MTS classification tasks.
arXiv Detail & Related papers (2023-09-11T02:35:22Z)
- GaitASMS: Gait Recognition by Adaptive Structured Spatial Representation and Multi-Scale Temporal Aggregation [2.0444600042188448]
Gait recognition is one of the most promising video-based biometric technologies.
We propose a novel gait recognition framework, denoted as GaitASMS.
It can effectively extract the adaptive structured spatial representations and naturally aggregate the multi-scale temporal information.
arXiv Detail & Related papers (2023-07-29T13:03:17Z)
- Slow-Fast Visual Tempo Learning for Video-based Action Recognition [78.3820439082979]
Action visual tempo characterizes the dynamics and the temporal scale of an action.
Previous methods capture the visual tempo either by sampling raw videos with multiple rates, or by hierarchically sampling backbone features.
We propose a Temporal Correlation Module (TCM) that effectively extracts action visual tempo from low-level backbone features at a single layer.
arXiv Detail & Related papers (2022-02-24T14:20:04Z)
- Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition [61.374467942519374]
We propose a novel spatio-temporal transformer architecture, to the best of our knowledge the first purely transformer-based approach for micro-expression recognition. The architecture comprises a spatial encoder which learns spatial patterns, a temporal aggregator for temporal analysis, and a classification head.
A comprehensive evaluation on three widely used spontaneous micro-expression data sets shows that the proposed approach consistently outperforms the state of the art.
arXiv Detail & Related papers (2021-12-10T22:10:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.