Boosting Micro-Expression Analysis via Prior-Guided Video-Level Regression
- URL: http://arxiv.org/abs/2508.18834v1
- Date: Tue, 26 Aug 2025 09:13:36 GMT
- Title: Boosting Micro-Expression Analysis via Prior-Guided Video-Level Regression
- Authors: Zizheng Guo, Bochao Zou, Yinuo Jia, Xiangyu Li, Huimin Ma
- Abstract summary: Micro-expressions (MEs) are involuntary, low-intensity, and short-duration facial expressions. Most existing ME analysis methods rely on window-level classification with fixed window sizes and hard decisions. We propose a prior-guided video-level regression method for ME analysis.
- Score: 15.099304324307434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Micro-expressions (MEs) are involuntary, low-intensity, and short-duration facial expressions that often reveal an individual's genuine thoughts and emotions. Most existing ME analysis methods rely on window-level classification with fixed window sizes and hard decisions, which limits their ability to capture the complex temporal dynamics of MEs. Although recent approaches have adopted video-level regression frameworks to address some of these challenges, interval decoding still depends on manually predefined, window-based methods, leaving the issue only partially mitigated. In this paper, we propose a prior-guided video-level regression method for ME analysis. We introduce a scalable interval selection strategy that comprehensively considers the temporal evolution, duration, and class distribution characteristics of MEs, enabling precise spotting of the onset, apex, and offset phases. In addition, we introduce a synergistic optimization framework, in which the spotting and recognition tasks share parameters except for the classification heads. This fully exploits complementary information, makes more efficient use of limited data, and enhances the model's capability. Extensive experiments on multiple benchmark datasets demonstrate the state-of-the-art performance of our method, with an STRS of 0.0562 on CAS(ME)$^3$ and 0.2000 on SAMMLV. The code is available at https://github.com/zizheng-guo/BoostingVRME.
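The abstract describes decoding onset, apex, and offset phases from a video-level regression output under duration and temporal-evolution priors. The exact strategy is not given here, so the sketch below is a hypothetical illustration only: it assumes a per-frame score curve with a single local maximum per interval, and the function name, thresholds, and duration cap are invented for this example, not the authors' implementation.

```python
def decode_intervals(scores, peak_thresh=0.5, boundary_ratio=0.3, max_len=16):
    """Decode (onset, apex, offset) frame triples from per-frame scores.

    Illustrative assumptions: an apex is a local maximum above an absolute
    threshold; onset/offset are found by expanding outward while the score
    stays above a fraction of the apex score; a duration prior caps the
    interval length, since micro-expressions are very short.
    """
    intervals = []
    n = len(scores)
    for i in range(1, n - 1):
        # Apex candidate: a local maximum above the absolute threshold.
        if scores[i] >= peak_thresh and scores[i] > scores[i - 1] and scores[i] >= scores[i + 1]:
            cutoff = boundary_ratio * scores[i]
            onset, offset = i, i
            # Expand outward while the score stays above the cutoff,
            # without exceeding the duration prior on either side.
            while onset > 0 and scores[onset - 1] >= cutoff and i - (onset - 1) <= max_len:
                onset -= 1
            while offset < n - 1 and scores[offset + 1] >= cutoff and (offset + 1) - i <= max_len:
                offset += 1
            intervals.append((onset, i, offset))
    return intervals
```

For example, a score curve that rises to a single peak and decays yields one decoded interval centered on that peak; flat low-score curves yield none.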
Related papers
- Steering and Rectifying Latent Representation Manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection [52.5174167737992]
Video anomaly detection (VAD) aims to identify abnormal events in videos. We propose SteerVAD, which advances MLLM-based VAD by shifting from passively reading to actively steering and rectifying internal representations. Our method achieves state-of-the-art performance among tuning-free approaches, requiring only 1% of training data.
arXiv Detail & Related papers (2026-02-27T13:48:50Z) - Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object Segmentation [50.22481337087162]
Referring Video Object Segmentation (RVOS) aims to segment objects in videos based on textual queries. Refer-Agent is a collaborative multi-agent system with alternating reasoning-reflection mechanisms.
arXiv Detail & Related papers (2026-02-03T14:48:12Z) - SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models [53.19726629537694]
Post-training alignment of video generation models with human preferences is a critical goal. Current data collection paradigms, reliant on in-prompt pairwise annotations, suffer from labeling noise. We propose SoliReward, a systematic framework for video RM training.
arXiv Detail & Related papers (2025-12-17T14:28:23Z) - What Works for 'Lost-in-the-Middle' in LLMs? A Study on GM-Extract and Mitigations [1.2879523047871226]
GM-Extract is a novel benchmark dataset meticulously designed to evaluate LLM performance on retrieval of control variables. We conduct a systematic evaluation of 7-8B parameter models on two multi-document tasks (key-value extraction and question-answering). While a distinct U-shaped curve was not consistently observed, our analysis reveals a clear pattern of performance across models.
arXiv Detail & Related papers (2025-11-17T20:50:50Z) - LLM Unlearning Under the Microscope: A Full-Stack View on Methods and Metrics [10.638045151201084]
We present a principled taxonomy of twelve recent stateful unlearning methods. We revisit the evaluation of unlearning effectiveness (UE), utility retention (UT), and robustness (Rob). Our analysis shows that current evaluations, dominated by multiple-choice question (MCQ) accuracy, offer only a narrow perspective.
arXiv Detail & Related papers (2025-10-08T23:47:05Z) - Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs [54.70676039314542]
We present the first systematic study on quantizing diffusion-based language models. We identify the presence of activation outliers, characterized by abnormally large activation values. We implement state-of-the-art PTQ methods and conduct a comprehensive evaluation across multiple task types and model variants.
arXiv Detail & Related papers (2025-08-20T17:59:51Z) - ME-TST+: Micro-expression Analysis via Temporal State Transition with ROI Relationship Awareness [12.584801819076425]
Micro-expressions (MEs) are regarded as important indicators of an individual's intrinsic emotions, preferences, and tendencies. Previous deep learning approaches commonly employ sliding-window classification networks. This paper proposes two state space model-based architectures, namely ME-TST and ME-TST+.
arXiv Detail & Related papers (2025-08-11T15:28:32Z) - MIBoost: A Gradient Boosting Algorithm for Variable Selection After Multiple Imputation [0.0]
In practice, analyses are often complicated by missing data. We propose MIBoost, a novel algorithm that employs a uniform variable-selection mechanism across imputed datasets.
arXiv Detail & Related papers (2025-07-29T13:42:38Z) - Synergistic Spotting and Recognition of Micro-Expression via Temporal State Transition [12.087992699513213]
The analysis of micro-expressions generally involves two main tasks: spotting micro-expression intervals in long videos and recognizing the emotions associated with these intervals.
Previous deep learning methods have primarily relied on classification networks utilizing sliding windows.
We present a novel temporal state transition architecture grounded in the state space model, which replaces conventional window-level classification with video-level regression.
arXiv Detail & Related papers (2024-09-15T12:14:19Z) - On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning? [13.803180972839213]
We introduce a robust MeanShift for Test-time Augmentation (MTA).
MTA surpasses prompt-based methods without requiring this intensive training procedure.
We extensively benchmark our method on 15 datasets and demonstrate MTA's superiority and computational efficiency.
arXiv Detail & Related papers (2024-05-03T17:34:02Z) - EMO: Episodic Memory Optimization for Few-Shot Meta-Learning [69.50380510879697]
Episodic Memory Optimization for meta-learning, which we call EMO, is inspired by the human ability to recall past learning experiences from the brain's memory.
EMO nudges parameter updates in the right direction, even when the gradients provided by a limited number of examples are uninformative.
EMO scales well with most few-shot classification benchmarks and improves the performance of optimization-based meta-learning methods.
arXiv Detail & Related papers (2023-06-08T13:39:08Z) - Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z) - Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present an action recognition model that is trained with only video-frame labels.
Our method's per-person detectors are trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
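The last entry refers to the standard Multiple Instance Learning assumption (every positive bag contains at least one positive instance). A minimal, hypothetical sketch of that assumption via max-pooling over instance scores is shown below; the function names and threshold are illustrative, not taken from the paper.

```python
def bag_score(instance_scores):
    """Standard MIL pooling: the bag score is the maximum instance score."""
    return max(instance_scores)


def bag_label(instance_scores, thresh=0.5):
    """A bag is labeled positive if any instance clears the threshold,
    encoding the assumption that a positive bag holds at least one
    positive instance."""
    return bag_score(instance_scores) >= thresh
```

Under this assumption, a bag with scores [0.1, 0.9, 0.2] is positive while [0.1, 0.2, 0.3] is not; the paper above studies settings where this assumption is invalid.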
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.