Recognizing Micro-Expression in Video Clip with Adaptive Key-Frame
Mining
- URL: http://arxiv.org/abs/2009.09179v3
- Date: Mon, 15 Mar 2021 07:53:54 GMT
- Title: Recognizing Micro-Expression in Video Clip with Adaptive Key-Frame
Mining
- Authors: Min Peng, Chongyang Wang, Yuan Gao, Tao Bi, Tong Chen, Yu Shi,
Xiang-Dong Zhou
- Abstract summary: In micro-expression, facial movement is transient and sparsely localized through time.
We propose a novel end-to-end deep learning architecture, referred to as the adaptive key-frame mining network (AKMNet).
AKMNet learns a discriminative spatio-temporal representation by combining spatial features of self-learned local key frames with their global-temporal dynamics.
- Score: 18.34213657996624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a spontaneous facial expression of emotion, a micro-expression reveals the
underlying emotion that cannot be consciously controlled. In a micro-expression,
facial movement is transient and sparsely localized through time. However, the
existing representation based on various deep learning techniques learned from
a full video clip is usually redundant. In addition, methods utilizing the
single apex frame of each video clip require expert annotations and sacrifice
the temporal dynamics. To simultaneously localize and recognize such fleeting
facial movements, we propose a novel end-to-end deep learning architecture,
referred to as adaptive key-frame mining network (AKMNet). Operating on the
video clip of micro-expression, AKMNet is able to learn discriminative
spatio-temporal representation by combining spatial features of self-learned
local key frames and their global-temporal dynamics. Theoretical analysis and
empirical evaluation show that the proposed approach improved recognition
accuracy in comparison with state-of-the-art methods on multiple benchmark
datasets.
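The abstract describes key-frame mining at a high level: score each frame of the clip, keep the self-selected key frames, and fuse their spatial features with a global-temporal summary. A minimal NumPy sketch of that idea is below; the function name, the linear scoring weights, and the top-k selection rule are illustrative assumptions, not the paper's exact mechanism, which is learned end-to-end.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mine_key_frames(frame_features, w, k=3):
    """Score each frame with a stand-in learned projection, keep the k
    highest-scoring frames as key frames, and combine their spatial
    features with a crude global-temporal summary of the whole clip."""
    scores = softmax(frame_features @ w)           # (T,) importance per frame
    key_idx = np.sort(np.argsort(scores)[-k:])     # top-k frames, temporal order
    local = frame_features[key_idx].mean(axis=0)   # spatial features of key frames
    global_dyn = frame_features.mean(axis=0)       # global-temporal dynamics (placeholder)
    return np.concatenate([local, global_dyn]), key_idx

rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 64))   # 20 frames x 64-dim spatial features
w = rng.normal(size=64)             # stand-in for learned scoring weights
rep, idx = mine_key_frames(feats, w, k=3)
print(rep.shape, idx.shape)  # (128,) (3,)
```

In the actual network the scoring and fusion are trained jointly with the recognition loss, so no apex-frame annotation is needed; the sketch only shows the data flow.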
Related papers
- Synergistic Spotting and Recognition of Micro-Expression via Temporal State Transition [12.087992699513213]
The analysis of micro-expressions generally involves two main tasks: spotting micro-expression intervals in long videos and recognizing the emotions associated with these intervals.
Previous deep learning methods have primarily relied on classification networks utilizing sliding windows.
We present a novel temporal state transition architecture grounded in the state space model, which replaces conventional window-level classification with video-level regression.
arXiv Detail & Related papers (2024-09-15T12:14:19Z)
- MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues [0.0]
We propose a time-sensitive Multimodal Large Language Model (MLLM) aimed at directing attention to the local facial micro-expression dynamics.
Our model incorporates two key architectural contributions: (1) a global-local attention visual encoder that integrates global frame-level timestamp-bound image features with local facial features of temporal dynamics of micro-expressions; and (2) an utterance-aware video Q-Former that captures multi-scale and contextual dependencies by generating visual token sequences for each utterance segment and for the entire video then combining them.
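The global-local attention encoder described above weights local facial features against a global frame-level feature before fusing them. A minimal sketch of that fusion step follows; the function name, dimensions, and dot-product attention are illustrative assumptions, not the MicroEmo implementation.

```python
import numpy as np

def fuse_global_local(global_feat, local_feats):
    """Attend over local facial-region features, using the global
    frame-level feature as the query, then concatenate both summaries."""
    logits = local_feats @ global_feat   # (R,) relevance of each region
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                   # softmax over regions
    attended = attn @ local_feats        # (D,) attended local summary
    return np.concatenate([global_feat, attended])

rng = np.random.default_rng(1)
g = rng.normal(size=(32,))      # global frame-level feature
loc = rng.normal(size=(5, 32))  # 5 facial regions x 32-dim local features
fused = fuse_global_local(g, loc)
print(fused.shape)  # (64,)
```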
arXiv Detail & Related papers (2024-07-23T15:05:55Z)
- Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition [48.21696443824074]
We propose a novel framework for micro-expression recognition, named the Adaptive Temporal Motion Guided Graph Convolution Network (ATM-GCN).
Our framework excels at capturing temporal dependencies between frames across the entire clip, thereby enhancing micro-expression recognition at the clip level.
arXiv Detail & Related papers (2024-06-13T10:57:24Z)
- Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding [112.3913646778859]
We propose a simple yet effective video-language modeling framework, S-ViLM.
It includes two novel designs, inter-clip spatial grounding and intra-clip temporal grouping, to promote learning region-object alignment and temporal-aware features.
S-ViLM surpasses the state-of-the-art methods substantially on four representative downstream tasks.
arXiv Detail & Related papers (2023-03-28T22:45:07Z)
- Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition [61.374467942519374]
We propose a novel spatio-temporal transformer architecture -- to the best of our knowledge, the first purely transformer-based approach for micro-expression recognition.
The architecture comprises a spatial encoder that learns spatial patterns, a temporal aggregator for temporal analysis, and a classification head.
A comprehensive evaluation on three widely used spontaneous micro-expression data sets, shows that the proposed approach consistently outperforms the state of the art.
arXiv Detail & Related papers (2021-12-10T22:10:31Z)
- Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos [55.52369116870822]
This paper focuses on tackling the problem of temporal language localization in videos.
It aims to identify the start and end points of a moment described by a natural language sentence in an untrimmed video.
arXiv Detail & Related papers (2021-10-12T14:59:25Z)
- Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that the current fixed-sized spatio-temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better handle variations between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
- Progressive Spatio-Temporal Bilinear Network with Monte Carlo Dropout for Landmark-based Facial Expression Recognition with Uncertainty Estimation [93.73198973454944]
The performance of our method is evaluated on three widely used datasets.
It is comparable to that of video-based state-of-the-art methods while it has much less complexity.
arXiv Detail & Related papers (2021-06-08T13:40:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.