Benchmarking Micro-action Recognition: Dataset, Methods, and Applications
- URL: http://arxiv.org/abs/2403.05234v2
- Date: Mon, 3 Jun 2024 04:39:51 GMT
- Title: Benchmarking Micro-action Recognition: Dataset, Methods, and Applications
- Authors: Dan Guo, Kun Li, Bin Hu, Yan Zhang, Meng Wang
- Abstract summary: Micro-action is an imperceptible non-verbal behaviour characterised by low-intensity movement.
In this study, we collect a new micro-action dataset designated Micro-action-52 (MA-52).
Uniquely, MA-52 provides a whole-body perspective, covering gestures and upper- and lower-limb movements.
- Score: 26.090557725760934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Micro-action is an imperceptible non-verbal behaviour characterised by low-intensity movement. It offers insights into the feelings and intentions of individuals and is important for human-oriented applications such as emotion recognition and psychological assessment. However, the identification, differentiation, and understanding of micro-actions pose challenges due to the imperceptible and inaccessible nature of these subtle human behaviors in everyday life. In this study, we collect a new micro-action dataset designated Micro-action-52 (MA-52) and propose a benchmark named micro-action network (MANet) for the micro-action recognition (MAR) task. Uniquely, MA-52 provides a whole-body perspective including gestures and upper- and lower-limb movements, attempting to reveal comprehensive micro-action cues. In detail, MA-52 contains 52 micro-action categories along with seven body-part labels, and encompasses a full array of realistic and natural micro-actions, accounting for 205 participants and 22,422 video instances collated from psychological interviews. Based on the proposed dataset, we assess MANet and nine other prevalent action recognition methods. MANet incorporates squeeze-and-excitation (SE) and a temporal shift module (TSM) into the ResNet architecture to model the spatiotemporal characteristics of micro-actions. A joint-embedding loss is then designed for semantic matching between videos and action labels; it helps better distinguish visually similar yet distinct micro-action categories. An extended application to emotion recognition demonstrates one important value of the proposed dataset and method. In the future, we will further explore human behaviour, emotion, and psychological assessment in depth. The dataset and source code are released at https://github.com/VUT-HFUT/Micro-Action.
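The abstract names MANet's ingredients but not their wiring. As a rough illustration of how a temporal shift module and squeeze-and-excitation can be combined inside a ResNet-style block, here is a minimal PyTorch sketch; the module names, shift fraction, and reduction ratio are assumptions for illustration, not the authors' released implementation (see the linked repository for that).

```python
# Hypothetical PyTorch sketch: a ResNet-style residual block that applies a
# temporal shift (TSM) before the first convolution and squeeze-and-excitation
# (SE) after the second. Names and hyperparameters are illustrative only.
import torch
import torch.nn as nn

class TemporalShift(nn.Module):
    """Shift a fraction of channels one step along the time axis (TSM-style)."""
    def __init__(self, num_segments: int, shift_div: int = 8):
        super().__init__()
        self.num_segments = num_segments
        self.shift_div = shift_div

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch * num_segments, C, H, W) -- frames stacked along batch
        nt, c, h, w = x.shape
        n = nt // self.num_segments
        x = x.view(n, self.num_segments, c, h, w)
        fold = c // self.shift_div
        out = torch.zeros_like(x)
        out[:, :-1, :fold] = x[:, 1:, :fold]                  # pull from next frame
        out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # pull from previous frame
        out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # rest left in place
        return out.view(nt, c, h, w)

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels using global spatial context."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.fc(x).view(x.size(0), -1, 1, 1)
        return x * weights

class ShiftSEBasicBlock(nn.Module):
    """Stride-1, equal-width basic block: TSM -> conv -> conv -> SE -> residual."""
    def __init__(self, channels: int, num_segments: int):
        super().__init__()
        self.shift = TemporalShift(num_segments)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.se = SEBlock(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x
        out = self.relu(self.bn1(self.conv1(self.shift(x))))
        out = self.se(self.bn2(self.conv2(out)))
        return self.relu(out + identity)
```

The zero-padded shift lets a purely 2D backbone exchange information across neighbouring frames at no extra parameter cost, which is one reason TSM is a common choice for subtle, short-range motion.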
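The joint-embedding loss is likewise only described at a high level. One common realization of video-to-label semantic matching is a cross-entropy over cosine similarities between normalized video and label-text embeddings; the sketch below assumes precomputed label embeddings and may differ from the paper's exact formulation.

```python
# Sketch of a joint-embedding loss for video/label semantic matching:
# cross-entropy over temperature-scaled cosine similarities.
# Hypothetical re-illustration; not the paper's exact loss.
import torch
import torch.nn.functional as F

def joint_embedding_loss(video_emb: torch.Tensor,
                         label_emb: torch.Tensor,
                         targets: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    """
    video_emb: (B, D) video features from the backbone head.
    label_emb: (K, D) text/semantic embeddings for the K action labels.
    targets:   (B,) ground-truth label index per video.
    """
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(label_emb, dim=-1)
    logits = v @ t.T / temperature  # (B, K) scaled cosine similarities
    # Pulling each video toward its label embedding helps separate
    # categories that look alike but carry different semantics.
    return F.cross_entropy(logits, targets)

# Usage sketch with random tensors (52 classes, 8 videos, 512-d features):
loss = joint_embedding_loss(torch.randn(8, 512), torch.randn(52, 512),
                            torch.randint(0, 52, (8,)))
```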
Related papers
- MMAD: Multi-label Micro-Action Detection in Videos [23.508563348306534]
We propose a new task named Multi-label Micro-Action Detection (MMAD).
MMAD involves identifying all micro-actions in a given short video, determining their start and end times, and categorizing them.
To support the MMAD task, we introduce a new dataset named Multi-label Micro-Action-52 (MMA-52), specifically designed to facilitate the detailed analysis and exploration of complex human micro-actions.
arXiv Detail & Related papers (2024-07-07T09:45:14Z)
- Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning [55.127202990679976]
We introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories.
This dataset enables models to learn from varied scenarios and generalize to real-world applications.
We propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders.
arXiv Detail & Related papers (2024-06-17T03:01:22Z)
- Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition [48.21696443824074]
We propose a novel framework for micro-expression recognition, named the Adaptive Temporal Motion Guided Graph Convolution Network (ATM-GCN).
Our framework excels at capturing temporal dependencies between frames across the entire clip, thereby enhancing micro-expression recognition at the clip level.
arXiv Detail & Related papers (2024-06-13T10:57:24Z)
- Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding [21.94739567923136]
We focus on a special group of human body language: the micro-gesture (MG).
MGs differ from ordinary illustrative gestures in that they are not intentional behaviors performed to convey information to others, but rather unintentional behaviors driven by inner feelings.
We explore various augmentation strategies that account for the subtle spatial and brief temporal characteristics of micro-gestures, which are often accompanied by repetitiveness, to determine more suitable augmentation methods.
arXiv Detail & Related papers (2024-05-21T21:16:55Z)
- GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing [74.68232970965595]
Multimodal large language models (MLLMs) are designed to process and integrate information from multiple sources, such as text, speech, images, and videos.
This paper assesses the application of MLLMs with five crucial abilities for affective computing, spanning visual affective tasks and reasoning tasks.
arXiv Detail & Related papers (2024-03-09T13:56:25Z)
- Hierarchical Compositional Representations for Few-shot Action Recognition [51.288829293306335]
We propose a novel hierarchical compositional representations (HCR) learning approach for few-shot action recognition.
We divide a complicated action into several sub-actions by carefully designed hierarchical clustering.
We also adopt the Earth Mover's Distance from the transportation problem to measure the similarity between video samples in terms of their sub-action representations (a toy sketch of this matching appears after this list).
arXiv Detail & Related papers (2022-08-19T16:16:59Z)
- Micro-Expression Recognition Based on Attribute Information Embedding and Cross-modal Contrastive Learning [22.525295392858293]
We propose a micro-expression recognition method based on attribute information embedding and cross-modal contrastive learning.
We conduct extensive experiments on the CASME II and MMEW databases, achieving accuracies of 77.82% and 71.04%, respectively.
arXiv Detail & Related papers (2022-05-29T12:28:10Z)
- Video-based Facial Micro-Expression Analysis: A Survey of Datasets, Features and Algorithms [52.58031087639394]
Micro-expressions are involuntary and transient facial expressions.
They can provide important information in a broad range of applications, such as lie detection and criminal detection.
Since micro-expressions are transient and of low intensity, their detection and recognition are difficult and rely heavily on expert experience.
arXiv Detail & Related papers (2022-01-30T05:14:13Z)
- iMiGUE: An Identity-free Video Dataset for Micro-Gesture Understanding and Emotion Analysis [23.261770969903065]
iMiGUE is an identity-free video dataset for micro-gesture understanding and emotion analysis.
It focuses on micro-gestures, i.e., unintentional behaviors driven by inner feelings.
arXiv Detail & Related papers (2021-07-01T08:15:14Z)
- Micro-expression spotting: A new benchmark [74.69928316848866]
Micro-expressions (MEs) are brief and involuntary facial expressions that occur when people are trying to hide their true feelings or conceal their emotions.
In the computer vision field, the study of MEs can be divided into two main tasks: spotting and recognition.
This paper introduces an extension of the SMIC-E database, namely the SMIC-E-Long database, which is a new challenging benchmark for ME spotting.
arXiv Detail & Related papers (2020-07-24T09:18:41Z)
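As referenced from the HCR entry above: with uniform weights over equally sized sub-action sets, the Earth Mover's Distance reduces to a minimum-cost matching, which SciPy's Hungarian solver can compute. The following toy sketch is an assumption-laden illustration of that idea, not the HCR paper's actual procedure; the function name and feature shapes are hypothetical.

```python
# Toy Earth Mover's Distance between two videos, each represented as a set of
# sub-action feature vectors. With uniform weights and equal set sizes, EMD
# reduces to a minimum-cost perfect matching (Hungarian algorithm).
# Illustrative only; the HCR paper's exact formulation may differ.
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd_similarity(sub_actions_a: np.ndarray, sub_actions_b: np.ndarray) -> float:
    """sub_actions_a, sub_actions_b: (n, d) sub-action feature matrices."""
    # Pairwise Euclidean cost between every sub-action pair.
    cost = np.linalg.norm(
        sub_actions_a[:, None, :] - sub_actions_b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)  # optimal transport plan
    emd = cost[rows, cols].mean()             # average matched cost
    return -emd  # negate so that larger means more similar

# Usage: compare two videos, each decomposed into 5 sub-action vectors.
video_a, video_b = np.random.rand(5, 128), np.random.rand(5, 128)
print(emd_similarity(video_a, video_b))
```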