MMAD: Multi-label Micro-Action Detection in Videos
- URL: http://arxiv.org/abs/2407.05311v1
- Date: Sun, 7 Jul 2024 09:45:14 GMT
- Title: MMAD: Multi-label Micro-Action Detection in Videos
- Authors: Kun Li, Dan Guo, Pengyu Liu, Guoliang Chen, Meng Wang
- Abstract summary: We propose a new task named Multi-label Micro-Action Detection (MMAD), which involves identifying all micro-actions in a given short video, determining their start and end times, and categorizing them.
To support the MMAD task, we introduce a new dataset named Multi-label Micro-Action-52 (MMA-52), specifically designed to facilitate the detailed analysis and exploration of complex human micro-actions.
- Score: 23.508563348306534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human body actions are an important form of non-verbal communication in social interactions. This paper focuses on a specific subset of body actions known as micro-actions, which are subtle, low-intensity body movements that provide a deeper understanding of inner human feelings. In real-world scenarios, human micro-actions often co-occur, with multiple micro-actions overlapping in time, such as simultaneous head and hand movements. However, current research primarily focuses on recognizing individual micro-actions while overlooking their co-occurring nature. To narrow this gap, we propose a new task named Multi-label Micro-Action Detection (MMAD), which involves identifying all micro-actions in a given short video, determining their start and end times, and categorizing them. Achieving this requires a model capable of accurately capturing both long-term and short-term action relationships to locate and classify multiple micro-actions. To support the MMAD task, we introduce a new dataset named Multi-label Micro-Action-52 (MMA-52), specifically designed to facilitate the detailed analysis and exploration of complex human micro-actions. The proposed MMA-52 dataset is available at: https://github.com/VUT-HFUT/Micro-Action.
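The abstract defines MMAD outputs only as labelled segments with start and end times. As a minimal sketch of the evaluation side, assuming each annotation is a (label, start, end) triple and using hypothetical category names, a temporal-IoU matching routine that credits co-occurring (multi-label) actions independently might look like:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    label: str      # micro-action category, e.g. "shaking_legs" (hypothetical)
    start: float    # start time in seconds
    end: float      # end time in seconds

def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two temporal segments."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0

def match_detections(preds, gts, iou_thresh=0.5) -> int:
    """Greedily match predictions to ground truths of the same label.
    Temporally overlapping ground-truth micro-actions are matched
    independently, so co-occurring actions are both counted."""
    used, tp = set(), 0
    for p in preds:
        for i, g in enumerate(gts):
            if i in used or g.label != p.label:
                continue
            if temporal_iou(p, g) >= iou_thresh:
                used.add(i)
                tp += 1
                break
    return tp

preds = [Segment("shaking_legs", 1.0, 2.5), Segment("touching_face", 1.2, 1.8)]
gts = [Segment("shaking_legs", 0.9, 2.4), Segment("touching_face", 1.1, 1.9)]
print(match_detections(preds, gts))  # 2: both co-occurring actions matched
```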
Related papers
- Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition [48.21696443824074]
We propose a novel framework for micro-expression recognition, named the Adaptive Temporal Motion Guided Graph Convolution Network (ATM-GCN).
Our framework excels at capturing temporal dependencies between frames across the entire clip, thereby enhancing micro-expression recognition at the clip level.
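The summary gives no architectural details, so the following is only a loose illustration of the general idea (a temporal graph convolution whose frame-to-frame adjacency is predicted adaptively from the clip's own features), not the authors' actual ATM-GCN:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveTemporalGraphConv(nn.Module):
    """Toy temporal graph convolution: frames are graph nodes and the
    frame-to-frame adjacency is predicted from the features themselves,
    so the temporal graph adapts to each clip. Illustrative only."""

    def __init__(self, dim):
        super().__init__()
        self.query, self.key = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (batch, frames, dim)
        q, k = self.query(x), self.key(x)
        adj = torch.softmax(q @ k.transpose(1, 2) / x.shape[-1] ** 0.5, dim=-1)
        return F.relu(self.proj(adj @ x)) + x   # message passing + residual

feats = torch.randn(2, 16, 64)                  # 2 clips, 16 frames, 64-d
print(AdaptiveTemporalGraphConv(64)(feats).shape)  # torch.Size([2, 16, 64])
```

Predicting the adjacency this way makes the layer equivalent to single-head self-attention over frames, which is one common way to realize clip-level temporal dependencies.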
arXiv Detail & Related papers (2024-06-13T10:57:24Z) - Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding [21.94739567923136]
We focus on a special group of human body language: the micro-gesture (MG).
Micro-gestures differ from ordinary illustrative gestures in that they are not intentional behaviors performed to convey information to others, but unintentional behaviors driven by inner feelings.
We explore augmentation strategies that account for the subtle spatial and brief temporal characteristics of micro-gestures, which are often repetitive, to determine more suitable augmentation methods.
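The concrete augmentations are not listed in the summary; a sketch of the kind of mild transforms that would preserve subtle, brief micro-gestures (all parameters hypothetical) could be:

```python
import numpy as np

def temporal_jitter(frames: np.ndarray, max_shift: int = 2) -> np.ndarray:
    """Shift the clip by a few frames only; micro-gestures are brief,
    so large temporal crops would destroy them."""
    shift = np.random.randint(-max_shift, max_shift + 1)
    return np.roll(frames, shift, axis=0)

def mild_spatial_crop(frames: np.ndarray, max_crop: int = 4) -> np.ndarray:
    """Crop only a few pixels from the borders; aggressive crops could
    remove the low-intensity movement itself."""
    t, h, w, c = frames.shape
    dy = np.random.randint(0, max_crop + 1)
    dx = np.random.randint(0, max_crop + 1)
    return frames[:, dy:h - max_crop + dy, dx:w - max_crop + dx, :]

clip = np.random.rand(16, 112, 112, 3)           # (frames, H, W, channels)
aug = mild_spatial_crop(temporal_jitter(clip))
print(aug.shape)                                 # (16, 108, 108, 3)
```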
arXiv Detail & Related papers (2024-05-21T21:16:55Z) - Benchmarking Micro-action Recognition: Dataset, Methods, and Applications [26.090557725760934]
Micro-actions are imperceptible non-verbal behaviours characterised by low-intensity movement.
In this study, we collect a new micro-action dataset designated Micro-action-52 (MA-52).
Uniquely, MA-52 provides a whole-body perspective, covering gestures and upper- and lower-limb movements.
arXiv Detail & Related papers (2024-03-08T11:48:44Z) - "Filling the Blanks": Identifying Micro-activities that Compose Complex Human Activities of Daily Living [6.841115530838644]
AmicroN adopts a "top-down" approach, exploiting coarse-grained annotated data to expand macro-activities into their constituent micro-activities.
In the backend, AmicroN uses unsupervised change-point detection to search for micro-activity boundaries across a complex ADL.
We evaluate AmicroN on two real-life publicly available datasets and observe that it identifies micro-activities with a micro F1-score above 0.75 on both.
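AmicroN's actual change-point method is not described here; the toy numpy detector below only illustrates the general idea of unsupervised boundary search over a sensor stream (window size and threshold hypothetical):

```python
import numpy as np

def change_points(signal, window=25, thresh=0.5):
    """Toy change-point detector: compare the means of adjacent windows
    and flag frames where the sensor statistics shift, as candidate
    micro-activity boundaries."""
    scores = np.array([
        np.linalg.norm(signal[t - window:t].mean(axis=0) -
                       signal[t:t + window].mean(axis=0))
        for t in range(window, len(signal) - window)
    ])
    # local maxima above the threshold are boundary candidates
    return [t + window for t in range(1, len(scores) - 1)
            if scores[t] > thresh
            and scores[t] >= scores[t - 1] and scores[t] >= scores[t + 1]]

x = np.concatenate([np.zeros(100), np.ones(100)]) + 0.05 * np.random.randn(200)
print(change_points(x.reshape(-1, 1)))   # boundary candidate(s) near frame 100
```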
arXiv Detail & Related papers (2023-06-22T18:14:54Z) - Multi-queue Momentum Contrast for Microvideo-Product Retrieval [57.527227171945796]
We formulate the microvideo-product retrieval task, the first attempt to explore retrieval between these two types of multi-modal instances.
A novel Multi-Queue Momentum Contrast (MQMC) network is proposed for bidirectional retrieval.
A discriminative selection strategy with a multi-queue is used to distinguish the importance of different negatives based on their categories.
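The details of MQMC are not in the summary; the sketch below only combines the generic ingredients it names, a momentum-updated key encoder plus per-category negative queues, in simplified form (queue enqueue/dequeue omitted, all names hypothetical):

```python
import torch
import torch.nn.functional as F

class MultiQueueMoCo:
    """Simplified multi-queue momentum contrast: a momentum-updated key
    encoder and one FIFO queue of negative keys per category."""

    def __init__(self, encoder_q, encoder_k, num_classes, dim,
                 queue_len=1024, momentum=0.999):
        self.q, self.k, self.m = encoder_q, encoder_k, momentum
        self.queues = {c: F.normalize(torch.randn(queue_len, dim), dim=1)
                       for c in range(num_classes)}

    @torch.no_grad()
    def momentum_update(self):
        # the key encoder slowly trails the query encoder
        for pq, pk in zip(self.q.parameters(), self.k.parameters()):
            pk.data = self.m * pk.data + (1 - self.m) * pq.data

    def loss(self, video, product, category, temp=0.07):
        zq = F.normalize(self.q(video), dim=1)            # queries
        with torch.no_grad():
            zk = F.normalize(self.k(product), dim=1)      # positive keys
        l_pos = (zq * zk).sum(dim=1, keepdim=True)        # (B, 1)
        l_neg = zq @ self.queues[category].t()            # (B, queue_len)
        logits = torch.cat([l_pos, l_neg], dim=1) / temp
        labels = torch.zeros(zq.shape[0], dtype=torch.long)  # positive at 0
        return F.cross_entropy(logits, labels)

enc_q, enc_k = torch.nn.Linear(128, 64), torch.nn.Linear(128, 64)
moco = MultiQueueMoCo(enc_q, enc_k, num_classes=10, dim=64)
video, product = torch.randn(8, 128), torch.randn(8, 128)
print(moco.loss(video, product, category=3).item())
moco.momentum_update()
```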
arXiv Detail & Related papers (2022-12-22T03:47:14Z) - Video Action Detection: Analysing Limitations and Challenges [70.01260415234127]
We analyze existing datasets on video action detection and discuss their limitations.
We perform a bias study that analyzes a key property differentiating videos from static images: the temporal aspect.
These extreme experiments show that biases have crept into existing methods in spite of careful modeling.
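As an illustration of such a temporal-bias probe (not the paper's exact protocol), one can compare a model's accuracy on ordered versus frame-shuffled clips; the model is assumed to map (batch, frames, ...) tensors to class logits:

```python
import torch

@torch.no_grad()
def temporal_bias_probe(model, clips, labels):
    """Compare accuracy on ordered vs. frame-shuffled clips: if the two
    numbers are close, the model is largely ignoring the temporal
    structure of the video (a static-image bias)."""
    def accuracy(x):
        return (model(x).argmax(dim=1) == labels).float().mean().item()
    shuffled = clips[:, torch.randperm(clips.shape[1])]   # permute frame axis
    return accuracy(clips), accuracy(shuffled)
```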
arXiv Detail & Related papers (2022-04-17T00:42:14Z) - Video-based Facial Micro-Expression Analysis: A Survey of Datasets, Features and Algorithms [52.58031087639394]
Micro-expressions are involuntary and transient facial expressions.
They can provide important information in a broad range of applications such as lie detection, criminal detection, etc.
Since micro-expressions are transient and of low intensity, their detection and recognition is difficult and relies heavily on expert experiences.
arXiv Detail & Related papers (2022-01-30T05:14:13Z) - iMiGUE: An Identity-free Video Dataset for Micro-Gesture Understanding and Emotion Analysis [23.261770969903065]
iMiGUE is an identity-free video dataset for micro-gesture understanding and emotion analysis.
iMiGUE focuses on micro-gesture, i.e., unintentional behaviors driven by inner feelings.
arXiv Detail & Related papers (2021-07-01T08:15:14Z) - LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities [119.88381048477854]
We introduce the LEMMA dataset to provide, in a single benchmark, the dimensions missing from prior datasets, with meticulously designed settings.
We densely annotate atomic actions with human-object interactions to provide ground truths for the compositionality, scheduling, and assignment of daily activities.
We hope this effort would drive the machine vision community to examine goal-directed human activities and further study the task scheduling and assignment in the real world.
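A dense, compositional annotation of this kind might be serialized roughly as follows (field names hypothetical, not LEMMA's actual schema):

```python
# One densely annotated activity instance (illustrative only):
annotation = {
    "task": "make_breakfast",               # goal-directed activity
    "agent": "person_1",                    # assignment: who performs it
    "atomic_actions": [                     # compositionality
        {"verb": "open",  "objects": ["fridge"],        "start": 2.1, "end": 3.4},
        {"verb": "take",  "objects": ["egg", "fridge"], "start": 3.4, "end": 5.0},
        {"verb": "crack", "objects": ["egg", "bowl"],   "start": 6.2, "end": 7.8},
    ],
}
# Scheduling is recoverable from the ordered, timestamped atomic actions.
```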
arXiv Detail & Related papers (2020-07-31T00:13:54Z) - Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos [76.21297023629589]
We propose a novel method for learning pairwise modality interactions, better exploiting the complementary information between each pair of modalities in a video.
Our method achieves state-of-the-art performance on four standard benchmark datasets.
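The paper's precise interaction module is not described in the summary; a generic pairwise cross-attention over modality pairs conveys the idea (module structure assumed, not the authors' design):

```python
import torch
import torch.nn as nn

class PairwiseModalityInteraction(nn.Module):
    """Toy pairwise interaction: cross-attention applied to every ordered
    pair of modality feature sequences, then averaged."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, modalities):            # list of (batch, seq, dim)
        fused = []
        for i, q in enumerate(modalities):
            for j, kv in enumerate(modalities):
                if i != j:                    # attend from modality i to j
                    out, _ = self.attn(q, kv, kv)
                    fused.append(out)
        return torch.stack(fused).mean(dim=0)

rgb, flow, audio = (torch.randn(2, 10, 64) for _ in range(3))
print(PairwiseModalityInteraction(64)([rgb, flow, audio]).shape)  # (2, 10, 64)
```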
arXiv Detail & Related papers (2020-07-28T12:40:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.