Motion Matters: Motion-guided Modulation Network for Skeleton-based Micro-Action Recognition
- URL: http://arxiv.org/abs/2507.21977v2
- Date: Tue, 05 Aug 2025 08:10:56 GMT
- Title: Motion Matters: Motion-guided Modulation Network for Skeleton-based Micro-Action Recognition
- Authors: Jihao Gu, Kun Li, Fei Wang, Yanyan Wei, Zhiliang Wu, Hehe Fan, Meng Wang
- Abstract summary: Micro-Actions (MAs) are an important form of non-verbal communication in social interactions. Existing methods in Micro-Action Recognition often overlook the inherent subtle changes in MAs. We present a novel Motion-guided Modulation Network (MMN) that implicitly captures and modulates subtle motion cues.
- Score: 26.997350207742034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Micro-Actions (MAs) are an important form of non-verbal communication in social interactions, with potential applications in human emotional analysis. However, existing methods in Micro-Action Recognition often overlook the inherent subtle changes in MAs, which limits the accuracy of distinguishing between MAs that differ only subtly. To address this issue, we present a novel Motion-guided Modulation Network (MMN) that implicitly captures and modulates subtle motion cues to enhance spatial-temporal representation learning. Specifically, we introduce a Motion-guided Skeletal Modulation module (MSM) to inject motion cues at the skeletal level, acting as a control signal to guide spatial representation modeling. In parallel, we design a Motion-guided Temporal Modulation module (MTM) to incorporate motion information at the frame level, facilitating the modeling of holistic motion patterns in micro-actions. Finally, we propose a motion consistency learning strategy to aggregate the motion cues from multi-scale features for micro-action classification. Experimental results on the Micro-Action 52 and iMiGUE datasets demonstrate that MMN achieves state-of-the-art performance in skeleton-based micro-action recognition, underscoring the importance of explicitly modeling subtle motion cues. The code will be available at https://github.com/momiji-bit/MMN.
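To make the modulation idea concrete, here is a minimal sketch of motion-guided feature modulation, assuming FiLM-style conditioning and hypothetical tensor shapes; the paper's actual MSM/MTM modules may differ:

```python
import torch
import torch.nn as nn

class MotionGuidedModulation(nn.Module):
    """Illustrative sketch, not the authors' code: frame-to-frame feature
    differences serve as a motion control signal that scales and shifts
    skeletal features, FiLM-style."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions map motion cues to per-location scale and shift.
        self.to_gamma = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_beta = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, joints) skeleton features.
        diff = x[:, :, 1:] - x[:, :, :-1]               # temporal differences
        motion = torch.cat([torch.zeros_like(x[:, :, :1]), diff], dim=2)
        gamma = torch.sigmoid(self.to_gamma(motion))    # motion-derived gate
        beta = self.to_beta(motion)                     # motion-derived shift
        return x * gamma + beta

feats = torch.randn(8, 64, 32, 17)          # (B, C, T=frames, V=joints)
out = MotionGuidedModulation(64)(feats)     # same shape, motion-modulated
```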
Related papers
- MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception [47.80768014770871]
We propose a novel Micro-Expression Large Language Model (MELLM). It combines a subtle facial motion perception strategy with the strong inference capabilities of MLLMs. Our model exhibits superior robustness and generalization capabilities in micro-expression understanding (MEU).
arXiv Detail & Related papers (2025-05-11T15:08:23Z)
- AMMSM: Adaptive Motion Magnification and Sparse Mamba for Micro-Expression Recognition [7.084377962617903]
We propose a multi-task learning framework named Adaptive Motion Magnification and Sparse Mamba (AMMSM). This framework aims to enhance the accurate capture of micro-expressions through self-supervised subtle motion magnification. We employ evolutionary search to optimize the magnification factor and the sparsity ratios of spatial selection, followed by fine-tuning to further improve performance.
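For intuition, here is a toy version of motion magnification (a naive Eulerian-style amplification around the clip's temporal mean; AMMSM instead learns the magnification self-supervised and searches the factor evolutionarily):

```python
import torch

def magnify_motion(frames: torch.Tensor, alpha: float = 4.0) -> torch.Tensor:
    """Toy illustration only: amplify each frame's deviation from the
    clip's temporal mean by a factor alpha.
    frames: (T, C, H, W) video clip with values in [0, 1]."""
    reference = frames.mean(dim=0, keepdim=True)   # static appearance
    motion = frames - reference                    # subtle residual motion
    return (reference + alpha * motion).clamp(0.0, 1.0)
```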
arXiv Detail & Related papers (2025-03-31T13:17:43Z)
- Pay Attention and Move Better: Harnessing Attention for Interactive Motion Generation and Training-free Editing [23.70162749652725]
We develop a versatile set of simple yet effective motion editing methods via manipulating attention maps. Our method achieves strong generation and editing ability with good explainability.
arXiv Detail & Related papers (2024-10-24T17:59:45Z)
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms. SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics. Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
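A minimal sketch of the frequency-domain idea (a plain temporal low-pass via FFT; SMA's actual Fourier- and wavelet-based regularization is richer):

```python
import torch

def lowpass_motion(motion: torch.Tensor, keep: int = 8) -> torch.Tensor:
    """Keep only the lowest `keep` temporal frequencies of a motion-vector
    sequence so the retained signal reflects whole-clip global motion.
    motion: (T, D) per-frame motion vectors. Illustrative only."""
    spectrum = torch.fft.rfft(motion, dim=0)   # temporal spectrum
    spectrum[keep:] = 0                        # drop high-frequency jitter
    return torch.fft.irfft(spectrum, n=motion.shape[0], dim=0)
```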
arXiv Detail & Related papers (2024-03-22T14:47:18Z)
- Interactive Character Control with Auto-Regressive Motion Diffusion Models [18.727066177880708]
We propose A-MDM (Auto-regressive Motion Diffusion Model) for real-time motion synthesis.
Our conditional diffusion model takes an initial pose as input and auto-regressively generates successive motion frames conditioned on the previous frame.
We introduce a suite of techniques for incorporating interactive controls into A-MDM, such as task-oriented sampling, in-painting, and hierarchical reinforcement learning.
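The auto-regressive loop can be sketched as below, where `model` stands in for one full diffusion denoising pass per frame (a hypothetical interface, not the A-MDM code):

```python
import torch

@torch.no_grad()
def rollout(model, init_pose: torch.Tensor, num_frames: int) -> torch.Tensor:
    """Hypothetical sketch: each new frame is produced by a per-frame
    (mock) diffusion sampler conditioned on the previous frame."""
    poses = [init_pose]
    for _ in range(num_frames - 1):
        poses.append(model(poses[-1]))    # condition on the previous frame
    return torch.stack(poses, dim=0)      # (num_frames, pose_dim)
```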
arXiv Detail & Related papers (2023-06-01T07:48:34Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z) - Conditional Motion In-betweening [19.470778961694453]
Motion in-betweening (MIB) is the process of generating intermediate skeletal movement between given start and target poses.
We focus on a method that can handle pose- or semantic-conditioned MIB tasks using a unified model.
We also present a motion augmentation method to improve the quality of pose-conditioned motion generation.
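For contrast, the trivial non-learned baseline for MIB is plain interpolation between the endpoint poses (a hypothetical helper; the paper's model instead learns natural dynamics and honors pose/semantic conditions):

```python
import torch

def linear_inbetween(start: torch.Tensor, target: torch.Tensor,
                     num_frames: int) -> torch.Tensor:
    """Naive in-betweening: linearly interpolate joint positions between
    the start and target poses. start, target: (J, 3) joint coordinates."""
    ts = torch.linspace(0.0, 1.0, num_frames).view(-1, 1, 1)  # blend weights
    return (1.0 - ts) * start + ts * target    # (num_frames, J, 3)
```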
arXiv Detail & Related papers (2022-02-09T06:47:56Z)
- Unsupervised Motion Representation Learning with Capsule Autoencoders [54.81628825371412]
Motion Capsule Autoencoder (MCAE) models motion in a two-level hierarchy.
MCAE is evaluated on a novel Trajectory20 motion dataset and various real-world skeleton-based human action datasets.
arXiv Detail & Related papers (2021-10-01T16:52:03Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate dynamic-scale spatio-temporal kernels to adaptively fit diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine interactions among only a few selected foreground objects via a Transformer.
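A rough sketch of this second step (the selection criterion and layer sizes here are assumptions, not EAN's design): score tokens, keep the top-k as foreground, and let only those interact via attention:

```python
import torch
import torch.nn as nn

class ForegroundInteraction(nn.Module):
    """Illustrative only: the k highest-scoring tokens attend to each
    other, approximating 'interactions among selected foreground objects'."""

    def __init__(self, dim: int, k: int = 4, heads: int = 4):
        super().__init__()
        self.k = k
        self.score = nn.Linear(dim, 1)                    # foreground-ness score
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) region/patch features from a video.
        s = self.score(tokens).squeeze(-1)                # (B, N)
        idx = s.topk(self.k, dim=1).indices               # pick k foreground tokens
        sel = torch.gather(
            tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
        out, _ = self.attn(sel, sel, sel)                 # interactions among them
        return out                                        # (B, k, D)
```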
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- Self-supervised Motion Learning from Static Images [36.85209332144106]
Motion from Static Images (MoSI) learns to encode motion information.
We demonstrate that MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.
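The pseudo-motion idea can be sketched by synthesizing a clip with one known motion from a single image (the shift direction and length here are arbitrary; MoSI's actual pseudo-motion construction may differ):

```python
import torch

def pseudo_motion_clip(image: torch.Tensor, dx: int = 2, dy: int = 0,
                       num_frames: int = 8) -> torch.Tensor:
    """Translate a static image a little more each frame so the resulting
    clip contains a single, known motion. image: (C, H, W)."""
    frames = [torch.roll(image, shifts=(i * dy, i * dx), dims=(1, 2))
              for i in range(num_frames)]
    return torch.stack(frames, dim=0)   # (num_frames, C, H, W)
```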
arXiv Detail & Related papers (2021-04-01T03:55:50Z)
- Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections, but still suffer from a limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector.
We also propose a Spatial-wise Motion Enhancement (SME) module to focus on regions containing the critical moving target, according to the point-to-point similarity between adjacent feature maps.
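A minimal sketch of channel-wise motion gating in the spirit of CME (shapes and the gating network are assumptions):

```python
import torch
import torch.nn as nn

class ChannelMotionGate(nn.Module):
    """Illustrative only: a channel-wise gate derived from adjacent-frame
    feature differences emphasizes channels carrying dynamic information."""

    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Linear(channels, channels)   # motion descriptor -> gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C, H, W) per-frame feature maps, assumes T >= 2.
        diff = x[:, 1:] - x[:, :-1]                    # inter-frame motion
        desc = diff.mean(dim=(1, 3, 4))                # (B, C) motion descriptor
        gate = torch.sigmoid(self.fc(desc))            # channel-wise gate vector
        return x * gate.view(x.size(0), 1, -1, 1, 1)   # emphasize motion channels
```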
arXiv Detail & Related papers (2021-03-23T03:06:26Z)
- SMA-STN: Segmented Movement-Attending Spatiotemporal Network for Micro-Expression Recognition [20.166205708651194]
This paper proposes a segmented movement-attending spatiotemporal network (SMA-STN) to efficiently reveal subtle movement changes visually.
Extensive experiments on three widely used benchmarks, i.e., CASME II, SAMM, and SMIC, show that the proposed SMA-STN achieves better MER performance than other state-of-the-art methods.
arXiv Detail & Related papers (2020-10-19T09:23:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.