Motion Matters: Motion-guided Modulation Network for Skeleton-based Micro-Action Recognition
- URL: http://arxiv.org/abs/2507.21977v2
- Date: Tue, 05 Aug 2025 08:10:56 GMT
- Title: Motion Matters: Motion-guided Modulation Network for Skeleton-based Micro-Action Recognition
- Authors: Jihao Gu, Kun Li, Fei Wang, Yanyan Wei, Zhiliang Wu, Hehe Fan, Meng Wang
- Abstract summary: Micro-Actions (MAs) are an important form of non-verbal communication in social interactions. Existing methods in Micro-Action Recognition often overlook the inherent subtle changes in MAs. We present a novel Motion-guided Modulation Network (MMN) that implicitly captures and modulates subtle motion cues.
- Score: 26.997350207742034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Micro-Actions (MAs) are an important form of non-verbal communication in social interactions, with potential applications in human emotional analysis. However, existing methods in Micro-Action Recognition often overlook the inherent subtle changes in MAs, which limits the accuracy of distinguishing between MAs that differ only subtly. To address this issue, we present a novel Motion-guided Modulation Network (MMN) that implicitly captures and modulates subtle motion cues to enhance spatial-temporal representation learning. Specifically, we introduce a Motion-guided Skeletal Modulation module (MSM) to inject motion cues at the skeletal level, acting as a control signal to guide spatial representation modeling. In parallel, we design a Motion-guided Temporal Modulation module (MTM) to incorporate motion information at the frame level, facilitating the modeling of holistic motion patterns in micro-actions. Finally, we propose a motion consistency learning strategy to aggregate the motion cues from multi-scale features for micro-action classification. Experimental results on the Micro-Action 52 and iMiGUE datasets demonstrate that MMN achieves state-of-the-art performance in skeleton-based micro-action recognition, underscoring the importance of explicitly modeling subtle motion cues. The code will be available at https://github.com/momiji-bit/MMN.
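To make the modulation idea concrete, here is a minimal sketch of motion-guided feature modulation, assuming FiLM-style conditioning and hypothetical tensor shapes; the paper's actual MSM/MTM modules may differ:

```python
import torch
import torch.nn as nn

class MotionGuidedModulation(nn.Module):
    """Illustrative sketch, not the authors' code: frame-to-frame feature
    differences serve as a motion control signal that scales and shifts
    skeletal features, FiLM-style."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions map motion cues to per-location scale and shift.
        self.to_gamma = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_beta = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, joints) skeleton features.
        diff = x[:, :, 1:] - x[:, :, :-1]               # temporal differences
        motion = torch.cat([torch.zeros_like(x[:, :, :1]), diff], dim=2)
        gamma = torch.sigmoid(self.to_gamma(motion))    # motion-derived gate
        beta = self.to_beta(motion)                     # motion-derived shift
        return x * gamma + beta

feats = torch.randn(8, 64, 32, 17)          # (B, C, T=frames, V=joints)
out = MotionGuidedModulation(64)(feats)     # same shape, motion-modulated
```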
Related papers
- MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception [47.80768014770871]
We propose a novel Micro-Expression Large Language Model (MELLM). It combines a subtle facial motion perception strategy with the strong inference capabilities of MLLMs. Our model exhibits superior robustness and generalization capabilities in micro-expression understanding (MEU).
arXiv Detail & Related papers (2025-05-11T15:08:23Z)
- AMMSM: Adaptive Motion Magnification and Sparse Mamba for Micro-Expression Recognition [7.084377962617903]
We propose a multi-task learning framework named Adaptive Motion Magnification and Sparse Mamba (AMMSM). This framework aims to enhance the accurate capture of micro-expressions through self-supervised subtle motion magnification. We employ evolutionary search to optimize the magnification factor and the sparsity ratios of spatial selection, followed by fine-tuning to further improve performance.
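For intuition, here is a toy version of motion magnification (a naive Eulerian-style amplification around the clip's temporal mean; AMMSM instead learns the magnification self-supervised and searches the factor evolutionarily):

```python
import torch

def magnify_motion(frames: torch.Tensor, alpha: float = 4.0) -> torch.Tensor:
    """Toy illustration only: amplify each frame's deviation from the
    clip's temporal mean by a factor alpha.
    frames: (T, C, H, W) video clip with values in [0, 1]."""
    reference = frames.mean(dim=0, keepdim=True)   # static appearance
    motion = frames - reference                    # subtle residual motion
    return (reference + alpha * motion).clamp(0.0, 1.0)
```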
arXiv Detail & Related papers (2025-03-31T13:17:43Z)
- Pay Attention and Move Better: Harnessing Attention for Interactive Motion Generation and Training-free Editing [23.70162749652725]
We develop a versatile set of simple yet effective motion editing methods via manipulating attention maps. Our method achieves strong generation and editing ability with good explainability.
arXiv Detail & Related papers (2024-10-24T17:59:45Z)
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms. SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics. Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
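A minimal sketch of the frequency-domain idea (a plain temporal low-pass via FFT; SMA's actual Fourier- and wavelet-based regularization is richer):

```python
import torch

def lowpass_motion(motion: torch.Tensor, keep: int = 8) -> torch.Tensor:
    """Keep only the lowest `keep` temporal frequencies of a motion-vector
    sequence so the retained signal reflects whole-clip global motion.
    motion: (T, D) per-frame motion vectors. Illustrative only."""
    spectrum = torch.fft.rfft(motion, dim=0)   # temporal spectrum
    spectrum[keep:] = 0                        # drop high-frequency jitter
    return torch.fft.irfft(spectrum, n=motion.shape[0], dim=0)
```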
arXiv Detail & Related papers (2024-03-22T14:47:18Z)
- Interactive Character Control with Auto-Regressive Motion Diffusion Models [18.727066177880708]
We propose A-MDM (Auto-regressive Motion Diffusion Model) for real-time motion synthesis.
Our conditional diffusion model takes an initial pose as input and auto-regressively generates successive motion frames conditioned on the previous frame.
We introduce a suite of techniques for incorporating interactive controls into A-MDM, such as task-oriented sampling, in-painting, and hierarchical reinforcement learning.
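The auto-regressive loop can be sketched as below, where `model` stands in for one full diffusion denoising pass per frame (a hypothetical interface, not the A-MDM code):

```python
import torch

@torch.no_grad()
def rollout(model, init_pose: torch.Tensor, num_frames: int) -> torch.Tensor:
    """Hypothetical sketch: each new frame is produced by a per-frame
    (mock) diffusion sampler conditioned on the previous frame."""
    poses = [init_pose]
    for _ in range(num_frames - 1):
        poses.append(model(poses[-1]))    # condition on the previous frame
    return torch.stack(poses, dim=0)      # (num_frames, pose_dim)
```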
arXiv Detail & Related papers (2023-06-01T07:48:34Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z) - Conditional Motion In-betweening [19.470778961694453]
Motion in-betweening (MIB) is the process of generating intermediate skeletal movement between given start and target poses.
We focus on a method that can handle pose- or semantic-conditioned MIB tasks using a unified model.
We also present a motion augmentation method to improve the quality of pose-conditioned motion generation.
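For contrast, the trivial non-learned baseline for MIB is plain interpolation between the endpoint poses (a hypothetical helper; the paper's model instead learns natural dynamics and honors pose/semantic conditions):

```python
import torch

def linear_inbetween(start: torch.Tensor, target: torch.Tensor,
                     num_frames: int) -> torch.Tensor:
    """Naive in-betweening: linearly interpolate joint positions between
    the start and target poses. start, target: (J, 3) joint coordinates."""
    ts = torch.linspace(0.0, 1.0, num_frames).view(-1, 1, 1)  # blend weights
    return (1.0 - ts) * start + ts * target    # (num_frames, J, 3)
```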
arXiv Detail & Related papers (2022-02-09T06:47:56Z)
- Unsupervised Motion Representation Learning with Capsule Autoencoders [54.81628825371412]
Motion Capsule Autoencoder (MCAE) models motion in a two-level hierarchy.
MCAE is evaluated on a novel Trajectory20 motion dataset and various real-world skeleton-based human action datasets.
arXiv Detail & Related papers (2021-10-01T16:52:03Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate dynamic-scale spatio-temporal kernels to adaptively fit diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine interactions among only a few selected foreground objects via a Transformer.
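A rough sketch of this second step (the selection criterion and layer sizes here are assumptions, not EAN's design): score tokens, keep the top-k as foreground, and let only those interact via attention:

```python
import torch
import torch.nn as nn

class ForegroundInteraction(nn.Module):
    """Illustrative only: the k highest-scoring tokens attend to each
    other, approximating 'interactions among selected foreground objects'."""

    def __init__(self, dim: int, k: int = 4, heads: int = 4):
        super().__init__()
        self.k = k
        self.score = nn.Linear(dim, 1)                    # foreground-ness score
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) region/patch features from a video.
        s = self.score(tokens).squeeze(-1)                # (B, N)
        idx = s.topk(self.k, dim=1).indices               # pick k foreground tokens
        sel = torch.gather(
            tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
        out, _ = self.attn(sel, sel, sel)                 # interactions among them
        return out                                        # (B, k, D)
```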
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- Self-supervised Motion Learning from Static Images [36.85209332144106]
Motion from Static Images (MoSI) learns to encode motion information.
We demonstrate that MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.
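The pseudo-motion idea can be sketched by synthesizing a clip with one known motion from a single image (the shift direction and length here are arbitrary; MoSI's actual pseudo-motion construction may differ):

```python
import torch

def pseudo_motion_clip(image: torch.Tensor, dx: int = 2, dy: int = 0,
                       num_frames: int = 8) -> torch.Tensor:
    """Translate a static image a little more each frame so the resulting
    clip contains a single, known motion. image: (C, H, W)."""
    frames = [torch.roll(image, shifts=(i * dy, i * dx), dims=(1, 2))
              for i in range(num_frames)]
    return torch.stack(frames, dim=0)   # (num_frames, C, H, W)
```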
arXiv Detail & Related papers (2021-04-01T03:55:50Z)
- Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections, but still suffer from a limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector.
We also propose a Spatial-wise Motion Enhancement (SME) module to focus on regions containing the critical moving target, according to the point-to-point similarity between adjacent feature maps.
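A minimal sketch of channel-wise motion gating in the spirit of CME (shapes and the gating network are assumptions):

```python
import torch
import torch.nn as nn

class ChannelMotionGate(nn.Module):
    """Illustrative only: a channel-wise gate derived from adjacent-frame
    feature differences emphasizes channels carrying dynamic information."""

    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Linear(channels, channels)   # motion descriptor -> gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C, H, W) per-frame feature maps, assumes T >= 2.
        diff = x[:, 1:] - x[:, :-1]                    # inter-frame motion
        desc = diff.mean(dim=(1, 3, 4))                # (B, C) motion descriptor
        gate = torch.sigmoid(self.fc(desc))            # channel-wise gate vector
        return x * gate.view(x.size(0), 1, -1, 1, 1)   # emphasize motion channels
```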
arXiv Detail & Related papers (2021-03-23T03:06:26Z)
- SMA-STN: Segmented Movement-Attending Spatiotemporal Network for Micro-Expression Recognition [20.166205708651194]
This paper proposes a segmented movement-attending spatiotemporal network (SMA-STN) to efficiently reveal subtle movement changes visually.
Extensive experiments on three widely used benchmarks, i.e., CASME II, SAMM, and SMIC, show that the proposed SMA-STN achieves better MER performance than other state-of-the-art methods.
arXiv Detail & Related papers (2020-10-19T09:23:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.