Watch Where You Move: Region-aware Dynamic Aggregation and Excitation for Gait Recognition
- URL: http://arxiv.org/abs/2510.16541v1
- Date: Sat, 18 Oct 2025 15:36:08 GMT
- Title: Watch Where You Move: Region-aware Dynamic Aggregation and Excitation for Gait Recognition
- Authors: Binyuan Huang, Yongdong Luo, Xianda Guo, Xiawu Zheng, Zheng Zhu, Jiahui Pan, Chengju Zhou,
- Abstract summary: GaitRDAE is a framework that automatically searches for motion regions, assigns adaptive temporal scales, and applies corresponding attention. Experimental results show that GaitRDAE achieves state-of-the-art performance on several benchmark datasets.
- Score: 55.52723195212868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning-based gait recognition has achieved great success in various applications. The key to accurate gait recognition lies in considering the unique and diverse behavior patterns in different motion regions, especially when covariates affect visual appearance. However, existing methods typically use predefined regions for temporal modeling, with fixed or equivalent temporal scales assigned to different types of regions, which makes it difficult to model motion regions that change dynamically over time and adapt to their specific patterns. To tackle this problem, we introduce a Region-aware Dynamic Aggregation and Excitation framework (GaitRDAE) that automatically searches for motion regions, assigns adaptive temporal scales and applies corresponding attention. Specifically, the framework includes two core modules: the Region-aware Dynamic Aggregation (RDA) module, which dynamically searches the optimal temporal receptive field for each region, and the Region-aware Dynamic Excitation (RDE) module, which emphasizes the learning of motion regions containing more stable behavior patterns while suppressing attention to static regions that are more susceptible to covariates. Experimental results show that GaitRDAE achieves state-of-the-art performance on several benchmark datasets.
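To make the two modules concrete, below is a minimal PyTorch sketch of the idea. The horizontal-strip region partition, the candidate temporal kernel sizes for RDA, and the temporal-variance motion proxy driving a channel-wise excitation for RDE are all illustrative assumptions on our part, not details confirmed by the abstract or the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RegionDynamicAggregation(nn.Module):
    """Soft selection over candidate temporal receptive fields, per region."""

    def __init__(self, channels, num_regions=4, scales=(1, 3, 5)):
        super().__init__()
        self.num_regions = num_regions
        # One depthwise temporal conv per candidate scale, shared across regions.
        self.branches = nn.ModuleList(
            nn.Conv3d(channels, channels, kernel_size=(k, 1, 1),
                      padding=(k // 2, 0, 0), groups=channels)
            for k in scales
        )
        # Maps a region descriptor to a soft weight over the candidate scales.
        self.gate = nn.Linear(channels, len(scales))

    def forward(self, x):  # x: (B, C, T, H, W)
        # Assumption: horizontal strips stand in for "motion regions".
        regions = x.chunk(self.num_regions, dim=3)
        out = []
        for r in regions:
            desc = r.mean(dim=(2, 3, 4))                  # (B, C)
            w = F.softmax(self.gate(desc), dim=-1)        # (B, S)
            cand = torch.stack([b(r) for b in self.branches], dim=1)
            # Weighted sum over scales = region-specific receptive field.
            out.append((w[:, :, None, None, None, None] * cand).sum(dim=1))
        return torch.cat(out, dim=3)


class RegionDynamicExcitation(nn.Module):
    """Up-weights channels with strong temporal variation, damps static ones."""

    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x):  # x: (B, C, T, H, W)
        motion = x.var(dim=2).mean(dim=(2, 3))            # (B, C) motion proxy
        return x * self.fc(motion)[:, :, None, None, None]


# Example: 30-frame silhouette features, 64 channels.
x = torch.randn(2, 64, 30, 64, 44)
y = RegionDynamicExcitation(64)(RegionDynamicAggregation(64)(x))
print(y.shape)  # torch.Size([2, 64, 30, 64, 44])
```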
Related papers
- TriC-Motion: Tri-Domain Causal Modeling Grounded Text-to-Motion Generation [61.94780858309546]
Tri-Domain Causal Text-to-Motion Generation (TriC-Motion) is a novel diffusion-based framework integrating spatial-domain modeling with causal intervention. TriC-Motion achieves superior performance compared to state-of-the-art methods, attaining an outstanding R@1 of 0.612 on the HumanML3D dataset.
arXiv Detail & Related papers (2026-02-09T10:12:13Z) - Towards an Effective Action-Region Tracking Framework for Fine-grained Video Action Recognition [35.62986006054654]
Action-Region Tracking (ART) is a novel solution leveraging a query-response mechanism to discover and track the dynamics of distinctive local details. We propose a region-specific semantic activation module that employs discriminative and text-constrained semantics as queries. Experiments on widely used action recognition benchmarks demonstrate its superiority over previous state-of-the-art baselines.
arXiv Detail & Related papers (2025-11-26T09:32:06Z) - ADD-SLAM: Adaptive Dynamic Dense SLAM with Gaussian Splatting [12.846353008321394]
ADD-SLAM is an Adaptive Dynamic Dense SLAM framework based on Gaussian Splatting. We design an adaptive dynamic identification mechanism grounded in scene consistency analysis. Our method requires no predefined semantic category priors and adaptively discovers scene dynamics.
arXiv Detail & Related papers (2025-05-26T02:17:17Z) - Vision-Language Models Assisted Unsupervised Video Anomaly Detection [3.1095294567873606]
Anomaly samples present significant challenges for unsupervised learning methods.
Our method employs a cross-modal pre-trained model that leverages the inferential capabilities of large language models.
By mapping high-dimensional visual features to low-dimensional semantic ones, our method significantly enhances the interpretability of unsupervised anomaly detection.
arXiv Detail & Related papers (2024-09-21T11:48:54Z) - Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization [61.64304227831361]
Single-domain generalization aims to learn a model from single source domain data to achieve generalized performance on other unseen target domains.
We propose a dynamic object-centric perception network based on prompt learning, aiming to adapt to the variations in image complexity.
arXiv Detail & Related papers (2024-02-28T16:16:51Z) - Urban Regional Function Guided Traffic Flow Prediction [117.75679676806296]
We propose a novel module named POI-MetaBlock, which utilizes the functionality of each region as metadata.
Our module significantly improves the performance of traffic flow prediction and outperforms state-of-the-art methods that use metadata.
arXiv Detail & Related papers (2023-03-17T06:03:49Z) - RDFNet: Regional Dynamic FISTA-Net for Spectral Snapshot Compressive Imaging [11.627511305913476]
We introduce a regional dynamic way of using Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) to exploit regional characteristics.
We then unfold the process into a hierarchical dynamic deep network, dubbed RDFNet.
Our proposed regional dynamic architecture can also learn an appropriate shrinkage scale in a pixel-wise manner.
arXiv Detail & Related papers (2023-02-06T01:13:13Z) - MSA-GCN: Multiscale Adaptive Graph Convolution Network for Gait Emotion Recognition [6.108523790270448]
We present a novel Multi Scale Adaptive Graph Convolution Network (MSA-GCN) to recognize emotions.
In our model, an adaptive selective spatial-temporal convolution is designed to select the convolution kernel dynamically to obtain soft spatio-temporal features of different emotions.
Compared with previous state-of-the-art methods, the proposed method achieves the best performance on two public datasets.
arXiv Detail & Related papers (2022-09-19T13:07:16Z) - EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate dynamic-scale spatial-temporal kernels to adaptively fit diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z) - Learning Self-Similarity in Space and Time as Generalized Motion for Action Recognition [42.175450800733785]
We propose a rich motion representation based on spatio-temporal self-similarity (STSS).
We leverage the whole volume of STSS and let our model learn to extract an effective motion representation from it.
The proposed neural block, dubbed SELFY, can be easily inserted into neural architectures and trained end-to-end without additional supervision.
arXiv Detail & Related papers (2021-02-14T07:32:55Z) - TAM: Temporal Adaptive Module for Video Recognition [60.83208364110288]
The temporal adaptive module (TAM) generates video-specific temporal kernels based on its own feature map (a minimal sketch of this idea follows the list).
Experiments on Kinetics-400 and Something-Something datasets demonstrate that our TAM outperforms other temporal modeling methods consistently.
arXiv Detail & Related papers (2020-05-14T08:22:45Z)
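As referenced in the TAM entry above, here is a minimal PyTorch sketch of the video-adaptive temporal kernel idea. Generating the kernel from a temporally pooled channel descriptor is a simplification on our part; the paper's exact two-branch structure and layer sizes are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveTemporalKernel(nn.Module):
    """Generates a per-clip, per-channel temporal kernel and applies it
    as a depthwise 1-D convolution over time."""

    def __init__(self, channels, kernel_size=3, reduction=4):
        super().__init__()
        self.k = kernel_size
        # Small MLP turning a channel descriptor into one kernel per channel.
        self.gen = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels * kernel_size),
        )

    def forward(self, x):  # x: (B, C, T), e.g. spatially pooled features
        B, C, T = x.shape
        kernels = self.gen(x.mean(dim=2))                     # (B, C*k)
        kernels = F.softmax(kernels.view(B * C, 1, self.k), dim=-1)
        # Grouped-conv trick: one distinct kernel per (clip, channel) pair.
        y = F.conv1d(x.reshape(1, B * C, T), kernels,
                     padding=self.k // 2, groups=B * C)
        return y.view(B, C, T)


# Example: 8 clips, 64 channels, 16 frames.
out = AdaptiveTemporalKernel(64)(torch.randn(8, 64, 16))
print(out.shape)  # torch.Size([8, 64, 16])
```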