Hierarchical Spatio-Temporal Representation Learning for Gait Recognition
- URL: http://arxiv.org/abs/2307.09856v1
- Date: Wed, 19 Jul 2023 09:30:00 GMT
- Title: Hierarchical Spatio-Temporal Representation Learning for Gait Recognition
- Authors: Lei Wang, Bo Liu, Fangfang Liang, Bincheng Wang
- Abstract summary: Gait recognition is a biometric technique that identifies individuals by their unique walking styles.
We propose a hierarchical spatio-temporal representation learning framework for extracting gait features from coarse to fine.
Our method outperforms the state-of-the-art while maintaining a reasonable balance between model accuracy and complexity.
- Score: 6.877671230651998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gait recognition is a biometric technique that identifies individuals by
their unique walking styles, which is suitable for unconstrained environments
and has a wide range of applications. While current methods focus on exploiting
body part-based representations, they often neglect the hierarchical
dependencies between local motion patterns. In this paper, we propose a
hierarchical spatio-temporal representation learning (HSTL) framework for
extracting gait features from coarse to fine. Our framework starts with a
hierarchical clustering analysis to recover multi-level body structures from
the whole body to local details. Next, an adaptive region-based motion
extractor (ARME) is designed to learn region-independent motion features. The
proposed HSTL then stacks multiple ARMEs in a top-down manner, with each ARME
corresponding to a specific partition level of the hierarchy. An adaptive
spatio-temporal pooling (ASTP) module is used to capture gait features at
different levels of detail to perform hierarchical feature mapping. Finally, a
frame-level temporal aggregation (FTA) module is employed to reduce redundant
information in gait sequences through multi-scale temporal downsampling.
Extensive experiments on CASIA-B, OUMVLP, GREW, and Gait3D datasets demonstrate
that our method outperforms the state-of-the-art while maintaining a reasonable
balance between model accuracy and complexity.
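The abstract outlines a coarse-to-fine pipeline: hierarchical body partitioning, stacked region-based motion extractors (ARME), adaptive spatio-temporal pooling (ASTP), and frame-level temporal aggregation (FTA). The NumPy sketch below illustrates only the two ideas that can be shown generically, multi-level horizontal body partitioning with spatial pooling (the hierarchical feature mapping ASTP performs) and temporal max-pool downsampling (the redundancy reduction FTA performs). The strip-based splits and the max-plus-mean pooling here are illustrative assumptions in the style of common gait networks, not the paper's actual ARME/ASTP/FTA designs.

```python
import numpy as np

def hierarchical_part_pool(feats, levels=(1, 2, 4)):
    """Coarse-to-fine horizontal partition pooling (illustrative ASTP stand-in).

    feats: (T, C, H, W) frame-level feature maps.
    Returns one (T, C, n_parts) array per hierarchy level, from whole body
    (1 part) down to finer strips (2, 4 parts).
    """
    out = []
    for n_parts in levels:
        # Split the height axis into horizontal body strips.
        strips = np.array_split(feats, n_parts, axis=2)
        # Max + mean pooling over each strip's spatial extent, a common
        # choice in part-based gait models (an assumption, not the paper's).
        pooled = np.stack(
            [s.max(axis=(2, 3)) + s.mean(axis=(2, 3)) for s in strips],
            axis=2,
        )  # (T, C, n_parts)
        out.append(pooled)
    return out

def temporal_downsample(x, scale=2):
    """Max-pool along the frame axis to drop redundant frames
    (a simple stand-in for FTA's multi-scale temporal downsampling)."""
    t = (x.shape[0] // scale) * scale  # trim frames not divisible by scale
    return x[:t].reshape(t // scale, scale, *x.shape[1:]).max(axis=1)

# Usage: 8 frames of 16-channel 32x16 feature maps.
feats = np.random.rand(8, 16, 32, 16)
levels = hierarchical_part_pool(feats)      # shapes (8,16,1), (8,16,2), (8,16,4)
coarse = temporal_downsample(levels[0])     # shape (4, 16, 1)
```

Applying `temporal_downsample` at several `scale` values and concatenating the results would give the multi-scale temporal view the abstract describes.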
Related papers
- Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition [0.0]
In this paper, we propose self-attention GCN hybrid model, Multi-Scale Spatial-Temporal self-attention (MSST)-GCN.
We utilize spatial self-attention module with adaptive topology to understand intra-frame interactions within a frame among different body parts, and temporal self-attention module to examine correlations between frames of a node.
arXiv Detail & Related papers (2024-04-03T10:25:45Z)
- HiH: A Multi-modal Hierarchy in Hierarchy Network for Unconstrained Gait Recognition [3.431054404120758]
We present a multi-modal Hierarchy in Hierarchy network (HiH) that integrates silhouette and pose sequences for robust gait recognition.
HiH features a main branch that utilizes Hierarchical Gait Decomposer modules for depth-wise and intra-module hierarchical examination of general gait patterns from silhouette data.
An auxiliary branch, based on 2D joint sequences, enriches the spatial and temporal aspects of gait analysis.
arXiv Detail & Related papers (2023-11-19T03:25:14Z)
- Local-Global Temporal Difference Learning for Satellite Video Super-Resolution [55.69322525367221]
We propose to exploit the well-defined temporal difference for efficient and effective temporal compensation.
To fully utilize the local and global temporal information within frames, we systematically modeled the short-term and long-term temporal discrepancies.
Rigorous objective and subjective evaluations conducted across five mainstream video satellites demonstrate that our method performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2023-04-10T07:04:40Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Multi-Scale Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition [13.15374205970988]
We present a multi-scale spatial graph convolution (MS-GC) module and a multi-scale temporal graph convolution (MT-GC) module.
The MS-GC and MT-GC modules decompose the corresponding local graph convolution into a set of sub-graph convolutions, forming a hierarchical residual architecture.
We propose a multi-scale spatial temporal graph convolutional network (MST-GCN), which stacks multiple blocks to learn effective motion representations for action recognition.
arXiv Detail & Related papers (2022-06-27T03:17:33Z)
- GaitStrip: Gait Recognition via Effective Strip-based Feature Representations and Multi-Level Framework [34.397404430838286]
We present a strip-based multi-level gait recognition network, named GaitStrip, to extract comprehensive gait information at different levels.
To be specific, our high-level branch explores the context of gait sequences and our low-level one focuses on detailed posture changes.
Our GaitStrip achieves state-of-the-art performance in both normal walking and complex conditions.
arXiv Detail & Related papers (2022-03-08T09:49:48Z)
- Spatio-temporal Relation Modeling for Few-shot Action Recognition [100.3999454780478]
We propose a few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations.
Our approach achieves an absolute gain of 3.5% in classification accuracy, as compared to the best existing method in the literature.
arXiv Detail & Related papers (2021-12-09T18:59:14Z)
- Multi-Scale Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network is proposed for skeleton-based action recognition.
MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z)
- Video Is Graph: Structured Graph Module for Video Action Recognition [34.918667614077805]
We transform a video sequence into a graph to obtain direct long-term dependencies among temporal frames.
In particular, SGM divides the neighbors of each node into several temporal regions so as to extract global structural information.
The reported performance and analysis demonstrate that SGM can achieve outstanding precision with less computational complexity.
arXiv Detail & Related papers (2021-10-12T11:27:29Z)
- Spatio-Temporal Representation Factorization for Video-based Person Re-Identification [55.01276167336187]
We propose Spatio-Temporal Representation Factorization module (STRF) for re-ID.
STRF is a flexible new computational unit that can be used in conjunction with most existing 3D convolutional neural network architectures for re-ID.
We empirically show that STRF improves performance of various existing baseline architectures while demonstrating new state-of-the-art results.
arXiv Detail & Related papers (2021-07-25T19:29:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.