GaitASMS: Gait Recognition by Adaptive Structured Spatial Representation and Multi-Scale Temporal Aggregation
- URL: http://arxiv.org/abs/2307.15981v2
- Date: Wed, 21 Feb 2024 10:57:23 GMT
- Title: GaitASMS: Gait Recognition by Adaptive Structured Spatial Representation and Multi-Scale Temporal Aggregation
- Authors: Yan Sun, Hu Long, Xueling Feng, and Mark Nixon
- Abstract summary: Gait recognition is one of the most promising video-based biometric technologies.
We propose a novel gait recognition framework, denoted as GaitASMS.
It can effectively extract the adaptive structured spatial representations and naturally aggregate the multi-scale temporal information.
- Score: 2.0444600042188448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gait recognition is one of the most promising video-based biometric
technologies. The edges of silhouettes and motion are the most informative
features, and previous studies have explored them separately with notable
results. However, due to occlusions and variations in viewing angle, gait
recognition performance is often limited by predefined spatial segmentation
strategies. Moreover, traditional temporal pooling usually neglects
distinctive temporal information in gait. To address these issues, we propose
a novel gait recognition framework, denoted GaitASMS, which effectively
extracts adaptive structured spatial representations and naturally aggregates
multi-scale temporal information. The Adaptive Structured Representation
Extraction Module (ASRE) separates the edges of silhouettes using an adaptive
edge mask and maximizes the representation in a semantic latent space. The
Multi-Scale Temporal Aggregation Module (MSTA) models long- and short-range
temporal information through a temporally aggregated structure. Furthermore,
we propose a new data augmentation, denoted random mask, to enrich the sample
space of long-term occlusions and enhance the generalization of the model.
Extensive experiments on two datasets demonstrate the competitive advantage of
the proposed method, especially in complex scenes, i.e., BG and CL. On the
CASIA-B dataset, GaitASMS achieves an average accuracy of 93.5% and
outperforms the baseline on rank-1 accuracy by 3.4% and 6.3% in BG and CL,
respectively. Ablation experiments demonstrate the effectiveness of ASRE and
MSTA. The source code is available at https://github.com/YanSungithub/GaitASMS.
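The abstract names three concrete mechanisms: an adaptive edge mask (ASRE), multi-scale temporal aggregation (MSTA), and random-mask augmentation. The sketch below illustrates the general ideas with simplified NumPy stand-ins; the function names, kernel sizes, and mask parameters are assumptions for illustration, not the authors' implementation (see the linked GitHub repository for that). In particular, the fixed 4-neighbour edge extractor replaces the paper's learned *adaptive* mask, and plain smoothing filters replace learned temporal convolutions.

```python
import numpy as np

def edge_mask(sil):
    # sil: (H, W) binary silhouette. Fixed-width stand-in for ASRE's adaptive
    # edge mask: a pixel is an edge pixel if it is foreground and at least one
    # of its 4-neighbours is background (silhouette minus its interior).
    padded = np.pad(sil, 1)
    interior = (padded[:-2, 1:-1] * padded[2:, 1:-1]
                * padded[1:-1, :-2] * padded[1:-1, 2:]) * sil
    return sil - interior

def multi_scale_temporal_aggregation(x, kernel_sizes=(3, 5, 7)):
    # x: (channels, frames). Smooth each channel at several temporal scales
    # (a stand-in for learned short- and long-range temporal convolutions),
    # sum the multi-scale responses, then take a temporal max so distinctive
    # frames are not averaged away as in plain temporal pooling.
    agg = np.zeros_like(x, dtype=float)
    for k in kernel_sizes:
        kernel = np.ones(k) / k
        for c in range(x.shape[0]):
            agg[c] += np.convolve(x[c], kernel, mode="same")
    return agg.max(axis=-1)  # (channels,)

def random_mask(seq, max_ratio=0.3, rng=None):
    # seq: (frames, H, W) silhouette sequence. Zero out one contiguous
    # horizontal band across *all* frames to simulate a long-term occlusion;
    # the band height bound max_ratio is an assumed hyperparameter.
    rng = np.random.default_rng() if rng is None else rng
    frames, h, w = seq.shape
    band = int(h * rng.uniform(0.0, max_ratio))
    if band == 0:
        return seq.copy()
    top = rng.integers(0, h - band + 1)
    masked = seq.copy()
    masked[:, top:top + band, :] = 0.0
    return masked
```

In the paper these operations are learned modules inside a deep network; the sketch only mirrors their structure on raw arrays.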
Related papers
- Spatial Hierarchy and Temporal Attention Guided Cross Masking for Self-supervised Skeleton-based Action Recognition [4.036669828958854]
We introduce a hierarchy and attention guided cross-masking framework (HA-CM) that applies masking to skeleton sequences from both spatial and temporal perspectives.
In spatial graphs, we utilize hyperbolic space to maintain joint distinctions and effectively preserve the hierarchical structure of high-dimensional skeletons.
In temporal flows, we substitute traditional distance metrics with the global attention of joints for masking, addressing the convergence of distances in high-dimensional space and the lack of a global perspective.
arXiv Detail & Related papers (2024-09-26T15:28:25Z)
- Paving the way toward foundation models for irregular and unaligned Satellite Image Time Series [0.0]
We propose ALIgned SITS (ALISE) to take into account the spatial, spectral, and temporal dimensions of satellite imagery.
Unlike SSL models currently available for SITS, ALISE incorporates a flexible query mechanism to project the SITS into a common and learned temporal projection space.
The quality of the produced representation is assessed through three downstream tasks: crop segmentation (PASTIS), land cover segmentation (MultiSenGE) and a novel crop change detection dataset.
arXiv Detail & Related papers (2024-07-11T12:42:10Z)
- PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
Motivated by increasing privacy concerns, we propose a Parameter-Efficient Federated Anomaly Detection framework named PeFAD.
We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z)
- GaitFormer: Revisiting Intrinsic Periodicity for Gait Recognition [6.517046095186713]
Gait recognition aims to distinguish different walking patterns by analyzing video-level human silhouettes, rather than relying on appearance information.
Previous research has primarily focused on extracting local or global-temporal representations, while overlooking the intrinsic periodic features of gait sequences.
We propose a plug-and-play strategy, called Temporal Periodic Alignment (TPA), which leverages the periodic nature and fine-grained temporal dependencies of gait patterns.
arXiv Detail & Related papers (2023-07-25T05:05:07Z)
- Hierarchical Spatio-Temporal Representation Learning for Gait Recognition [6.877671230651998]
Gait recognition is a biometric technique that identifies individuals by their unique walking styles.
We propose a hierarchical spatio-temporal representation learning framework for extracting gait features from coarse to fine.
Our method outperforms the state-of-the-art while maintaining a reasonable balance between model accuracy and complexity.
arXiv Detail & Related papers (2023-07-19T09:30:00Z)
- One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching [77.6989219290789]
One-shot skeleton action recognition aims to learn a skeleton action recognition model with a single training sample.
This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching.
arXiv Detail & Related papers (2023-07-14T11:52:10Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only for the source dataset, but not for the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Temporal Memory Relation Network for Workflow Recognition from Surgical Video [53.20825496640025]
We propose a novel end-to-end temporal memory relation network (TMNet) for relating long-range and multi-scale temporal patterns.
We have extensively validated our approach on two benchmark surgical video datasets.
arXiv Detail & Related papers (2021-03-30T13:20:26Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- An Enhanced Adversarial Network with Combined Latent Features for Spatio-Temporal Facial Affect Estimation in the Wild [1.3007851628964147]
This paper proposes a novel model that efficiently extracts both spatial and temporal features of the data by means of its enhanced temporal modelling based on latent features.
Our proposed model consists of three major networks, coined Generator, Discriminator, and Combiner, which are trained in an adversarial setting combined with curriculum learning to enable our adaptive attention modules.
arXiv Detail & Related papers (2021-02-18T04:10:12Z)
- Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.