GaitMAST: Motion-Aware Spatio-Temporal Feature Learning Network for
Cross-View Gait Recognition
- URL: http://arxiv.org/abs/2210.11817v1
- Date: Fri, 21 Oct 2022 08:42:00 GMT
- Title: GaitMAST: Motion-Aware Spatio-Temporal Feature Learning Network for
Cross-View Gait Recognition
- Authors: Jingqi Li, Jiaqi Gao, Yuzhen Zhang, Hongming Shan, Junping Zhang
- Abstract summary: We propose GaitMAST, which can unleash the potential of motion-aware features.
GaitMAST preserves the individual's unique walking patterns well.
Our model achieves an average rank-1 accuracy of 94.1% on CASIA-B.
- Score: 32.76653659564304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a unique biometric that can be perceived at a distance, gait has broad
applications in person authentication, social security and so on. Existing gait
recognition methods pay attention to extracting either spatial or
spatiotemporal representations. However, they barely consider extracting
diverse motion features, a fundamental characteristic of gait, from gait
sequences. In this paper, we propose a novel motion-aware spatiotemporal
feature learning network for gait recognition, termed GaitMAST, which can
unleash the potential of motion-aware features. Specifically, in the shallow
layers, we propose a dual-path frame-level feature extractor, in which one
path extracts overall spatiotemporal features and the other extracts
motion-salient features by focusing on dynamic regions. In the deeper layers, we
design a two-branch clip-level feature extractor, in which one focuses on
fine-grained spatial information and the other on motion detail preservation.
Consequently, our GaitMAST preserves the individual's unique walking patterns
well, further enhancing the robustness of spatiotemporal features. Extensive
experimental results on two commonly-used cross-view gait datasets demonstrate
the superior performance of GaitMAST over existing state-of-the-art methods. On
CASIA-B, our model achieves an average rank-1 accuracy of 94.1%. In particular,
GaitMAST achieves rank-1 accuracies of 96.1% and 88.1% under the bag-carrying
and coat-wearing conditions, respectively, outperforming the second-best method
by a large margin and demonstrating its robustness against spatial variations.
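To make the dual-path idea concrete, below is a minimal PyTorch sketch of a frame-level extractor in this spirit. The layer widths, the additive fusion, and the frame-difference gating used to stand in for "dynamic regions" are illustrative assumptions, not the authors' exact architecture:

    # Minimal sketch of a dual-path frame-level extractor in the spirit of
    # GaitMAST. The frame-difference gating is an assumed stand-in for the
    # paper's motion-salient path, and the layer widths are arbitrary.
    import torch
    import torch.nn as nn

    class DualPathFrameExtractor(nn.Module):
        def __init__(self, channels: int = 32):
            super().__init__()
            # Path 1: overall spatiotemporal features from raw frames.
            self.spatial_path = nn.Sequential(
                nn.Conv2d(1, channels, 3, padding=1), nn.LeakyReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(),
            )
            # Path 2: motion-salient features from motion-gated frames.
            self.motion_path = nn.Sequential(
                nn.Conv2d(1, channels, 3, padding=1), nn.LeakyReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(),
            )

        def forward(self, seq: torch.Tensor) -> torch.Tensor:
            # seq: (batch, time, 1, height, width) silhouette sequence.
            b, t, c, h, w = seq.shape
            frames = seq.reshape(b * t, c, h, w)
            # Temporal difference as a cheap proxy for dynamic regions:
            # static body parts cancel out while moving limbs remain.
            # roll() pairs frame i with frame i-1 (wrapping at the start).
            diff = (seq - seq.roll(1, dims=1)).abs().reshape(b * t, c, h, w)
            overall = self.spatial_path(frames)
            motion = self.motion_path(frames * torch.sigmoid(diff))
            fused = overall + motion  # simple additive fusion
            return fused.reshape(b, t, -1, h, w)

    # Example: a batch of 2 sequences, 30 frames of 64x44 silhouettes.
    x = torch.rand(2, 30, 1, 64, 44)
    print(DualPathFrameExtractor()(x).shape)  # torch.Size([2, 30, 32, 64, 44])

In the full model, the deeper two-branch clip-level stage would then consume such frame-level features, one branch keeping fine-grained spatial information and the other preserving motion detail.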
Related papers
- It Takes Two: Accurate Gait Recognition in the Wild via Cross-granularity Alignment [72.75844404617959]
This paper proposes a novel cross-granularity alignment gait recognition method, named XGait.
To this end, XGait uses two backbone encoder branches to map the silhouette sequences and the parsing sequences into two latent spaces.
Comprehensive experiments on two large-scale gait datasets show that XGait achieves Rank-1 accuracies of 80.5% on Gait3D and 88.3% on CCPG.
arXiv Detail & Related papers (2024-11-16T08:54:27Z)
- GaitASMS: Gait Recognition by Adaptive Structured Spatial Representation and Multi-Scale Temporal Aggregation [2.0444600042188448]
Gait recognition is one of the most promising video-based biometric technologies.
We propose a novel gait recognition framework, denoted as GaitASMS.
It effectively extracts adaptive structured spatial representations and naturally aggregates multi-scale temporal information.
arXiv Detail & Related papers (2023-07-29T13:03:17Z)
- Distillation-guided Representation Learning for Unconstrained Gait Recognition [50.0533243584942]
We propose a framework, termed GAit DEtection and Recognition (GADER), for human authentication in challenging outdoor scenarios.
GADER builds discriminative features through a novel gait recognition method, where only frames containing gait information are used.
We evaluate our method against multiple state-of-the-art (SoTA) gait baselines and demonstrate consistent improvements on indoor and outdoor datasets.
arXiv Detail & Related papers (2023-07-27T01:53:57Z)
- GaitGS: Temporal Feature Learning in Granularity and Span Dimension for Gait Recognition [34.07501669897291]
GaitGS is a framework that aggregates temporal features simultaneously in both granularity and span dimensions.
Our method demonstrates state-of-the-art performance, achieving Rank-1 accuracies of 98.2%, 96.5%, and 89.7% across two datasets.
arXiv Detail & Related papers (2023-05-31T09:48:25Z)
- DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition [35.642868929840034]
Gait recognition is a biometric technology that recognizes the identity of humans through their walking patterns.
We propose a novel and high-performance framework named DyGait to focus on the extraction of dynamic features (a minimal sketch of one way to isolate such features appears after this list).
Our network achieves an average Rank-1 accuracy of 71.4% on the GREW dataset, 66.3% on the Gait3D dataset, 98.4% on the CASIA-B dataset, and 98.3% on the OU-MVLP dataset.
arXiv Detail & Related papers (2023-03-27T07:36:47Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- Spatio-temporal Gait Feature with Adaptive Distance Alignment [90.5842782685509]
We try to increase the difference between gait features of different subjects from two aspects: the optimization of the network structure and the refinement of the extracted gait features.
Our proposed method consists of Spatio-temporal Feature Extraction (SFE) and Adaptive Distance Alignment (ADA).
ADA uses a large amount of unlabeled real-life gait data as a benchmark to refine the extracted spatio-temporal features so that they have low inter-class similarity and high intra-class similarity.
arXiv Detail & Related papers (2022-03-07T13:34:00Z)
- Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z)
- Self-Attention Neural Bag-of-Features [103.70855797025689]
We build on the recently introduced 2D-Attention and reformulate the attention learning methodology.
We propose a joint feature-temporal attention mechanism that learns a joint 2D attention mask highlighting relevant information.
arXiv Detail & Related papers (2022-01-26T17:54:14Z)
- Selective Spatio-Temporal Aggregation Based Pose Refinement System: Towards Understanding Human Activities in Real-World Videos [8.571131862820833]
State-of-the-art pose estimators struggle to obtain high-quality 2D or 3D pose data due to truncation and low resolution in real-world, un-annotated videos.
We propose a Selective Spatio-Temporal Aggregation mechanism, named SST-A, that refines and smooths the keypoint locations extracted by multiple expert pose estimators.
We demonstrate that the skeleton data refined by our Pose-Refinement system (SSTA-PRS) is effective at boosting various existing action recognition models.
arXiv Detail & Related papers (2020-11-10T19:19:51Z)
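Several of the entries above, DyGait in particular, hinge on isolating dynamic features from a feature sequence. Below is a minimal sketch of the general difference-from-temporal-mean technique, an illustration of the idea only, not any paper's exact module: subtracting a sequence's temporal mean cancels static appearance and leaves motion-dominated residuals.

    # Generic dynamic-feature extraction by removing the temporal mean.
    # Names and shapes are assumptions for illustration, not DyGait's
    # actual module.
    import torch

    def dynamic_features(features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, channels, height, width) feature maps.
        static = features.mean(dim=1, keepdim=True)  # shared static content
        return features - static                     # motion-dominated residual

    feats = torch.rand(2, 30, 32, 16, 11)
    dyn = dynamic_features(feats)
    print(dyn.shape)                   # torch.Size([2, 30, 32, 16, 11])
    print(dyn.sum(dim=1).abs().max())  # ~0: residuals cancel over time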