Multi-scale Context-aware Network with Transformer for Gait Recognition
- URL: http://arxiv.org/abs/2204.03270v3
- Date: Mon, 25 Sep 2023 18:43:58 GMT
- Title: Multi-scale Context-aware Network with Transformer for Gait Recognition
- Authors: Duowang Zhu, Xiaohu Huang, Xinggang Wang, Bo Yang, Botao He, Wenyu
Liu, and Bin Feng
- Abstract summary: We propose a multi-scale context-aware network with transformer (MCAT) for gait recognition.
MCAT generates temporal features across three scales, and adaptively aggregates them using contextual information from both local and global perspectives.
In order to remedy the spatial feature corruption resulting from temporal operations, MCAT incorporates a salient spatial feature learning (SSFL) module.
- Score: 35.521073630044434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although gait recognition has drawn increasing research attention
recently, silhouette differences are quite subtle in the spatial domain, so
temporal feature representation is crucial. Inspired by the
observation that humans can distinguish gaits of different subjects by
adaptively focusing on clips of varying time scales, we propose a multi-scale
context-aware network with transformer (MCAT) for gait recognition. MCAT
generates temporal features across three scales, and adaptively aggregates them
using contextual information from both local and global perspectives.
Specifically, MCAT contains an adaptive temporal aggregation (ATA) module that
performs local relation modeling followed by global relation modeling to fuse
the multi-scale features. Besides, in order to remedy the spatial feature
corruption resulting from temporal operations, MCAT incorporates a salient
spatial feature learning (SSFL) module to select groups of discriminative
spatial features. Extensive experiments conducted on three datasets demonstrate
the state-of-the-art performance. Concretely, we achieve rank-1 accuracies of
98.7%, 96.2% and 88.7% under normal walking, bag-carrying and coat-wearing
conditions on CASIA-B, 97.5% on OU-MVLP and 50.6% on GREW. The source code will
be available at https://github.com/zhuduowang/MCAT.git.
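The abstract names two mechanisms: an adaptive temporal aggregation (ATA) step that fuses temporal features from three time scales via local and then global relation modeling, and a salient spatial feature learning (SSFL) step that selects groups of discriminative spatial features. The PyTorch sketch below only mirrors that data flow under stated assumptions; the pooling windows, the grouped convolution standing in for local relation modeling, the pooled linear score standing in for the paper's transformer-based global modeling, and the norm-based top-k selection are all illustrative choices, not the authors' released implementation (see the repository above for that).

```python
# Minimal sketch of the two ideas described in the abstract. All design
# details here (pooling windows, grouped conv, norm-based saliency, top-k)
# are assumptions for illustration, not MCAT's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


def multi_scale_temporal(x):
    """Build three temporal scales from frame-level features x: (B, C, T).

    Assumed scales: raw frames, short clips (window 3), longer clips (window 7).
    """
    s1 = x
    s2 = F.max_pool1d(x, kernel_size=3, stride=1, padding=1)
    s3 = F.max_pool1d(x, kernel_size=7, stride=1, padding=3)
    return [s1, s2, s3]


class AdaptiveTemporalAggregation(nn.Module):
    """ATA-like fusion: local relation modeling per scale, then a global
    context score that adaptively weights the three scales."""

    def __init__(self, channels: int):
        super().__init__()
        # Local relation modeling: a grouped temporal conv, one group per scale.
        self.local = nn.Conv1d(3 * channels, 3 * channels,
                               kernel_size=3, padding=1, groups=3)
        # Global relation modeling: pool over time, then score the three scales.
        self.score = nn.Linear(3 * channels, 3)

    def forward(self, feats):
        # feats: list of three (B, C, T) tensors, one per temporal scale.
        x = self.local(torch.cat(feats, dim=1))        # (B, 3C, T)
        ctx = x.mean(dim=2)                            # (B, 3C) global context
        w = torch.softmax(self.score(ctx), dim=1)      # (B, 3) scale weights
        stacked = torch.stack(feats, dim=1)            # (B, 3, C, T)
        return (w[:, :, None, None] * stacked).sum(1)  # (B, C, T)


def salient_spatial_selection(parts, k):
    """SSFL-like step: keep the k most salient of P spatial/part features.

    parts: (B, P, C); saliency is approximated here by the feature L2 norm.
    """
    idx = parts.norm(dim=-1).topk(k, dim=1).indices    # (B, k)
    return torch.gather(parts, 1,
                        idx.unsqueeze(-1).expand(-1, -1, parts.size(-1)))


if __name__ == "__main__":
    B, C, T, P = 2, 64, 30, 16
    fused = AdaptiveTemporalAggregation(C)(
        multi_scale_temporal(torch.randn(B, C, T)))    # (B, C, T)
    salient = salient_spatial_selection(torch.randn(B, P, C), k=8)  # (B, 8, C)
    print(fused.shape, salient.shape)
```

Note that in the paper the local and global relation modeling are transformer-based and trained end-to-end with the backbone; the pooled linear score above is only a stand-in to keep the sketch self-contained.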
Related papers
- A Multi-Stage Adaptive Feature Fusion Neural Network for Multimodal Gait
Recognition [15.080096318551346]
Most existing gait recognition algorithms are unimodal, and the few multimodal algorithms perform multimodal fusion only once.
We propose a multi-stage feature fusion strategy (MSFFS), which performs multimodal fusions at different stages in the feature extraction process.
Also, we propose an adaptive feature fusion module (AFFM) that considers the semantic association between silhouettes and skeletons.
arXiv Detail & Related papers (2023-12-22T03:25:15Z)
- Learning multi-domain feature relation for visible and Long-wave Infrared image patch matching [39.88037892637296]
We present the largest visible and Long-wave Infrared (LWIR) image patch matching dataset, termed VL-CMIM.
In addition, a multi-domain feature relation learning network (MD-FRN) is proposed.
arXiv Detail & Related papers (2023-08-09T11:23:32Z)
- GaitASMS: Gait Recognition by Adaptive Structured Spatial Representation and Multi-Scale Temporal Aggregation [2.0444600042188448]
Gait recognition is one of the most promising video-based biometric technologies.
We propose a novel gait recognition framework, denoted as GaitASMS.
It effectively extracts adaptive structured spatial representations and naturally aggregates multi-scale temporal information.
arXiv Detail & Related papers (2023-07-29T13:03:17Z)
- MetaGait: Learning to Learn an Omni Sample Adaptive Representation for Gait Recognition [16.26377062742576]
We develop MetaGait, a novel framework that learns to learn an omni-sample adaptive representation.
We leverage the meta-knowledge across the entire process, where Meta Triple Attention and Meta Temporal Pooling are presented.
Extensive experiments demonstrate the state-of-the-art performance of the proposed MetaGait.
arXiv Detail & Related papers (2023-06-06T06:53:05Z)
- GaitGS: Temporal Feature Learning in Granularity and Span Dimension for Gait Recognition [34.07501669897291]
GaitGS is a framework that aggregates temporal features simultaneously in both granularity and span dimensions.
Our method demonstrates state-of-the-art performance on two datasets, achieving Rank-1 accuracies of 98.2%, 96.5%, and 89.7%.
arXiv Detail & Related papers (2023-05-31T09:48:25Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- Memory-Guided Semantic Learning Network for Temporal Sentence Grounding [55.31041933103645]
We propose a memory-augmented network that learns and memorizes rarely appearing content in TSG tasks.
MGSL-Net consists of three main parts: a cross-modal interaction module, a memory augmentation module, and a heterogeneous attention module.
arXiv Detail & Related papers (2022-01-03T02:32:06Z)
- Adaptive Affinity for Associations in Multi-Target Multi-Camera Tracking [53.668757725179056]
We propose a simple yet effective approach to adapt affinity estimations to corresponding matching scopes in MTMCT.
Instead of trying to deal with all appearance changes, we tailor the affinity metric to specialize in those that might emerge during data association.
By minimizing this mismatch, the adaptive affinity module brings significant improvements over a global re-ID distance.
arXiv Detail & Related papers (2021-12-14T18:59:11Z)
- Automatic size and pose homogenization with spatial transformer network to improve and accelerate pediatric segmentation [51.916106055115755]
We propose a new CNN architecture that is pose and scale invariant thanks to the use of a Spatial Transformer Network (STN).
Our architecture is composed of three sequential modules that are estimated together during training.
We test the proposed method on kidney and renal tumor segmentation in abdominal pediatric CT scans.
arXiv Detail & Related papers (2021-07-06T14:50:03Z)
- MST: Masked Self-Supervised Transformer for Visual Representation [52.099722121603506]
The Transformer has been widely used for self-supervised pre-training in Natural Language Processing (NLP).
We present a novel Masked Self-supervised Transformer approach named MST, which can explicitly capture the local context of an image.
MST achieves a Top-1 accuracy of 76.9% with DeiT-S under linear evaluation, using only 300 epochs of pre-training.
arXiv Detail & Related papers (2021-06-10T11:05:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.