Gait Recognition in the Wild with Multi-hop Temporal Switch
- URL: http://arxiv.org/abs/2209.00355v1
- Date: Thu, 1 Sep 2022 10:46:09 GMT
- Title: Gait Recognition in the Wild with Multi-hop Temporal Switch
- Authors: Jinkai Zheng, Xinchen Liu, Xiaoyan Gu, Yaoqi Sun, Chuang Gan, Jiyong
Zhang, Wu Liu, Chenggang Yan
- Abstract summary: Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision community.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
- Score: 81.35245014397759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing studies for gait recognition are dominated by in-the-lab scenarios.
Since people live in real-world scenes, gait recognition in the wild is a more
practical problem that has recently attracted the attention of the multimedia
and computer vision community. Current methods that obtain state-of-the-art
performance on in-the-lab benchmarks achieve much worse accuracy on the
recently proposed in-the-wild datasets because these methods can hardly model
the varied temporal dynamics of gait sequences in unconstrained scenes.
Therefore, this paper presents a novel multi-hop temporal switch method to
achieve effective temporal modeling of gait patterns in real-world scenes.
Concretely, we design a novel gait recognition network, named Multi-hop
Temporal Switch Network (MTSGait), to learn spatial features and multi-scale
temporal features simultaneously. Unlike existing methods that use 3D
convolutions for temporal modeling, our MTSGait models the temporal dynamics of
gait sequences with 2D convolutions. In this way, it achieves high efficiency
with fewer model parameters and is easier to optimize than 3D convolution-based
models. Based on the specific design of the 2D
convolution kernels, our method can eliminate the misalignment of features
among adjacent frames. In addition, a new sampling strategy, i.e., non-cyclic
continuous sampling, is proposed to make the model learn more robust temporal
features. Finally, the proposed method achieves superior performance on two
public gait in-the-wild datasets, i.e., GREW and Gait3D, compared with
state-of-the-art methods.
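The multi-hop temporal switch itself is not spelled out in the abstract, so the following is only a minimal PyTorch sketch under two assumptions: that the switch behaves like a channel-wise temporal shift with several hop sizes (1 frame, 2 frames, ...) followed by an ordinary per-frame 2D convolution, and that "non-cyclic continuous sampling" means taking a contiguous clip without wrapping short sequences around. All names (MultiHopTemporalSwitch, SwitchConvBlock, non_cyclic_continuous_sample, hops) are illustrative and not taken from the authors' code.

```python
import torch
import torch.nn as nn


class MultiHopTemporalSwitch(nn.Module):
    """Shift disjoint channel groups forward/backward in time by several hop
    sizes so a following 2D convolution can mix information across frames.
    (Illustrative reading of the "multi-hop temporal switch"; not the authors' code.)"""

    def __init__(self, channels: int, hops=(1, 2)):
        super().__init__()
        self.hops = hops
        # Reserve one channel group per (hop, direction) pair; the rest stay unshifted.
        self.group = channels // (2 * len(hops) + 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width) frame-level gait features
        out = x.clone()
        g = self.group
        for i, hop in enumerate(self.hops):
            fwd = slice(2 * i * g, (2 * i + 1) * g)        # channels shifted forward in time
            bwd = slice((2 * i + 1) * g, (2 * i + 2) * g)  # channels shifted backward in time
            out[:, hop:, fwd] = x[:, :-hop, fwd]
            out[:, :-hop, bwd] = x[:, hop:, bwd]
            out[:, :hop, fwd] = 0   # no earlier frame to copy from
            out[:, -hop:, bwd] = 0  # no later frame to copy from
        return out


class SwitchConvBlock(nn.Module):
    """Temporal switch followed by a plain per-frame 2D convolution."""

    def __init__(self, channels: int):
        super().__init__()
        self.switch = MultiHopTemporalSwitch(channels)
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = x.shape
        x = self.switch(x)                        # mix frames along the time axis
        x = self.conv(x.reshape(b * t, c, h, w))  # 2D conv only, no 3D kernels
        return x.reshape(b, t, c, h, w)


def non_cyclic_continuous_sample(num_frames: int, clip_len: int) -> list:
    """One guess at "non-cyclic continuous sampling": take a random contiguous
    window and, if the sequence is shorter than the clip, keep it as-is instead
    of wrapping around to the start (cyclic padding)."""
    if num_frames <= clip_len:
        return list(range(num_frames))  # no wrap-around repetition
    start = torch.randint(0, num_frames - clip_len + 1, (1,)).item()
    return list(range(start, start + clip_len))
```

With an input of shape (batch, time, channels, height, width), each SwitchConvBlock lets a purely 2D convolution see frames up to two hops away, which matches the abstract's efficiency argument against 3D convolutions; the paper's actual kernel design is more involved and also addresses feature misalignment between adjacent frames.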
Related papers
- Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation [36.93661496405653]
We take a global approach to exploit spatio-temporal information with a concise Graph and Skipped Transformer architecture.
Specifically, in the 3D pose estimation stage, coarse-grained body parts are deployed to construct a fully data-driven adaptive model.
Experiments are conducted on Human3.6M, MPI-INF-3DHP and Human-Eva benchmarks.
arXiv Detail & Related papers (2024-07-03T10:42:09Z) - Improving Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures [12.703947839247693]
Diffusion models, emerging as powerful deep generative tools, excel in various applications.
However, their remarkable generative performance is hindered by slow training and sampling.
This is due to the necessity of tracking extensive forward and reverse diffusion trajectories.
We present a multi-stage framework inspired by our empirical findings to tackle these challenges.
arXiv Detail & Related papers (2023-12-14T17:48:09Z) - Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z) - LocATe: End-to-end Localization of Actions in 3D with Transformers [91.28982770522329]
LocATe is an end-to-end approach that jointly localizes and recognizes actions in a 3D sequence.
Unlike transformer-based object-detection and classification models which consider image or patch features as input, LocATe's transformer model is capable of capturing long-term correlations between actions in a sequence.
We introduce a new, challenging, and more realistic benchmark dataset, BABEL-TAL-20 (BT20), where the performance of state-of-the-art methods is significantly worse.
arXiv Detail & Related papers (2022-03-21T03:35:32Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z) - Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block that is capable of extracting temporal information at multiple resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z) - Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for
Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and 2) optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
arXiv Detail & Related papers (2020-08-21T10:45:09Z)