Spatiotemporal Multi-scale Bilateral Motion Network for Gait Recognition
- URL: http://arxiv.org/abs/2209.12364v1
- Date: Mon, 26 Sep 2022 01:36:22 GMT
- Title: Spatiotemporal Multi-scale Bilateral Motion Network for Gait Recognition
- Authors: Xinnan Ding, Shan Du, Yu Zhang, and Kejun Wang
- Abstract summary: Motivated by optical flow, this paper proposes bilateral motion-oriented features that allow a classic convolutional structure to directly portray gait movement patterns at the feature level.
A set of multi-scale temporal representations forces the motion context to be richly described at various levels of temporal resolution.
- Score: 3.1240043488226967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The critical goal of gait recognition is to acquire the inter-frame walking
habit representation from the gait sequences. The relations between frames,
however, have not received adequate attention in comparison to the intra-frame
features. In this paper, motivated by optical flow, we propose bilateral
motion-oriented features that allow a classic convolutional structure to
directly portray gait movement patterns at the feature level. Based on such
features, we develop a
set of multi-scale temporal representations that force the motion context to be
richly described at various levels of temporal resolution. Furthermore, a
correction block is devised to eliminate the segmentation noise of silhouettes
for getting more precise gait information. Subsequently, the temporal feature
set and the spatial features are combined to comprehensively characterize gait
processes. Extensive experiments on the CASIA-B and OU-MVLP datasets show
outstanding identification performance, demonstrating the effectiveness of the
proposed approach.
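The paper does not spell out its implementation, but the core idea of bilateral motion-oriented features at multiple temporal scales can be illustrated with a minimal sketch. Assuming per-frame feature vectors, forward and backward temporal differences at several strides (the scales, the `bilateral_motion_features` helper, and the zero-padding at sequence boundaries are all illustrative assumptions, not the authors' actual design) approximate motion context at different temporal resolutions:

```python
import numpy as np

def bilateral_motion_features(feats, scales=(1, 2, 4)):
    """Hypothetical sketch of bilateral (forward/backward) motion features.

    feats: array of shape (T, C), one feature vector per frame.
    Returns a dict mapping each temporal scale s to a pair
    (forward_diff, backward_diff), each of shape (T, C),
    zero-padded at the sequence boundaries.
    """
    T, _ = feats.shape
    out = {}
    for s in scales:
        fwd = np.zeros_like(feats)
        bwd = np.zeros_like(feats)
        # forward motion: feature change toward the frame s steps ahead
        fwd[:T - s] = feats[s:] - feats[:T - s]
        # backward motion: feature change relative to the frame s steps back
        bwd[s:] = feats[s:] - feats[:T - s]
        out[s] = (fwd, bwd)
    return out

# Toy usage: 6 frames, 2-dim features
feats = np.arange(12, dtype=float).reshape(6, 2)
motion = bilateral_motion_features(feats, scales=(1, 2))
```

In the full model, such motion features would be concatenated or fused with the spatial (intra-frame) features before classification; here they are shown in isolation.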
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z) - Motion-aware Latent Diffusion Models for Video Frame Interpolation [51.78737270917301]
Motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity.
We propose a novel diffusion framework, motion-aware latent diffusion models (MADiff)
Our method achieves state-of-the-art performance, significantly outperforming existing approaches.
arXiv Detail & Related papers (2024-04-21T05:09:56Z) - Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z) - Wavelet-Decoupling Contrastive Enhancement Network for Fine-Grained Skeleton-Based Action Recognition [8.743480762121937]
We propose a Wavelet-Attention Decoupling (WAD) module to disentangle salient and subtle motion features in the time-frequency domain.
We also propose a Fine-grained Contrastive Enhancement (FCE) module to enhance attention towards trajectory features by contrastive learning.
Our methods perform competitively compared to state-of-the-art methods and can discriminate confusing fine-grained actions well.
arXiv Detail & Related papers (2024-02-03T16:51:04Z) - HiH: A Multi-modal Hierarchy in Hierarchy Network for Unconstrained Gait Recognition [3.431054404120758]
We present a multi-modal Hierarchy in Hierarchy network (HiH) that integrates silhouette and pose sequences for robust gait recognition.
HiH features a main branch that utilizes Hierarchical Gait Decomposer modules for depth-wise and intra-module hierarchical examination of general gait patterns from silhouette data.
An auxiliary branch, based on 2D joint sequences, enriches the spatial and temporal aspects of gait analysis.
arXiv Detail & Related papers (2023-11-19T03:25:14Z) - Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) regulates the intermediate representation consecutively to produce a representation that emphasizes the novel information in the current frame.
SRL sharply outperforms existing state-of-the-art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z) - Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how to better handle variations between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z) - Modeling long-term interactions to enhance action recognition [81.09859029964323]
We propose a new approach to understand actions in egocentric videos that exploits the semantics of object interactions at both frame and temporal levels.
We use a region-based approach that takes as input a primary region roughly corresponding to the user hands and a set of secondary regions potentially corresponding to the interacting objects.
The proposed approach outperforms the state-of-the-art in terms of action recognition on standard benchmarks.
arXiv Detail & Related papers (2021-04-23T10:08:15Z) - Sequential convolutional network for behavioral pattern extraction in gait recognition [0.7874708385247353]
We propose a sequential convolutional network (SCN) to learn the walking pattern of individuals.
In SCN, behavioral information extractors (BIE) are constructed to comprehend intermediate feature maps in time series.
A multi-frame aggregator in SCN performs feature integration on a sequence whose length is uncertain, via a mobile 3D convolutional layer.
arXiv Detail & Related papers (2021-04-23T08:44:10Z) - Multiple Object Tracking with Correlation Learning [16.959379957515974]
We propose to exploit the local correlation module to model the topological relationship between targets and their surrounding environment.
Specifically, we establish dense correspondences of each spatial location and its context, and explicitly constrain the correlation volumes through self-supervised learning.
Our approach demonstrates the effectiveness of correlation learning with the superior performance and obtains state-of-the-art MOTA of 76.5% and IDF1 of 73.6% on MOT17.
arXiv Detail & Related papers (2021-04-08T06:48:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.