HorGait: A Hybrid Model for Accurate Gait Recognition in LiDAR Point Cloud Planar Projections
- URL: http://arxiv.org/abs/2410.08454v2
- Date: Thu, 24 Oct 2024 01:59:45 GMT
- Title: HorGait: A Hybrid Model for Accurate Gait Recognition in LiDAR Point Cloud Planar Projections
- Authors: Jiaxing Hao, Yanxi Wang, Zhigang Chang, Hongmin Gao, Zihao Cheng, Chen Wu, Xin Zhao, Peiye Fang, Rachmat Muwardi,
- Abstract summary: HorGait is a hybrid model with a Transformer architecture for gait recognition on the planar projection of 3D point clouds from LiDAR.
It achieves state-of-the-art performance among Transformer architecture methods on the SUSTech1K dataset.
- Score: 8.56443762544299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gait recognition is a remote biometric technology that utilizes the dynamic characteristics of human movement to identify individuals even under various extreme lighting conditions. Due to the limitation in spatial perception capability inherent in 2D gait representations, LiDAR can directly capture 3D gait features and represent them as point clouds, reducing environmental and lighting interference in recognition while significantly advancing privacy protection. For complex 3D representations, shallow networks fail to achieve accurate recognition, making vision Transformers the foremost prevalent method. However, the prevalence of dumb patches has limited the widespread use of Transformer architecture in gait recognition. This paper proposes a method named HorGait, which utilizes a hybrid model with a Transformer architecture for gait recognition on the planar projection of 3D point clouds from LiDAR. Specifically, it employs a hybrid model structure called LHM Block to achieve input adaptation, long-range, and high-order spatial interaction of the Transformer architecture. Additionally, it uses large convolutional kernel CNNs to segment the input representation, replacing attention windows to reduce dumb patches. We conducted extensive experiments, and the results show that HorGait achieves state-of-the-art performance among Transformer architecture methods on the SUSTech1K dataset, verifying that the hybrid model can complete the full Transformer process and perform better in point cloud planar projection. The outstanding performance of HorGait offers new insights for the future application of the Transformer architecture in gait recognition.
Related papers
- DeforHMR: Vision Transformer with Deformable Cross-Attention for 3D Human Mesh Recovery [2.1653492349540784]
DeforHMR is a novel regression-based monocular HMR framework designed to enhance the prediction of human pose parameters.
DeforHMR leverages a novel query-agnostic deformable cross-attention mechanism within the transformer decoder.
It achieves state-of-the-art performance for single-frame regression-based methods on the widely used 3D HMR benchmarks 3DPW and RICH.
arXiv Detail & Related papers (2024-11-18T00:46:59Z) - EVT: Efficient View Transformation for Multi-Modal 3D Object Detection [2.9848894641223302]
We propose a novel 3D object detector via efficient view transformation (EVT)
EVT uses Adaptive Sampling and Adaptive Projection (ASAP) to generate 3D sampling points and adaptive kernels.
It is designed to effectively utilize the obtained multi-modal BEV features within the transformer decoder.
arXiv Detail & Related papers (2024-11-16T06:11:10Z) - Unleashing the Potential of Mamba: Boosting a LiDAR 3D Sparse Detector by Using Cross-Model Knowledge Distillation [22.653014803666668]
We propose a Faster LiDAR 3D object detection framework, called FASD, which implements heterogeneous model distillation by adaptively uniform cross-model voxel features.
We aim to distill the transformer's capacity for high-performance sequence modeling into Mamba models with low FLOPs, achieving a significant improvement in accuracy through knowledge transfer.
We evaluated the framework on datasets and nuScenes, achieving a 4x reduction in resource consumption and a 1-2% performance improvement over the current SoTA methods.
arXiv Detail & Related papers (2024-09-17T09:30:43Z) - Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs [65.80187860906115]
We propose a novel approach to improve NeRF's performance with sparse inputs.
We first adopt a voxel-based ray sampling strategy to ensure that the sampled rays intersect with a certain voxel in 3D space.
We then randomly sample additional points within the voxel and apply a Transformer to infer the properties of other points on each ray, which are then incorporated into the volume rendering.
arXiv Detail & Related papers (2024-03-25T15:56:17Z) - Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
arXiv Detail & Related papers (2023-01-06T18:52:12Z) - Blending Anti-Aliasing into Vision Transformer [57.88274087198552]
discontinuous patch-wise tokenization process implicitly introduces jagged artifacts into attention maps.
Aliasing effect occurs when discrete patterns are used to produce high frequency or continuous information, resulting in the indistinguishable distortions.
We propose a plug-and-play Aliasing-Reduction Module(ARM) to alleviate the aforementioned issue.
arXiv Detail & Related papers (2021-10-28T14:30:02Z) - Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD)
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z) - Transformers Solve the Limited Receptive Field for Monocular Depth
Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper which applies transformers into pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.