CLASH: Complementary Learning with Neural Architecture Search for Gait Recognition
- URL: http://arxiv.org/abs/2407.03632v1
- Date: Thu, 4 Jul 2024 04:51:01 GMT
- Title: CLASH: Complementary Learning with Neural Architecture Search for Gait Recognition
- Authors: Huanzhang Dou, Pengyi Zhang, Yuhan Zhao, Lu Jin, Xi Li
- Abstract summary: We present a walking-pattern-sensitive gait descriptor named dense spatial-temporal field (DSTF) and a neural architecture search based complementary learning (NCL) framework.
Specifically, DSTF transforms the representation from the sparse binary boundary into the dense distance-based texture, which is sensitive to the walking pattern at the pixel level.
On the latest in-the-wild datasets, we outperform the latest silhouette-based methods by 16.3% and 19.7% on Gait3D and GREW, respectively.
- Score: 14.86306286844144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gait recognition, which aims at identifying individuals by their walking patterns, has achieved great success based on silhouettes. The binary silhouette sequence encodes the walking pattern within a sparse boundary representation. Therefore, most pixels in the silhouette are under-sensitive to the walking pattern, since the sparse boundary lacks the dense spatial-temporal information that is better captured by dense texture. To enhance sensitivity to the walking pattern while maintaining robust recognition, we present a Complementary Learning with neural Architecture Search (CLASH) framework, consisting of a walking-pattern-sensitive gait descriptor, the dense spatial-temporal field (DSTF), and neural architecture search based complementary learning (NCL). Specifically, DSTF transforms the representation from the sparse binary boundary into a dense distance-based texture, which is sensitive to the walking pattern at the pixel level. Further, NCL provides a task-specific search space for complementary learning, in which the sensitivity of DSTF and the robustness of the silhouette mutually complement each other to represent the walking pattern effectively. Extensive experiments demonstrate the effectiveness of the proposed methods under both in-the-lab and in-the-wild scenarios. On CASIA-B, we achieve rank-1 accuracy of 98.8%, 96.5%, and 89.3% under the three walking conditions. On OU-MVLP, we achieve rank-1 accuracy of 91.9%. On the latest in-the-wild datasets, we outperform the latest silhouette-based methods by 16.3% and 19.7% on Gait3D and GREW, respectively.
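The DSTF transformation can be illustrated with a distance transform. Below is a minimal sketch, assuming the "dense distance-based texture" is a signed distance field computed from the binary silhouette; the paper's exact formulation and normalization may differ.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def dense_distance_texture(silhouette: np.ndarray) -> np.ndarray:
    """Turn a binary silhouette (H, W) into a dense distance-based texture.

    Illustrative signed Euclidean distance field: positive inside the body,
    negative outside, so every pixel carries boundary-distance information
    instead of a sparse 0/1 value.
    """
    inside = distance_transform_edt(silhouette)       # distance to background
    outside = distance_transform_edt(1 - silhouette)  # distance to foreground
    sdf = inside - outside
    return sdf / (np.abs(sdf).max() + 1e-8)  # normalize to [-1, 1]

# Toy example: a square "body" on an 8x8 grid.
sil = np.zeros((8, 8), dtype=np.uint8)
sil[2:6, 2:6] = 1
print(dense_distance_texture(sil).round(2))
```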
Related papers
- DepthGait: Multi-Scale Cross-Level Feature Fusion of RGB-Derived Depth and Silhouette Sequences for Robust Gait Recognition [21.45735405341433]
In this paper, we introduce a novel framework, termed DepthGait, that incorporates RGB-derived depth maps and silhouettes for enhanced gait recognition.
Specifically, apart from the 2D silhouette representation of the human body, the proposed pipeline explicitly estimates depth maps from a given RGB image sequence.
A novel multi-scale and cross-level fusion scheme has also been developed to bridge the modality gap between depth maps and silhouettes.
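The summary does not specify the fusion architecture; the following is a hedged sketch of one plausible multi-scale, cross-level fusion, where the per-scale channel widths (`channels`) and shared output width (`out_dim`) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthSilhouetteFusion(nn.Module):
    """Sketch: mix depth and silhouette features at each scale, then
    aggregate all levels at the finest resolution."""
    def __init__(self, channels=(32, 64, 128), out_dim=128):
        super().__init__()
        self.mix = nn.ModuleList(nn.Conv2d(2 * c, out_dim, 1) for c in channels)

    def forward(self, depth_feats, sil_feats):
        # Each list holds per-scale maps (B, C_i, H_i, W_i), fine to coarse.
        target = depth_feats[0].shape[-2:]  # fuse at the finest resolution
        fused = 0
        for conv, d, s in zip(self.mix, depth_feats, sil_feats):
            x = conv(torch.cat([d, s], dim=1))  # per-scale modality mixing
            fused = fused + F.interpolate(      # cross-level aggregation
                x, size=target, mode="bilinear", align_corners=False)
        return fused
```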
arXiv Detail & Related papers (2025-08-05T12:45:29Z) - A Stable Whitening Optimizer for Efficient Neural Network Training [101.89246340672246]
Building on the Shampoo family of algorithms, we identify and alleviate three key issues, resulting in the proposed SPlus method.
First, we find that naive Shampoo is prone to divergence when matrix-inverses are cached for long periods.
Second, we adapt a shape-aware scaling to enable learning rate transfer across network width.
Third, we find that high learning rates result in large parameter noise, and propose a simple iterate-averaging scheme which unblocks faster learning.
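The iterate-averaging idea can be sketched as an exponential moving average of the weights, evaluated instead of the raw iterates; the actual SPlus averaging rule may differ.

```python
import copy
import torch

@torch.no_grad()
def update_average(avg_model, model, beta=0.999):
    # EMA of parameters: the averaged model smooths out the parameter
    # noise induced by high learning rates.
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.mul_(beta).add_(p, alpha=1 - beta)

# Usage: train `model` as usual, evaluate `avg_model`.
model = torch.nn.Linear(16, 4)
avg_model = copy.deepcopy(model)
opt = torch.optim.SGD(model.parameters(), lr=0.5)
for _ in range(100):
    loss = model(torch.randn(8, 16)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    update_average(avg_model, model)
```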
arXiv Detail & Related papers (2025-06-08T18:43:31Z) - Road Traffic Sign Recognition method using Siamese network Combining Efficient-CNN based Encoder [5.597437966490453]
Traffic sign recognition (TSR) plays an essential role in assisted driving and intelligent transportation systems.
In this article, we propose the IECES-network, which combines improved encoders with a Siamese network.
The proposed method achieves competitive performance: the averages of the precision, recall, and accuracy metrics are 88.1%, 86.43%, and 86.1%, with a lightweight scale of 2.9M parameters.
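As a sketch of the Siamese setup (the encoder and similarity function here are placeholders, not the IECES-network's actual components): a shared lightweight CNN embeds a query sign and a template, and recognition reduces to nearest-template search by embedding similarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseTSR(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(  # shared lightweight encoder
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim),
        )

    def forward(self, query, template):
        zq = F.normalize(self.encoder(query), dim=1)
        zt = F.normalize(self.encoder(template), dim=1)
        return (zq * zt).sum(dim=1)  # cosine similarity per pair

# sim = SiameseTSR()(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))
```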
arXiv Detail & Related papers (2025-02-21T09:03:05Z) - It Takes Two: Accurate Gait Recognition in the Wild via Cross-granularity Alignment [72.75844404617959]
This paper proposes a novel cross-granularity alignment gait recognition method, named XGait.
To achieve this goal, the XGait first contains two branches of backbone encoders to map the silhouette sequences and the parsing sequences into two latent spaces.
Comprehensive experiments on two large-scale gait datasets show that XGait achieves Rank-1 accuracy of 80.5% on Gait3D and 88.3% on CCPG.
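A hedged sketch of the two-branch idea: separate backbones embed silhouettes and parsing maps into their own latent spaces, with a simple term pulling matched sequences together across granularities (the backbones and alignment loss here are placeholders, not XGait's actual modules).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchGait(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.sil_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim))
        self.par_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim))

    def forward(self, sil_seq, par_seq):
        z_sil = F.normalize(self.sil_net(sil_seq), dim=1)
        z_par = F.normalize(self.par_net(par_seq), dim=1)
        align = (1 - (z_sil * z_par).sum(dim=1)).mean()  # cross-granularity pull
        return z_sil, z_par, align
```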
arXiv Detail & Related papers (2024-11-16T08:54:27Z) - TriGait: Aligning and Fusing Skeleton and Silhouette Gait Data via a Tri-Branch Network [4.699718818019937]
Gait recognition is a promising biometric technology for identification due to its non-invasiveness and long-distance applicability.
However, external variations such as clothing changes and viewpoint differences pose significant challenges to gait recognition.
A novel triple branch gait recognition framework, TriGait, is proposed in this paper.
arXiv Detail & Related papers (2023-08-25T12:19:51Z) - Distillation-guided Representation Learning for Unconstrained Gait Recognition [50.0533243584942]
We propose a framework, termed GAit DEtection and Recognition (GADER), for human authentication in challenging outdoor scenarios.
GADER builds discriminative features through a novel gait recognition method, where only frames containing gait information are used.
We evaluate our method against multiple state-of-the-art (SoTA) gait baselines and demonstrate consistent improvements on indoor and outdoor datasets.
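The gating of frames by gait content reduces to a simple filter; the detector scores and threshold below are placeholders for whatever gait detector GADER actually uses.

```python
def gait_frames(frames, gait_scores, thresh=0.5):
    # Keep only frames the gait detector deems informative.
    return [f for f, s in zip(frames, gait_scores) if s > thresh]
```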
arXiv Detail & Related papers (2023-07-27T01:53:57Z) - Neural Point-based Volumetric Avatar: Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar [62.87222308616711]
We propose the Neural Point-based Volumetric Avatar, a method that adopts the neural point representation and the neural volume rendering process.
Specifically, the neural points are strategically constrained around the surface of the target expression via a high-resolution UV displacement map.
By design, our method is better equipped to handle topologically changing regions and thin structures while also ensuring accurate expression control when animating avatars.
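The surface-guided constraint can be sketched as offsetting template-surface points along their normals by a displacement sampled from a UV texture; the tensor shapes and use of `grid_sample` are assumptions, not the paper's exact construction.

```python
import torch
import torch.nn.functional as F

def surface_guided_points(surface_xyz, surface_normals, disp_map, uv):
    """surface_xyz: (N, 3) template-surface points; surface_normals: (N, 3)
    unit normals; disp_map: (1, 1, H, W) displacement texture; uv: (N, 2)
    UV coordinates in [-1, 1]."""
    grid = uv.view(1, 1, -1, 2)                            # (1, 1, N, 2)
    d = F.grid_sample(disp_map, grid, align_corners=True)  # (1, 1, 1, N)
    d = d.view(-1, 1)                                      # (N, 1)
    return surface_xyz + d * surface_normals  # points stay near the surface
```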
arXiv Detail & Related papers (2023-07-11T03:40:10Z) - UncLe-SLAM: Uncertainty Learning for Dense Neural SLAM [60.575435353047304]
We present an uncertainty learning framework for dense neural simultaneous localization and mapping (SLAM).
We propose an online framework for sensor uncertainty estimation that can be trained in a self-supervised manner from only 2D input data.
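A common way to learn per-pixel uncertainty self-supervised is a heteroscedastic negative log-likelihood, where the network predicts a scale alongside each residual; this is a standard construction offered as a sketch of the idea, not UncLe-SLAM's exact objective.

```python
import torch

def heteroscedastic_loss(pred, target, log_b):
    # Laplace NLL: pixels with large predicted scale b are down-weighted,
    # while the log_b term keeps the scale from growing without bound.
    b = log_b.exp()
    return (torch.abs(pred - target) / b + log_b).mean()
```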
arXiv Detail & Related papers (2023-06-19T16:26:25Z) - DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition [35.642868929840034]
Gait recognition is a biometric technology that recognizes the identity of humans through their walking patterns.
We propose a novel and high-performance framework named DyGait to focus on the extraction of dynamic features.
Our network achieves an average Rank-1 accuracy of 71.4% on the GREW dataset, 66.3% on the Gait3D dataset, 98.4% on the CASIA-B dataset and 98.3% on the OU-MVLP dataset.
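One simple way to isolate dynamic gait information, offered as an illustration of the idea rather than DyGait's actual module: subtract the temporal mean of the feature sequence so static appearance cancels and motion remains.

```python
import torch

def dynamic_part(feat_seq):
    # feat_seq: (B, T, C, H, W) frame-level feature maps.
    static = feat_seq.mean(dim=1, keepdim=True)  # (B, 1, C, H, W)
    return feat_seq - static                     # per-frame dynamics
```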
arXiv Detail & Related papers (2023-03-27T07:36:47Z) - Leveraging the Third Dimension in Contrastive Learning [88.17394309208925]
Self-Supervised Learning (SSL) methods operate on unlabeled data to learn robust representations useful for downstream tasks.
Typical SSL augmentations, however, ignore the fact that biological vision takes place in an immersive three-dimensional, temporally contiguous environment.
We explore two distinct approaches to incorporating depth signals into the SSL framework.
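One of the simplest ways to inject depth into SSL, sketched below as an assumption rather than the paper's exact recipe, is to treat an RGB view and a depth-derived view of the same scene as a positive pair in an InfoNCE-style loss.

```python
import torch
import torch.nn.functional as F

def rgb_depth_infonce(z_rgb, z_depth, tau=0.1):
    # z_rgb, z_depth: (B, D) embeddings of matched RGB / depth views.
    z1 = F.normalize(z_rgb, dim=1)
    z2 = F.normalize(z_depth, dim=1)
    logits = z1 @ z2.t() / tau                               # (B, B) similarities
    labels = torch.arange(z1.size(0), device=logits.device)  # diagonal positives
    return F.cross_entropy(logits, labels)
```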
arXiv Detail & Related papers (2023-01-27T15:45:03Z) - RGB-D based Stair Detection using Deep Learning for Autonomous Stair Climbing [6.362951673024623]
We propose a neural network architecture that takes both an RGB map and a depth map as inputs.
Specifically, we design a selective module that enables the network to learn the complementary relationship between the RGB map and the depth map.
Experiments on our dataset show that our method can achieve better accuracy and recall compared with the previous state-of-the-art deep learning method.
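A hedged sketch of such a selective module: per-channel gates decide how much to trust RGB versus depth features, so the modalities complement each other; the real module may be structured differently.

```python
import torch
import torch.nn as nn

class SelectiveFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(      # per-channel gate from both inputs
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        g = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))  # (B, C, 1, 1)
        return g * rgb_feat + (1 - g) * depth_feat
```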
arXiv Detail & Related papers (2022-12-02T11:22:52Z) - Towards a Deeper Understanding of Skeleton-based Gait Recognition [4.812321790984493]
In recent years, most gait recognition methods have used the person's silhouette to extract gait features.
Model-based methods do not suffer from the appearance-related problems that affect silhouettes and are able to represent the temporal motion of body joints.
In this work, we propose an approach based on Graph Convolutional Networks (GCNs) that combines higher-order inputs and residual networks.
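The higher-order inputs in this line of work are typically joint positions plus their temporal differences (velocities) and spatial differences (bones); the sketch below assumes that reading, with placeholder parent joints rather than the paper's actual skeleton graph, and would feed a residual GCN (not shown).

```python
import torch

def higher_order_inputs(joints):
    # joints: (T, V, 3) coordinates over T frames and V joints.
    velocity = torch.zeros_like(joints)
    velocity[1:] = joints[1:] - joints[:-1]         # first-order motion
    parents = torch.roll(joints, shifts=1, dims=1)  # stand-in parent joints
    bones = joints - parents                        # skeletal structure
    return torch.cat([joints, velocity, bones], dim=-1)  # (T, V, 9)
```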
arXiv Detail & Related papers (2022-04-16T18:23:37Z) - Combining the Silhouette and Skeleton Data for Gait Recognition [13.345465199699]
Two dominant families of gait recognition methods are appearance-based and model-based, which extract features from silhouettes and skeletons, respectively.
This paper proposes a CNN-based branch taking silhouettes as input and a GCN-based branch taking skeletons as input.
For better gait representation in the GCN-based branch, we present a fully connected graph convolution operator to integrate multi-scale graph convolutions.
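A sketch of a fully connected, multi-scale graph convolution: fixed adjacency powers A^k capture k-hop neighborhoods, and a learnable dense matrix lets every joint attend to every other; the parameterization is an assumption, not the paper's exact operator.

```python
import torch
import torch.nn as nn

class FCMultiScaleGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim, adj, scales=3):
        super().__init__()
        A = adj / adj.sum(-1, keepdim=True).clamp(min=1)  # row-normalize
        self.register_buffer("powers", torch.stack(
            [torch.matrix_power(A, k) for k in range(scales)]))  # k-hop mixing
        self.dense = nn.Parameter(torch.zeros(adj.size(0), adj.size(0)))
        self.proj = nn.ModuleList(
            nn.Linear(in_dim, out_dim) for _ in range(scales + 1))

    def forward(self, x):  # x: (B, V, C) joint features
        out = 0
        for k in range(self.powers.size(0)):
            out = out + self.proj[k](self.powers[k] @ x)
        # Fully connected branch: learned soft adjacency over all joints.
        out = out + self.proj[-1](torch.softmax(self.dense, dim=-1) @ x)
        return out
```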
arXiv Detail & Related papers (2022-02-22T03:21:51Z) - Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)