Cross-Modality Gait Recognition: Bridging LiDAR and Camera Modalities for Human Identification
- URL: http://arxiv.org/abs/2404.04120v1
- Date: Thu, 4 Apr 2024 10:12:55 GMT
- Title: Cross-Modality Gait Recognition: Bridging LiDAR and Camera Modalities for Human Identification
- Authors: Rui Wang, Chuanfu Shen, Manuel J. Marin-Jimenez, George Q. Huang, Shiqi Yu
- Abstract summary: We present CrossGait, capable of retrieving pedestrians across diverse data modalities.
We propose a Prototypical Modality-shared Attention Module that learns modality-shared features from two modality-specific features.
We also design a Cross-modality Feature Adapter that transforms the learned modality-specific features into a unified feature space.
- Score: 8.976513021790984
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Current gait recognition research mainly focuses on identifying pedestrians captured by the same type of sensor, neglecting the fact that individuals may be captured by different sensors in order to adapt to various environments. A more practical approach should involve cross-modality matching across different sensors. Hence, this paper investigates the problem of cross-modality gait recognition, with the objective of accurately identifying pedestrians across diverse vision sensors. We present CrossGait, inspired by the feature alignment strategy and capable of retrieving pedestrians across diverse data modalities. Specifically, we approach the cross-modality recognition task by first extracting features within each modality and subsequently aligning these features across modalities. To further enhance cross-modality performance, we propose a Prototypical Modality-shared Attention Module that learns modality-shared features from two modality-specific features. Additionally, we design a Cross-modality Feature Adapter that transforms the learned modality-specific features into a unified feature space. Extensive experiments conducted on the SUSTech1K dataset demonstrate the effectiveness of CrossGait: (1) it exhibits promising cross-modality ability in retrieving pedestrians across various modalities from different sensors in diverse scenes, and (2) it not only learns modality-shared features for cross-modality gait recognition but also retains modality-specific features for single-modality recognition.
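The pipeline described in the abstract (per-modality encoders, a Cross-modality Feature Adapter projecting into a unified space, and a Prototypical Modality-shared Attention Module pooling against learned prototypes) maps naturally onto a two-stream network. Below is a minimal sketch of one plausible reading; the encoder stand-ins, module internals, and dimensions are all assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalFeatureAdapter(nn.Module):
    """Projects a modality-specific feature into a unified embedding space
    (hypothetical stand-in for the paper's Cross-modality Feature Adapter)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim))

    def forward(self, x):
        return self.proj(x)

class PrototypicalSharedAttention(nn.Module):
    """Attends each embedding to a set of learned prototypes so that pooled
    features from both modalities live in a shared space (illustrative only)."""
    def __init__(self, dim, num_prototypes=16):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim))

    def forward(self, x):  # x: (B, dim)
        attn = F.softmax(x @ self.prototypes.t() / x.size(-1) ** 0.5, dim=-1)
        return attn @ self.prototypes  # (B, dim): prototype-weighted feature

class CrossGaitSketch(nn.Module):
    def __init__(self, lidar_dim=512, rgb_dim=256, embed_dim=256):
        super().__init__()
        # Placeholder backbones; the real model uses gait-specific encoders.
        self.lidar_encoder = nn.Linear(lidar_dim, lidar_dim)
        self.rgb_encoder = nn.Linear(rgb_dim, rgb_dim)
        self.lidar_adapter = CrossModalFeatureAdapter(lidar_dim, embed_dim)
        self.rgb_adapter = CrossModalFeatureAdapter(rgb_dim, embed_dim)
        self.shared_attn = PrototypicalSharedAttention(embed_dim)

    def forward(self, lidar_feat, rgb_feat):
        z_l = self.lidar_adapter(self.lidar_encoder(lidar_feat))
        z_r = self.rgb_adapter(self.rgb_encoder(rgb_feat))
        # Modality-shared embeddings used for cross-modality retrieval.
        return self.shared_attn(z_l), self.shared_attn(z_r)
```

Training such a model would pull same-identity embeddings from the two streams together with a metric loss, while the pre-adapter features remain usable for single-modality recognition, matching observation (2) above.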
Related papers
- GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse Function to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric combines two structured forms: diagonal and block-diagonal terms.
Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
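Read literally, such a metric mixes a per-dimension (diagonal) weighting with group-wise (block-diagonal) interactions. Below is a minimal sketch of one such structured distance; the block construction and parameterization are our assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class StructuredSparseDistance(nn.Module):
    """Mahalanobis-style distance whose weight matrix is restricted to a
    diagonal plus block-diagonal structure (an illustrative reading of GSSF)."""
    def __init__(self, dim, block_size):
        super().__init__()
        assert dim % block_size == 0
        self.block_size = block_size
        self.diag = nn.Parameter(torch.ones(dim))  # diagonal term weights
        n_blocks = dim // block_size
        # One learnable matrix per block on the (block-)diagonal.
        self.blocks = nn.Parameter(torch.eye(block_size).repeat(n_blocks, 1, 1))

    def forward(self, x, y):  # x, y: (B, dim)
        d = x - y
        diag_term = (self.diag * d * d).sum(-1)          # sum_i w_i * d_i^2
        db = d.view(d.size(0), -1, self.block_size)      # (B, n_blocks, block)
        block_term = torch.einsum('bni,nij,bnj->b', db, self.blocks, db)
        return diag_term + block_term
```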
arXiv Detail & Related papers (2024-10-20T03:45:50Z)
- EMDFNet: Efficient Multi-scale and Diverse Feature Network for Traffic Sign Detection [11.525603303355268]
The detection of small objects, particularly traffic signs, is a critical subtask within object detection and autonomous driving.
Motivated by these challenges, we propose a novel object detection network named Efficient Multi-scale and Diverse Feature Network (EMDFNet).
EMDFNet integrates an Augmented Shortcut Module and an Efficient Hybrid Encoder to address the aforementioned issues simultaneously.
arXiv Detail & Related papers (2024-08-26T11:26:27Z)
- Dynamic Identity-Guided Attention Network for Visible-Infrared Person Re-identification [17.285526655788274]
Visible-infrared person re-identification (VI-ReID) aims to match people with the same identity between visible and infrared modalities.
Existing methods generally try to bridge the cross-modal differences at the image or feature level.
We introduce a dynamic identity-guided attention network (DIAN) to mine identity-guided and modality-consistent embeddings.
arXiv Detail & Related papers (2024-05-21T12:04:56Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery [0.6853165736531939]
Fusing the complementary cross-modality information of multispectral remote sensing image pairs can improve the perception ability of detection algorithms.
We propose a novel and lightweight multispectral feature fusion approach with joint common-modality and differential-modality attentions.
Our proposed approach can achieve state-of-the-art performance at a low cost.
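The "joint common-modality and differential-modality attentions" suggest attending separately to what the two spectra share (e.g., their mean) and to where they differ (their difference). A schematic sketch under that assumption; the module names and the SE-style gating below are ours, not the paper's.

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Squeeze-and-excitation style channel attention."""
    def __init__(self, c, r=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(c, c // r), nn.ReLU(), nn.Linear(c // r, c), nn.Sigmoid())

    def forward(self, x):  # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))  # global average pool -> channel gate
        return x * w[:, :, None, None]

class CommonDifferentialFusion(nn.Module):
    """Fuses two spectral features via attention on their common part (mean)
    and their differential part (difference). Illustrative only."""
    def __init__(self, c):
        super().__init__()
        self.common_gate = ChannelGate(c)
        self.diff_gate = ChannelGate(c)

    def forward(self, f_rgb, f_ir):
        common = 0.5 * (f_rgb + f_ir)  # modality-shared content
        diff = f_rgb - f_ir            # modality-specific contrast
        return self.common_gate(common) + self.diff_gate(diff)
```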
arXiv Detail & Related papers (2021-12-06T13:12:36Z)
- Exploring Modality-shared Appearance Features and Modality-invariant Relation Features for Cross-modality Person Re-Identification [72.95858515157603]
Cross-modality person re-identification methods rely on discriminative modality-shared features.
Despite some initial success, such modality-shared appearance features cannot capture enough modality-invariant information.
A novel cross-modality quadruplet loss is proposed to further reduce the cross-modality variations.
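The summary names a cross-modality quadruplet loss without giving its exact form. A common quadruplet formulation extends the triplet loss with a second margin term between two negatives; the sketch below uses that standard form, with the cross-modality pairing (anchor from one modality, the other samples from the opposite one) assumed rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def cross_modality_quadruplet_loss(anchor, positive, neg1, neg2,
                                   m1=0.3, m2=0.15):
    """Quadruplet loss: the usual triplet term plus a term that pushes apart
    two negatives of different identities. `anchor` is assumed to come from
    one modality and the remaining embeddings from the other modality (our
    assumption). All inputs have shape (B, D)."""
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, neg1)
    d_nn = F.pairwise_distance(neg1, neg2)
    triplet_term = F.relu(d_ap - d_an + m1).mean()
    quadruplet_term = F.relu(d_ap - d_nn + m2).mean()
    return triplet_term + quadruplet_term
```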
arXiv Detail & Related papers (2021-04-23T11:14:07Z)
- Robust Facial Landmark Detection by Cross-order Cross-semantic Deep Network [58.843211405385205]
We propose a cross-order cross-semantic deep network (CCDN) to boost semantic feature learning for robust facial landmark detection.
Specifically, a cross-order two-squeeze multi-excitation (CTM) module is proposed to introduce cross-order channel correlations for learning more discriminative representations.
A novel cross-order cross-semantic (COCS) regularizer is designed to drive the network to learn cross-order cross-semantic features from different activations for facial landmark detection.
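"Cross-order" channel correlations plausibly mean mixing first-order (mean) and second-order (variance) channel statistics in the squeeze step, with the two pooled descriptors feeding several excitation branches. The sketch below is our reading of that idea, not the published CTM code.

```python
import torch
import torch.nn as nn

class CrossOrderExcitation(nn.Module):
    """SE-like module that squeezes both first-order (mean) and second-order
    (std) channel statistics and sums several excitation branches.
    An illustrative reading of the CTM idea, not the paper's implementation."""
    def __init__(self, c, r=4, num_branches=2):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(2 * c, c // r), nn.ReLU(),
                          nn.Linear(c // r, c))
            for _ in range(num_branches))

    def forward(self, x):  # x: (B, C, H, W)
        mean = x.mean(dim=(2, 3))                        # first-order squeeze
        std = x.var(dim=(2, 3), unbiased=False).sqrt()   # second-order squeeze
        s = torch.cat([mean, std], dim=1)                # (B, 2C) descriptor
        gate = torch.sigmoid(sum(b(s) for b in self.branches))
        return x * gate[:, :, None, None]
```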
arXiv Detail & Related papers (2020-11-16T08:19:26Z)
- Cross-modality Person re-identification with Shared-Specific Feature Transfer [112.60513494602337]
Cross-modality person re-identification (cm-ReID) is a challenging but key technology for intelligent video analysis.
We propose a novel cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics.
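The core idea as summarized is to exploit both the modality-shared and the modality-specific parts of the feature. One plausible mechanism: split the embedding into shared and specific halves, build an affinity graph on the shared half, and use it to transfer specific information across samples of both modalities. Everything below is an assumed toy version, not the published cm-SSFT algorithm.

```python
import torch
import torch.nn.functional as F

def shared_specific_transfer(feats, split=256):
    """Toy shared-specific feature transfer: affinities computed on the
    modality-shared half of each embedding propagate the modality-specific
    half across the batch (both modalities mixed together).
    feats: (N, D) with D >= split."""
    shared, specific = feats[:, :split], feats[:, split:]
    sim = F.normalize(shared, dim=1) @ F.normalize(shared, dim=1).t()
    affinity = F.softmax(sim, dim=1)       # (N, N) transfer weights
    transferred = affinity @ specific      # borrow specific cues from neighbors
    return torch.cat([shared, transferred], dim=1)
```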
arXiv Detail & Related papers (2020-02-28T00:18:45Z)