Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition
- URL: http://arxiv.org/abs/2407.12519v1
- Date: Wed, 17 Jul 2024 12:16:44 GMT
- Title: Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition
- Authors: Haijun Xiong, Bin Feng, Xinggang Wang, Wenyu Liu,
- Abstract summary: We propose CLTD, a discriminative feature learning module designed to eliminate the influence of confounders in triple domains, ie, spatial, temporal, and spectral.
Specifically, we utilize the Cross Pixel-wise Attention Generator (CPAG) to generate attention distributions for factual and counterfactual features in spatial and temporal domains.
Then, we introduce the Fourier Projection Head (FPH) to project spatial features into the spectral space, which preserves essential information while reducing computational costs.
- Score: 36.55724380184354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gait recognition is a biometric technology that distinguishes individuals by their walking patterns. However, previous methods face challenges when accurately extracting identity features because they often become entangled with non-identity clues. To address this challenge, we propose CLTD, a causality-inspired discriminative feature learning module designed to effectively eliminate the influence of confounders in triple domains, \ie, spatial, temporal, and spectral. Specifically, we utilize the Cross Pixel-wise Attention Generator (CPAG) to generate attention distributions for factual and counterfactual features in spatial and temporal domains. Then, we introduce the Fourier Projection Head (FPH) to project spatial features into the spectral space, which preserves essential information while reducing computational costs. Additionally, we employ an optimization method with contrastive learning to enforce semantic consistency constraints across sequences from the same subject. Our approach has demonstrated significant performance improvements on challenging datasets, proving its effectiveness. Moreover, it can be seamlessly integrated into existing gait recognition methods.
Related papers
- Frequency-Spatial Entanglement Learning for Camouflaged Object Detection [34.426297468968485]
Existing methods attempt to reduce the impact of pixel similarity by maximizing the distinguishing ability of spatial features with complicated design.
We propose a new approach to address this issue by jointly exploring the representation in the frequency and spatial domains, introducing the Frequency-Spatial Entanglement Learning (FSEL) method.
Our experiments demonstrate the superiority of our FSEL over 21 state-of-the-art methods, through comprehensive quantitative and qualitative comparisons in three widely-used datasets.
arXiv Detail & Related papers (2024-09-03T07:58:47Z) - Spatial-Temporal Cross-View Contrastive Pre-training for Check-in Sequence Representation Learning [21.580705078081078]
We propose a novel Spatial-Temporal Cross-view Contrastive Representation (ST CCR) framework for check-in sequence representation learning.
ST CCR employs self-supervision from "spatial topic" and "temporal intention" views, facilitating effective fusion of spatial and temporal information at the semantic level.
We extensively evaluate ST CCR on three real-world datasets and demonstrate its superior performance across three downstream tasks.
arXiv Detail & Related papers (2024-07-22T10:20:34Z) - Generalized Face Anti-spoofing via Finer Domain Partition and Disentangling Liveness-irrelevant Factors [23.325272595629773]
We redefine domains based on identities rather than datasets, aiming to disentangle liveness and identity attributes.
Our method achieves state-of-the-art performance under cross-dataset and limited source dataset scenarios.
arXiv Detail & Related papers (2024-07-11T07:39:58Z) - Hyperspectral Image Analysis in Single-Modal and Multimodal setting
using Deep Learning Techniques [1.2328446298523066]
Hyperspectral imaging provides precise classification for land use and cover due to its exceptional spectral resolution.
However, the challenges of high dimensionality and limited spatial resolution hinder its effectiveness.
This study addresses these challenges by employing deep learning techniques to efficiently process, extract features, and classify data in an integrated manner.
arXiv Detail & Related papers (2024-03-03T15:47:43Z) - Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z) - An Integral Projection-based Semantic Autoencoder for Zero-Shot Learning [0.46644955105516456]
Zero-shot Learning (ZSL) classification categorizes or predicts classes (labels) that are not included in the training set (unseen classes)
Recent works proposed different semantic autoencoder (SAE) models where the encoder embeds a visual feature space into the semantic space and the decoder reconstructs the original visual feature space.
We propose an integral projection-based semantic autoencoder (IP-SAE) where an encoder projects a visual feature space vectord with the semantic space into a latent representation space.
arXiv Detail & Related papers (2023-06-26T12:06:20Z) - Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER)
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Unsupervised Domain Adaptation via Style-Aware Self-intermediate Domain [52.783709712318405]
Unsupervised domain adaptation (UDA) has attracted considerable attention, which transfers knowledge from a label-rich source domain to a related but unlabeled target domain.
We propose a novel style-aware feature fusion method (SAFF) to bridge the large domain gap and transfer knowledge while alleviating the loss of class-discnative information.
arXiv Detail & Related papers (2022-09-05T10:06:03Z) - Spatio-temporal Gait Feature with Adaptive Distance Alignment [90.5842782685509]
We try to increase the difference of gait features of different subjects from two aspects: the optimization of network structure and the refinement of extracted gait features.
Our method is proposed, it consists of Spatio-temporal Feature Extraction (SFE) and Adaptive Distance Alignment (ADA)
ADA uses a large number of unlabeled gait data in real life as a benchmark to refine the extracted-temporal features to make them have low inter-class similarity and high intra-class similarity.
arXiv Detail & Related papers (2022-03-07T13:34:00Z) - Hierarchical Deep CNN Feature Set-Based Representation Learning for
Robust Cross-Resolution Face Recognition [59.29808528182607]
Cross-resolution face recognition (CRFR) is important in intelligent surveillance and biometric forensics.
Existing shallow learning-based and deep learning-based methods focus on mapping the HR-LR face pairs into a joint feature space.
In this study, we desire to fully exploit the multi-level deep convolutional neural network (CNN) feature set for robust CRFR.
arXiv Detail & Related papers (2021-03-25T14:03:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.