Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition
- URL: http://arxiv.org/abs/2407.12519v1
- Date: Wed, 17 Jul 2024 12:16:44 GMT
- Title: Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition
- Authors: Haijun Xiong, Bin Feng, Xinggang Wang, Wenyu Liu
- Abstract summary: We propose CLTD, a discriminative feature learning module designed to eliminate the influence of confounders in triple domains, i.e., spatial, temporal, and spectral.
Specifically, we utilize the Cross Pixel-wise Attention Generator (CPAG) to generate attention distributions for factual and counterfactual features in spatial and temporal domains.
Then, we introduce the Fourier Projection Head (FPH) to project spatial features into the spectral space, which preserves essential information while reducing computational costs.
- Score: 36.55724380184354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gait recognition is a biometric technology that distinguishes individuals by their walking patterns. However, previous methods struggle to extract identity features accurately because these features often become entangled with non-identity clues. To address this challenge, we propose CLTD, a causality-inspired discriminative feature learning module designed to effectively eliminate the influence of confounders in triple domains, i.e., spatial, temporal, and spectral. Specifically, we utilize the Cross Pixel-wise Attention Generator (CPAG) to generate attention distributions for factual and counterfactual features in spatial and temporal domains. Then, we introduce the Fourier Projection Head (FPH) to project spatial features into the spectral space, which preserves essential information while reducing computational costs. Additionally, we employ an optimization method with contrastive learning to enforce semantic consistency constraints across sequences from the same subject. Our approach has demonstrated significant performance improvements on challenging datasets, proving its effectiveness. Moreover, it can be seamlessly integrated into existing gait recognition methods.
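The abstract does not spell out how CPAG turns factual and counterfactual attention into a training signal. As a rough, hypothetical sketch, the snippet below swaps in a generic counterfactual-attention scheme: the learned (factual) attention over positions is compared with a uniform (counterfactual) attention, and the difference between the two predictions is treated as the effect attributable to attention. The function names, the uniform counterfactual, and the linear classifier `W` are all assumptions for illustration, not CLTD's actual design.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of attention logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features: List[List[float]], weights: List[float]) -> List[float]:
    """Pool per-position feature vectors (one per pixel/frame) under an attention map."""
    dim = len(features[0])
    return [sum(w * f[c] for w, f in zip(weights, features)) for c in range(dim)]


def linear(x: List[float], W: List[List[float]]) -> List[float]:
    """Hypothetical identity classifier: one weight row per subject."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]


def attention_effect(
    features: List[List[float]], attn_logits: List[float], W: List[List[float]]
) -> List[float]:
    """Prediction under factual attention minus prediction under a counterfactual one."""
    factual = softmax(attn_logits)
    # Assumption: a uniform map stands in for the counterfactual attention.
    counterfactual = [1.0 / len(features)] * len(features)
    y_factual = linear(attend(features, factual), W)
    y_counterfactual = linear(attend(features, counterfactual), W)
    return [a - b for a, b in zip(y_factual, y_counterfactual)]
```

In a scheme like this, a loss on the effect (for example, cross-entropy against the identity label) would reward attention that helps identification beyond what an uninformative map achieves; whether CLTD uses such a loss is not stated in the abstract.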
 
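The abstract says FPH projects spatial features into the spectral space to preserve essential information at lower cost, without giving the exact projection. One plausible reading, sketched below purely as an assumption, is to take a 2-D discrete Fourier transform of a single-channel H×W feature map and keep only a small k×k block of low-frequency coefficients, so that the output has 2k² real values instead of H·W. The naive DFT, the choice of the top-left block (lowest non-negative frequencies), and the name `fourier_project` are hypothetical.

```python
import cmath
from typing import List


def dft2(x: List[List[float]]) -> List[List[complex]]:
    """Naive 2-D DFT: X[u][v] = sum_{h,w} x[h][w] * exp(-2*pi*i*(u*h/H + v*w/W))."""
    H, W = len(x), len(x[0])
    return [
        [
            sum(
                x[h][w] * cmath.exp(-2j * cmath.pi * (u * h / H + v * w / W))
                for h in range(H)
                for w in range(W)
            )
            for v in range(W)
        ]
        for u in range(H)
    ]


def fourier_project(x: List[List[float]], k: int) -> List[float]:
    """Hypothetical spectral projection: keep a k x k low-frequency block (k <= H, W)."""
    X = dft2(x)
    block = [X[u][v] for u in range(k) for v in range(k)]
    # 2*k*k real numbers (real parts, then imaginary parts) replace the H*W inputs.
    return [z.real for z in block] + [z.imag for z in block]
```

A practical version would use an FFT and apply the projection per channel; the naive O((HW)²) transform here only keeps the example dependency-free.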
      
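The abstract also mentions a contrastive objective that enforces semantic consistency across sequences of the same subject, but does not give the loss. A minimal sketch, assuming a supervised contrastive (SupCon-style) loss with cosine similarity and a temperature `tau`, treats every other sequence of the same subject as a positive and all remaining sequences as negatives. The function names and the default `tau=0.1` are illustrative assumptions.

```python
import math
from typing import Hashable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def supervised_contrastive_loss(
    embeddings: List[Sequence[float]], labels: List[Hashable], tau: float = 0.1
) -> float:
    """SupCon-style loss: pull together sequence embeddings that share a subject label."""
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no other sequence of its subject contributes nothing
        others = [j for j in range(n) if j != i]
        logits = {j: cosine(embeddings[i], embeddings[j]) / tau for j in others}
        # Numerically stable log-sum-exp over every non-anchor sequence.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability of the positives for this anchor.
        total += sum(log_denom - logits[j] for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

Embeddings of the same subject that point in the same direction drive this loss toward zero, while assigning aligned embeddings to different subjects raises it sharply.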
Related papers
- CAST: Cross-Attentive Spatio-Temporal feature fusion for Deepfake detection [0.0]
 CNNs are effective at capturing spatial artifacts, and Transformers excel at modeling temporal inconsistencies.
We propose a unified CAST model that leverages cross-attention to effectively fuse spatial and temporal features.
We evaluate the performance of our model using the FaceForensics++, Celeb-DF, and DeepfakeDetection datasets.
 arXiv (2025-06-26T18:51:17Z)
- Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
 We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds.
Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures.
We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
 arXiv (2025-06-26T11:53:59Z)
- Active Learning with Context Sampling and One-vs-Rest Entropy for Semantic Segmentation [11.077512630548153]
 Multi-class semantic segmentation remains a cornerstone challenge in computer vision.
Active Learning (AL) mitigates this challenge by selecting data points for annotation strategically.
We present OREAL, a novel patch-based AL method designed for multi-class semantic segmentation.
 arXiv (2024-12-09T13:15:52Z)
- Frequency-Spatial Entanglement Learning for Camouflaged Object Detection [34.426297468968485]
 Existing methods attempt to reduce the impact of pixel similarity by maximizing the distinguishing ability of spatial features with complicated design.
We propose a new approach to address this issue by jointly exploring the representation in the frequency and spatial domains, introducing the Frequency-Spatial Entanglement Learning (FSEL) method.
Our experiments demonstrate the superiority of our FSEL over 21 state-of-the-art methods, through comprehensive quantitative and qualitative comparisons in three widely-used datasets.
 arXiv (2024-09-03T07:58:47Z)
- Spatial-Temporal Cross-View Contrastive Pre-training for Check-in Sequence Representation Learning [21.580705078081078]
 We propose a novel Spatial-Temporal Cross-view Contrastive Representation (ST CCR) framework for check-in sequence representation learning.
 ST CCR employs self-supervision from "spatial topic" and "temporal intention" views, facilitating effective fusion of spatial and temporal information at the semantic level.
We extensively evaluate ST CCR on three real-world datasets and demonstrate its superior performance across three downstream tasks.
 arXiv (2024-07-22T10:20:34Z)
- Generalized Face Anti-spoofing via Finer Domain Partition and Disentangling Liveness-irrelevant Factors [23.325272595629773]
 We redefine domains based on identities rather than datasets, aiming to disentangle liveness and identity attributes.
Our method achieves state-of-the-art performance under cross-dataset and limited source dataset scenarios.
 arXiv (2024-07-11T07:39:58Z)
- Hyperspectral Image Analysis in Single-Modal and Multimodal setting using Deep Learning Techniques [1.2328446298523066]
 Hyperspectral imaging provides precise classification for land use and cover due to its exceptional spectral resolution.
However, the challenges of high dimensionality and limited spatial resolution hinder its effectiveness.
This study addresses these challenges by employing deep learning techniques to efficiently process, extract features, and classify data in an integrated manner.
 arXiv (2024-03-03T15:47:43Z)
- Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
 Continual Test-Time Adaptation (CTTA) aims to adapt a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
 arXiv (2023-12-19T15:34:52Z)
- An Integral Projection-based Semantic Autoencoder for Zero-Shot Learning [0.46644955105516456]
 Zero-shot Learning (ZSL) classification categorizes or predicts classes (labels) that are not included in the training set (unseen classes).
Recent works proposed different semantic autoencoder (SAE) models where the encoder embeds a visual feature space into the semantic space and the decoder reconstructs the original visual feature space.
We propose an integral projection-based semantic autoencoder (IP-SAE) in which an encoder projects a visual feature space vector, combined with the semantic space, into a latent representation space.
 arXiv (2023-06-26T12:06:20Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
 We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
 arXiv (2022-10-11T08:24:50Z)
- Unsupervised Domain Adaptation via Style-Aware Self-intermediate Domain [52.783709712318405]
 Unsupervised domain adaptation (UDA), which transfers knowledge from a label-rich source domain to a related but unlabeled target domain, has attracted considerable attention.
We propose a novel style-aware feature fusion method (SAFF) to bridge the large domain gap and transfer knowledge while alleviating the loss of class-discriminative information.
 arXiv (2022-09-05T10:06:03Z)
- Spatio-temporal Gait Feature with Adaptive Distance Alignment [90.5842782685509]
 We aim to increase the differences between the gait features of different subjects from two aspects: the optimization of the network structure and the refinement of the extracted gait features.
Our proposed method consists of Spatio-temporal Feature Extraction (SFE) and Adaptive Distance Alignment (ADA).
ADA uses a large amount of unlabeled real-world gait data as a benchmark to refine the extracted spatio-temporal features so that they have low inter-class similarity and high intra-class similarity.
 arXiv (2022-03-07T13:34:00Z)
- Hierarchical Deep CNN Feature Set-Based Representation Learning for Robust Cross-Resolution Face Recognition [59.29808528182607]
 Cross-resolution face recognition (CRFR) is important in intelligent surveillance and biometric forensics.
Existing shallow learning-based and deep learning-based methods focus on mapping the HR-LR face pairs into a joint feature space.
In this study, we aim to fully exploit the multi-level deep convolutional neural network (CNN) feature set for robust CRFR.
 arXiv (2021-03-25T14:03:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.