Frequency Domain Modality-invariant Feature Learning for
Visible-infrared Person Re-Identification
- URL: http://arxiv.org/abs/2401.01839v2
- Date: Thu, 4 Jan 2024 03:23:04 GMT
- Title: Frequency Domain Modality-invariant Feature Learning for
Visible-infrared Person Re-Identification
- Authors: Yulin Li, Tianzhu Zhang, Yongdong Zhang
- Abstract summary: We propose a novel Frequency Domain modality-invariant feature learning framework (FDMNet) to reduce modality discrepancy from the frequency domain perspective.
Our framework introduces two novel modules, namely the Instance-Adaptive Amplitude Filter (IAF) and the Phase-Preserving Normalization (PPNorm).
- Score: 79.9402521412239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visible-infrared person re-identification (VI-ReID) is challenging due to the
significant cross-modality discrepancies between visible and infrared images.
While existing methods have focused on designing complex network architectures
or using metric learning constraints to learn modality-invariant features, they
often overlook which specific component of the image causes the modality
discrepancy problem. In this paper, we first reveal that the difference in the
amplitude component of visible and infrared images is the primary factor that
causes the modality discrepancy and further propose a novel Frequency Domain
modality-invariant feature learning framework (FDMNet) to reduce modality
discrepancy from the frequency domain perspective. Our framework introduces two
novel modules, namely the Instance-Adaptive Amplitude Filter (IAF) module and
the Phase-Preserving Normalization (PPNorm) module, to enhance the
modality-invariant amplitude component and suppress the modality-specific
component at both the image- and feature-levels. Extensive experimental results
on two standard benchmarks, SYSU-MM01 and RegDB, demonstrate the superior
performance of our FDMNet against state-of-the-art methods.
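The amplitude/phase split that the abstract refers to can be illustrated with a plain 2-D Fourier transform. The sketch below is not the authors' IAF or PPNorm module; it only shows, under the assumption of single-channel images stored as NumPy arrays, how an image separates into amplitude and phase and how the amplitude of one image can be recombined with the phase of another. The function names (`split_amplitude_phase`, `combine`, `swap_amplitude`) and the random stand-in images are hypothetical.

```python
import numpy as np

def split_amplitude_phase(img):
    """Return the amplitude and phase spectra of a 2-D image."""
    spectrum = np.fft.fft2(img)
    return np.abs(spectrum), np.angle(spectrum)

def combine(amplitude, phase):
    """Rebuild a real-valued image from amplitude and phase spectra."""
    spectrum = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(spectrum))

def swap_amplitude(content_img, style_img):
    """Keep the phase of content_img and take the amplitude of style_img."""
    _, phase = split_amplitude_phase(content_img)
    amplitude, _ = split_amplitude_phase(style_img)
    return combine(amplitude, phase)

# Random arrays stand in for a visible and an infrared image (assumption).
rng = np.random.default_rng(0)
visible = rng.random((8, 8))
infrared = rng.random((8, 8))

# Splitting and recombining the same image recovers it up to float error.
reconstructed = combine(*split_amplitude_phase(visible))

# Visible-image phase combined with infrared-image amplitude.
hybrid = swap_amplitude(visible, infrared)
```

On real image pairs, a swap of this kind is a common way to inspect which visual properties travel with the amplitude and which with the phase.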
Related papers
- Frequency Domain Nuances Mining for Visible-Infrared Person
Re-identification [75.87443138635432]
Existing methods mainly exploit the spatial information while ignoring the discriminative frequency information.
We propose a novel Frequency Domain Nuances Mining (FDNM) method to explore the cross-modality frequency domain information.
Our method outperforms the second-best method by 5.2% in Rank-1 accuracy and 5.8% in mAP on the SYSU-MM01 dataset.
arXiv Detail & Related papers (2024-01-04T09:19:54Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and
Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- MRCN: A Novel Modality Restitution and Compensation Network for
Visible-Infrared Person Re-identification [36.88929785476334]
We propose a novel Modality Restitution and Compensation Network (MRCN) to narrow the gap between the two modalities.
Our method achieves 95.1% in terms of Rank-1 and 89.2% in terms of mAP on the RegDB dataset.
arXiv Detail & Related papers (2023-03-26T05:03:18Z)
- Exploring Invariant Representation for Visible-Infrared Person
Re-Identification [77.06940947765406]
Cross-spectral person re-identification, which aims to associate identities with pedestrians across different spectra, faces the main challenge of modality discrepancy.
In this paper, we address the problem at both the image level and the feature level in an end-to-end hybrid learning framework named robust feature mining network (RFM).
Experimental results on two standard cross-spectral person re-identification datasets, RegDB and SYSU-MM01, have demonstrated state-of-the-art performance.
arXiv Detail & Related papers (2023-02-02T05:24:50Z)
- Modality-Adaptive Mixup and Invariant Decomposition for RGB-Infrared
Person Re-Identification [84.32086702849338]
We propose a novel modality-adaptive mixup and invariant decomposition (MID) approach for RGB-infrared person re-identification.
MID designs a modality-adaptive mixup scheme to generate suitable mixed modality images between RGB and infrared images.
Experiments on two challenging benchmarks demonstrate superior performance of MID over state-of-the-art methods.
arXiv Detail & Related papers (2022-03-03T14:26:49Z)
- SFANet: A Spectrum-aware Feature Augmentation Network for
Visible-Infrared Person Re-Identification [12.566284647658053]
We propose a novel spectrum-aware feature augmentation network named SFANet for the cross-modality matching problem.
Learning with grayscale-spectrum images, our model can evidently reduce modality discrepancy and detect inner structure relations.
At the feature level, we improve the conventional two-stream network by balancing the number of specific and sharable convolutional blocks.
arXiv Detail & Related papers (2021-02-24T08:57:32Z)
- Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person
Re-Identification [208.1227090864602]
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem.
Existing VI-ReID methods tend to learn global representations, which have limited discriminability and weak robustness to noisy images.
We propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID.
arXiv Detail & Related papers (2020-07-18T03:08:13Z)
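Several entries above report Rank-1 accuracy and mAP. As a rough illustration of what these two retrieval metrics measure, the sketch below computes them from a query-by-gallery distance matrix. It is a simplified version under stated assumptions: the function name `rank1_and_map` and the toy data are hypothetical, and the camera- and modality-specific filtering used by the official SYSU-MM01 and RegDB protocols is omitted.

```python
import numpy as np

def rank1_and_map(dist, query_ids, gallery_ids):
    """Compute Rank-1 accuracy and mAP; dist[i, j] is the distance
    between query i and gallery item j (smaller means more similar)."""
    dist = np.asarray(dist)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    rank1_hits, average_precisions = [], []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])                 # nearest gallery items first
        matches = gallery_ids[order] == query_ids[i]
        if not matches.any():
            continue                                # skip queries without a true match
        rank1_hits.append(float(matches[0]))        # is the top result correct?
        hit_ranks = np.flatnonzero(matches) + 1     # 1-based ranks of correct matches
        precisions = np.arange(1, len(hit_ranks) + 1) / hit_ranks
        average_precisions.append(precisions.mean())
    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))

# Toy example: two queries, three gallery items (all values made up).
dist = [[0.1, 0.5, 0.3],
        [0.2, 0.6, 0.4]]
rank1, mean_ap = rank1_and_map(dist, query_ids=[1, 2], gallery_ids=[1, 2, 1])
print(rank1, mean_ap)
```

In the toy data, the first query retrieves both of its true matches at ranks 1 and 2, while the second query finds its only true match at rank 3, so Rank-1 is 0.5 and mAP is (1 + 1/3) / 2.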
This list is automatically generated from the titles and abstracts of the papers in this site.