SFANet: A Spectrum-aware Feature Augmentation Network for
Visible-Infrared Person Re-Identification
- URL: http://arxiv.org/abs/2102.12137v1
- Date: Wed, 24 Feb 2021 08:57:32 GMT
- Title: SFANet: A Spectrum-aware Feature Augmentation Network for
Visible-Infrared Person Re-Identification
- Authors: Haojie Liu, Shun Ma, Daoxun Xia, and Shaozi Li
- Abstract summary: We propose a novel spectrum-aware feature augmentation network named SFANet for the cross-modality matching problem.
Learning with grayscale-spectrum images, our model markedly reduces modality discrepancy and detects inner structural relations.
At the feature level, we improve the conventional two-stream network by balancing the number of modality-specific and shared convolutional blocks.
- Score: 12.566284647658053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visible-Infrared person re-identification (VI-ReID) is a challenging matching
problem due to large modality variations between visible and infrared images.
Existing approaches usually bridge the modality gap with only feature-level
constraints, ignoring pixel-level variations. Some methods employ GANs to
generate style-consistent images, but this destroys structure information and
introduces a considerable amount of noise. In this paper, we explicitly
consider these challenges and formulate a novel spectrum-aware feature
augmentation network named SFANet for the cross-modality matching problem.
Specifically, we propose to employ grayscale-spectrum images to fully replace
RGB images for feature learning. Learning with grayscale-spectrum images, our
model markedly reduces the modality discrepancy and detects inner structural
relations across the different modalities, making it robust to color
variations.
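The digest provides no reference code; as a minimal sketch of this pixel-level
idea, assuming RGB tensors in [0, 1] and a standard three-channel backbone,
the replacement of color inputs with replicated grayscale might look as
follows (the function name and luma weights are our assumptions, not the
authors' exact grayscale-spectrum transform):

```python
import torch

def rgb_to_grayscale_spectrum(rgb: torch.Tensor) -> torch.Tensor:
    """Map RGB images (B, 3, H, W) to three-channel grayscale so that
    visible images, like infrared ones, carry no color information.

    Hypothetical sketch of the pixel-level idea, not the authors'
    exact transform.
    """
    # Standard ITU-R BT.601 luma weights for RGB -> gray.
    weights = rgb.new_tensor([0.299, 0.587, 0.114]).view(1, 3, 1, 1)
    gray = (rgb * weights).sum(dim=1, keepdim=True)   # (B, 1, H, W)
    # Replicate to three channels so ImageNet-style backbones still apply.
    return gray.expand(-1, 3, -1, -1)
```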
At the feature level, we improve the conventional two-stream network by
balancing the number of modality-specific and shared convolutional blocks,
which preserves the spatial structure information of features.
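A minimal sketch of such a two-stream backbone, assuming a ResNet-50 whose
first residual stage is modality-specific and whose later stages are shared
(the split point and class name are illustrative, not the paper's exact
balance of blocks):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class TwoStreamBackbone(nn.Module):
    """Shallow modality-specific streams followed by shared deep blocks.
    The split after layer1 is illustrative; SFANet balances how many
    blocks are specific vs. shared."""

    def __init__(self):
        super().__init__()
        def stem(net):
            return nn.Sequential(net.conv1, net.bn1, net.relu,
                                 net.maxpool, net.layer1)
        # One shallow stream per spectrum.
        self.visible_stream = stem(resnet50(weights=None))
        self.infrared_stream = stem(resnet50(weights=None))
        # Deep blocks shared across modalities preserve common structure.
        deep = resnet50(weights=None)
        self.shared = nn.Sequential(deep.layer2, deep.layer3, deep.layer4)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        stream = (self.visible_stream if modality == "visible"
                  else self.infrared_stream)
        return self.pool(self.shared(stream(x))).flatten(1)  # (B, 2048)
```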
Additionally, a bi-directional tri-constrained top-push ranking loss (BTTR)
is embedded in the proposed network to improve discriminability, which
further boosts matching accuracy.
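The abstract does not spell out the BTTR formulation; the sketch below
implements a plain bi-directional top-push ranking term (the hardest
cross-modality positive must beat the hardest cross-modality negative by a
margin, in both matching directions) as one plausible reading, not the
authors' exact loss:

```python
import torch
import torch.nn.functional as F

def bidirectional_top_push_loss(vis, ir, vis_ids, ir_ids, margin=0.3):
    """Illustrative bi-directional top-push ranking term.

    Assumes every identity in the batch appears in both modalities
    (e.g. PK sampling). Not the exact published BTTR loss.
    """
    dist = torch.cdist(vis, ir)                          # (Bv, Bi) distances
    is_pos = vis_ids.unsqueeze(1) == ir_ids.unsqueeze(0)

    def one_direction(d, pos):
        hardest_pos = d.masked_fill(~pos, float("-inf")).max(dim=1).values
        hardest_neg = d.masked_fill(pos, float("inf")).min(dim=1).values
        return F.relu(hardest_pos - hardest_neg + margin).mean()

    # visible -> infrared and infrared -> visible directions.
    return one_direction(dist, is_pos) + one_direction(dist.t(), is_pos.t())
```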
Meanwhile, we further introduce an effective dual-linear ID embedding method
with batch normalization to model identity-specific information and assist
the BTTR loss in stabilizing feature magnitudes. Extensive experiments on the
SYSU-MM01 and RegDB datasets demonstrate that every component of the proposed
framework contributes, and that it achieves very competitive VI-ReID
performance.
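The dual-linear ID embedding is described only at this level of detail; the
following is a minimal sketch of one plausible reading (linear projection,
batch-norm bottleneck, then a linear identity classifier, in the spirit of
the common BNNeck design). The dimensions and the 395-class output
(SYSU-MM01's training identities) are illustrative assumptions:

```python
import torch
import torch.nn as nn

class IDEmbeddingHead(nn.Module):
    """Hypothetical linear -> BN -> linear ID head. Batch normalization
    keeps embedding magnitudes stable, which in turn stabilizes a
    distance-based ranking loss such as BTTR."""

    def __init__(self, in_dim: int = 2048, embed_dim: int = 512,
                 num_ids: int = 395):
        super().__init__()
        self.embed = nn.Linear(in_dim, embed_dim)     # first linear
        self.bn = nn.BatchNorm1d(embed_dim)           # magnitude stabilizer
        self.classifier = nn.Linear(embed_dim, num_ids, bias=False)

    def forward(self, feat: torch.Tensor):
        emb = self.bn(self.embed(feat))   # embedding for the ranking loss
        logits = self.classifier(emb)     # logits for the ID (CE) loss
        return emb, logits
```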
Related papers
- Cross-Modality Perturbation Synergy Attack for Person Re-identification [66.48494594909123]
The main challenge in cross-modality ReID lies in effectively dealing with visual differences between different modalities.
Existing attack methods have primarily focused on the characteristics of the visible image modality.
This study proposes a universal perturbation attack specifically designed for cross-modality ReID.
arXiv Detail & Related papers (2024-01-18T15:56:23Z)
- Frequency Domain Modality-invariant Feature Learning for Visible-infrared Person Re-Identification [79.9402521412239]
We propose a novel Frequency Domain modality-invariant feature learning framework (FDMNet) to reduce modality discrepancy from the frequency domain perspective.
Our framework introduces two novel modules, namely the Instance-Adaptive Amplitude Filter (IAF) and the Phrase-Preserving Normalization (PPNorm).
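As a rough sketch of the frequency-domain idea behind these modules (not
FDMNet's actual code), an FFT decomposes an image into an amplitude spectrum,
which carries much of the modality-dependent style, and a phase spectrum,
which carries structure, motivating amplitude filtering with phase preserved:

```python
import torch

def amplitude_phase_decompose(x: torch.Tensor):
    """Split an image batch (B, C, H, W) into amplitude and phase."""
    freq = torch.fft.fft2(x, norm="ortho")
    return freq.abs(), freq.angle()

def recompose(amplitude: torch.Tensor, phase: torch.Tensor) -> torch.Tensor:
    """Rebuild images from (possibly filtered) amplitude and phase."""
    return torch.fft.ifft2(torch.polar(amplitude, phase), norm="ortho").real
```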
arXiv Detail & Related papers (2024-01-03T17:11:27Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Exploring Invariant Representation for Visible-Infrared Person Re-Identification [77.06940947765406]
Cross-spectral person re-identification, which aims to associate identities to pedestrians across different spectra, faces a main challenge of the modality discrepancy.
In this paper, we address the problem at both the image level and the feature level in an end-to-end hybrid learning framework named the robust feature mining network (RFM).
Experiment results on two standard cross-spectral person re-identification datasets, RegDB and SYSU-MM01, have demonstrated state-of-the-art performance.
arXiv Detail & Related papers (2023-02-02T05:24:50Z)
- CycleTrans: Learning Neutral yet Discriminative Features for Visible-Infrared Person Re-Identification [79.84912525821255]
Visible-infrared person re-identification (VI-ReID) is a task of matching the same individuals across the visible and infrared modalities.
Existing VI-ReID methods mainly focus on learning general features across modalities, often at the expense of feature discriminability.
We present a novel cycle-construction-based network for neutral yet discriminative feature learning, termed CycleTrans.
arXiv Detail & Related papers (2022-08-21T08:41:40Z)
- Towards Homogeneous Modality Learning and Multi-Granularity Information Exploration for Visible-Infrared Person Re-Identification [16.22986967958162]
Visible-infrared person re-identification (VI-ReID) is a challenging and essential task, which aims to retrieve a set of person images over visible and infrared camera views.
Previous methods attempt to apply generative adversarial networks (GANs) to generate modality-consistent data.
In this work, we address the cross-modality matching problem with Aligned Grayscale Modality (AGM), a unified dark-line spectrum that reformulates visible-infrared dual-mode learning as a gray-gray single-mode learning problem.
arXiv Detail & Related papers (2022-04-11T03:03:19Z)
- CMTR: Cross-modality Transformer for Visible-infrared Person Re-identification [38.96033760300123]
We propose a cross-modality transformer-based method (CMTR) for the visible-infrared person re-identification task.
We design novel modality embeddings, which are fused with token embeddings to encode modality information, as sketched below.
The proposed CMTR significantly outperforms existing CNN-based methods.
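As an illustrative sketch of the modality-embedding idea (not CMTR's actual
implementation), a learned per-modality vector can be added to every patch
token, analogous to a positional embedding:

```python
import torch
import torch.nn as nn

class ModalityTokenEmbedding(nn.Module):
    """Fuse modality information into transformer token embeddings by
    adding a learned per-modality vector to each token. Illustrative
    only; not CMTR's exact design."""

    def __init__(self, embed_dim: int = 768, num_modalities: int = 2):
        super().__init__()
        self.modality_embed = nn.Embedding(num_modalities, embed_dim)

    def forward(self, tokens: torch.Tensor, modality: torch.Tensor):
        # tokens: (B, N, D); modality: (B,) with 0 = visible, 1 = infrared.
        return tokens + self.modality_embed(modality).unsqueeze(1)
```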
arXiv Detail & Related papers (2021-10-18T03:12:59Z)
- Multi-Scale Cascading Network with Compact Feature Learning for RGB-Infrared Person Re-Identification [35.55895776505113]
The Multi-Scale Part-Aware Cascading framework (MSPAC) aggregates multi-scale fine-grained features from part to global level.
Cross-modality correlations can thus be efficiently explored on salient features for distinctive modality-invariant feature learning.
arXiv Detail & Related papers (2020-12-12T15:39:11Z)
- Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification [208.1227090864602]
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem.
Existing VI-ReID methods tend to learn global representations, which have limited discriminability and weak robustness to noisy images.
We propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID.
arXiv Detail & Related papers (2020-07-18T03:08:13Z)
- Adaptive Weighted Attention Network with Camera Spectral Sensitivity Prior for Spectral Reconstruction from RGB Images [22.26917280683572]
We propose a novel adaptive weighted attention network (AWAN) for spectral reconstruction.
AWCA and PSNL modules are developed to reallocate channel-wise feature responses.
In the NTIRE 2020 Spectral Reconstruction Challenge, our entries obtain the 1st ranking on the Clean track and the 3rd place on the Real World track.
arXiv Detail & Related papers (2020-05-19T09:21:01Z)