Bridging the Gap: Multi-Level Cross-Modality Joint Alignment for
Visible-Infrared Person Re-Identification
- URL: http://arxiv.org/abs/2307.08316v1
- Date: Mon, 17 Jul 2023 08:24:05 GMT
- Title: Bridging the Gap: Multi-Level Cross-Modality Joint Alignment for
Visible-Infrared Person Re-Identification
- Authors: Tengfei Liang, Yi Jin, Wu Liu, Tao Wang, Songhe Feng, Yidong Li
- Abstract summary: Visible-Infrared person Re-IDentification (VI-ReID) aims to match pedestrians' images across visible and infrared cameras.
To close the modality gap, existing mainstream methods adopt a learning paradigm that converts the image retrieval task into an image classification task.
We propose a simple and effective method, Multi-level Cross-modality Joint Alignment (MCJA), which bridges both the modality-level and objective-level gaps.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visible-Infrared person Re-IDentification (VI-ReID) is a challenging
cross-modality image retrieval task that aims to match pedestrians' images
across visible and infrared cameras. To close the modality gap, existing
mainstream methods adopt a learning paradigm that converts the image retrieval
task into an image classification task with a cross-entropy loss and auxiliary
metric learning losses. These losses follow the strategy of adjusting the
distribution of extracted embeddings to reduce the intra-class distance and
increase the inter-class distance. However, such objectives do not precisely
correspond to the final test setting of the retrieval task, resulting in a new
gap at the optimization level. By rethinking these key aspects of VI-ReID, we
propose a simple and effective method, Multi-level Cross-modality Joint
Alignment (MCJA), which bridges both the modality-level and objective-level
gaps. For the former, we
design the Modality Alignment Augmentation, which consists of three novel
strategies: weighted grayscale, cross-channel cutmix, and spectrum jitter
augmentation, effectively reducing modality discrepancy in the image space. For
the latter, we introduce a new Cross-Modality Retrieval loss. To our knowledge,
it is the first to impose constraints from the perspective of the ranking list,
aligning training with the goal of the testing stage. Moreover, using only the
global feature, our method performs well and can serve as a strong baseline for
the VI-ReID community.
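To make the objective-level gap concrete, the sketch below shows the conventional training objective the abstract describes: a cross-entropy ID-classification loss plus a batch-hard triplet loss. This is the generic VI-ReID recipe, not code from the paper, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def conventional_vi_reid_loss(embeddings, logits, labels, margin=0.3):
    # Classification surrogate: the retrieval task is trained as ID classification.
    ce = F.cross_entropy(logits, labels)

    # Batch-hard triplet: shrink intra-class distances, grow inter-class ones.
    dist = torch.cdist(embeddings, embeddings)           # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)    # same-identity mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    hardest_pos = (dist * (same & ~eye).float()).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    triplet = F.relu(hardest_pos - hardest_neg + margin).mean()

    # Neither term directly scores the ranking list produced at test time,
    # which is the objective-level gap the abstract points out.
    return ce + triplet
```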
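For the Modality Alignment Augmentation, only the three strategy names are given above; the sketch below is one plausible reading of those names, written as PyTorch tensor ops on a (3, H, W) RGB image in [0, 1]. Every detail here is an assumption, not the authors' formulation.

```python
import torch

def weighted_grayscale(img):
    # Collapse RGB into one channel with random weights, then replicate it,
    # mimicking the single-channel appearance of infrared images.
    w = torch.rand(3)
    w = w / w.sum()
    gray = (w.view(3, 1, 1) * img).sum(dim=0, keepdim=True)
    return gray.repeat(3, 1, 1)

def cross_channel_cutmix(img):
    # Paste a random rectangle from one color channel into another,
    # mixing spectral content within a single image.
    _, h, w = img.shape
    src, dst = torch.randperm(3)[:2]
    ch, cw = h // 2, w // 2
    y = torch.randint(0, h - ch + 1, (1,)).item()
    x = torch.randint(0, w - cw + 1, (1,)).item()
    out = img.clone()
    out[dst, y:y + ch, x:x + cw] = img[src, y:y + ch, x:x + cw]
    return out

def spectrum_jitter(img, strength=0.3):
    # Randomly rescale each channel to perturb the spectral distribution.
    scale = 1.0 + strength * (2 * torch.rand(3, 1, 1) - 1)
    return (img * scale).clamp(0, 1)
```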
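The Cross-Modality Retrieval loss is said to constrain the ranking list, but its exact form is not given here. As a generic stand-in, the sketch below treats each query's softmax-normalized similarity row over the other-modality gallery as a soft ranking and rewards probability mass on same-identity entries; it illustrates the idea of a retrieval-aligned objective, not the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def retrieval_style_loss(query_emb, gallery_emb, query_ids, gallery_ids, tau=0.05):
    # Cosine similarities between queries and the cross-modality gallery.
    sim = F.normalize(query_emb, dim=1) @ F.normalize(gallery_emb, dim=1).T
    log_p = F.log_softmax(sim / tau, dim=1)   # soft ranking list per query
    pos = (query_ids.unsqueeze(1) == gallery_ids.unsqueeze(0)).float()
    # Maximize the average log-probability of correct-identity gallery entries.
    return -(log_p * pos).sum(dim=1).div(pos.sum(dim=1).clamp(min=1)).mean()
```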
Related papers
- Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling
FG-SBIR aims to minimize the distance between sketches and corresponding images in the embedding space.
We propose an effective approach to narrow the gap between the two domains.
It mainly facilitates unified mutual-information sharing both within and across samples.
arXiv Detail & Related papers (2024-06-17T13:49:12Z)
- Cross-Modality Perturbation Synergy Attack for Person Re-identification
The main challenge in cross-modality ReID lies in effectively dealing with visual differences between different modalities.
Existing attack methods have primarily focused on the characteristics of the visible image modality.
This study proposes a universal perturbation attack specifically designed for cross-modality ReID.
arXiv Detail & Related papers (2024-01-18T15:56:23Z)
- Multi-task Learning for Optical Coherence Tomography Angiography (OCTA) Vessel Segmentation
We propose a novel multi-task learning method for OCTA segmentation, called OCTA-MTL.
The adaptive loss combination strategy dynamically adjusts the loss weights according to the inverse of the average loss values of each task.
We evaluate our method on the ROSE-2 dataset, demonstrating its superiority in segmentation performance over two baseline methods; a sketch of this weighting rule follows.
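The weighting rule itself is stated above: each task's weight follows the inverse of its average loss. Below is a minimal sketch using a running (exponential) average; the class and parameter names are mine, not the paper's.

```python
class InverseAvgLossWeighter:
    # Combine task losses with weights proportional to 1 / (running average loss).
    def __init__(self, num_tasks, momentum=0.9):
        self.avg = [1.0] * num_tasks
        self.momentum = momentum

    def combine(self, losses):
        # losses: one scalar loss per task (floats or 0-dim tensors).
        for i, loss in enumerate(losses):
            self.avg[i] = self.momentum * self.avg[i] + (1 - self.momentum) * float(loss)
        inv = [1.0 / a for a in self.avg]
        weights = [v / sum(inv) for v in inv]
        return sum(w * loss for w, loss in zip(weights, losses))
```

Under this rule, a task with a lower running-average loss receives a proportionally larger weight in the combined objective.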
arXiv Detail & Related papers (2023-11-03T23:10:56Z)
- Efficient Bilateral Cross-Modality Cluster Matching for Unsupervised Visible-Infrared Person ReID
We propose a novel bilateral cluster matching-based learning framework to reduce the modality gap by matching cross-modality clusters.
Under such a supervisory signal, a Modality-Specific and Modality-Agnostic (MSMA) contrastive learning framework is proposed to align features jointly at the cluster level; a toy sketch of cross-modality cluster matching appears below.
Experiments on the public SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed method.
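The matching algorithm is not spelled out in this summary; as a toy illustration of pairing clusters across modalities, the sketch below matches visible and infrared cluster centroids one-to-one with the Hungarian algorithm on a cosine-distance cost. The centroid inputs and cost choice are assumptions, not the paper's method.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_clusters(vis_centroids, ir_centroids):
    # Cost = 1 - cosine similarity between visible and infrared centroids.
    v = vis_centroids / np.linalg.norm(vis_centroids, axis=1, keepdims=True)
    r = ir_centroids / np.linalg.norm(ir_centroids, axis=1, keepdims=True)
    cost = 1.0 - v @ r.T
    vis_idx, ir_idx = linear_sum_assignment(cost)    # one-to-one assignment
    return list(zip(vis_idx, ir_idx))                # matched cluster pairs
```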
arXiv Detail & Related papers (2023-05-22T03:27:46Z)
- Exploring Invariant Representation for Visible-Infrared Person Re-Identification
Cross-spectral person re-identification, which aims to associate identities with pedestrians across different spectra, faces the central challenge of modality discrepancy.
In this paper, we address the problem at both the image level and the feature level in an end-to-end hybrid learning framework named the robust feature mining network (RFM).
Experiment results on two standard cross-spectral person re-identification datasets, RegDB and SYSU-MM01, have demonstrated state-of-the-art performance.
arXiv Detail & Related papers (2023-02-02T05:24:50Z)
- Towards Homogeneous Modality Learning and Multi-Granularity Information Exploration for Visible-Infrared Person Re-Identification
Visible-infrared person re-identification (VI-ReID) is a challenging and essential task, which aims to retrieve a set of person images over visible and infrared camera views.
Previous methods attempt to apply generative adversarial networks (GANs) to generate modality-consistent data.
In this work, we address the cross-modality matching problem with Aligned Grayscale Modality (AGM), a unified dark-line spectrum that reformulates visible-infrared dual-mode learning as a gray-gray single-mode learning problem; a minimal illustration follows.
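AGM as described is an aligned grayscale modality produced by the paper's own pipeline; the sketch below only illustrates the reformulation itself, mapping both visible and infrared inputs into a shared single-channel space so that matching becomes gray-to-gray. The plain luminance conversion is my simplification, not the paper's AGM.

```python
from PIL import Image

def to_unified_gray(path):
    # Map a visible (RGB) or infrared image to a shared grayscale space,
    # turning dual-mode matching into gray-to-gray matching.
    img = Image.open(path).convert('L')   # single-channel luminance
    return img.convert('RGB')             # replicate to 3 channels for standard backbones
```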
arXiv Detail & Related papers (2022-04-11T03:03:19Z)
- Modality-Adaptive Mixup and Invariant Decomposition for RGB-Infrared Person Re-Identification
We propose a novel modality-adaptive mixup and invariant decomposition (MID) approach for RGB-infrared person re-identification.
MID designs a modality-adaptive mixup scheme to generate suitable mixed-modality images between RGB and infrared images; a plain inter-modality mixup sketch appears below.
Experiments on two challenging benchmarks demonstrate superior performance of MID over state-of-the-art methods.
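MID's mixup is modality-adaptive, with details beyond this summary; the sketch below shows only the plain inter-modality mixup it builds on: a Beta-sampled convex combination of an RGB and an infrared image of the same size. The adaptive part is not modeled here.

```python
import torch

def modality_mixup(rgb_img, ir_img, alpha=1.0):
    # Convex combination of an RGB and an infrared tensor of identical shape,
    # yielding an intermediate-modality training sample and its mixing weight.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    return lam * rgb_img + (1 - lam) * ir_img, lam
```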
arXiv Detail & Related papers (2022-03-03T14:26:49Z)
- On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification
In this paper, we exploit Pose Estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework.
By jointly training these two tasks in a mutually beneficial manner, our model learns higher-quality modality-shared and ID-related features; a minimal two-head sketch appears below.
Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves state-of-the-art methods by significant margins.
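A minimal sketch of the auxiliary-task setup this entry describes: one shared backbone feeding an identity head and a pose (keypoint-heatmap) head, trained jointly so both losses shape the shared features. Layer sizes, head designs, and names are illustrative assumptions, not the paper's architecture.

```python
import torch.nn as nn

class ReIDWithPoseAux(nn.Module):
    # Shared trunk with two heads: identity logits and pose keypoint heatmaps.
    def __init__(self, backbone, feat_dim=2048, num_ids=395, num_joints=17):
        super().__init__()
        self.backbone = backbone                             # CNN trunk -> (B, feat_dim, h, w)
        self.id_head = nn.Linear(feat_dim, num_ids)          # ID classification
        self.pose_head = nn.Conv2d(feat_dim, num_joints, 1)  # keypoint heatmaps

    def forward(self, x):
        fmap = self.backbone(x)
        feat = fmap.mean(dim=(2, 3))                         # global average pooling
        return self.id_head(feat), self.pose_head(fmap)

def joint_loss(reid_loss, pose_loss, lam=0.5):
    # Gradients from both tasks reach the shared backbone.
    return reid_loss + lam * pose_loss
```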
arXiv Detail & Related papers (2022-01-11T09:44:00Z)
- MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification
The RGB-infrared cross-modality person re-identification (ReID) task aims to recognize images of the same identity across the visible and infrared modalities.
Existing methods mainly use a two-stream architecture to eliminate the discrepancy between the two modalities in the final common feature space.
We present a novel multi-feature space joint optimization (MSO) network, which can learn modality-sharable features in both the single-modality space and the common space.
arXiv Detail & Related papers (2021-10-21T16:45:23Z)
- Parameter Sharing Exploration and Hetero-Center based Triplet Loss for Visible-Thermal Person Re-Identification
This paper focuses on the visible-thermal cross-modality person re-identification (VT Re-ID) task.
Our proposed method distinctly outperforms state-of-the-art methods by large margins; a rough sketch of a hetero-center-style triplet loss follows the entry.
arXiv Detail & Related papers (2020-08-14T07:40:35Z)
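Hetero-center triplet losses generally act on per-identity modality centers rather than individual samples. The sketch below follows that general pattern: average each identity's visible and infrared embeddings into two centers, pull same-identity centers together, and push different-identity centers apart. It assumes every identity in the batch appears in both modalities and is not necessarily this paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def hetero_center_triplet(vis_emb, ir_emb, labels, margin=0.3):
    # Per-identity centers in each modality.
    ids = labels.unique()
    v_ctr = torch.stack([vis_emb[labels == i].mean(0) for i in ids])
    r_ctr = torch.stack([ir_emb[labels == i].mean(0) for i in ids])
    pos = (v_ctr - r_ctr).norm(dim=1)          # same-ID cross-modality center distance
    dist = torch.cdist(v_ctr, r_ctr)           # all visible-infrared center pairs
    eye = torch.eye(len(ids), dtype=torch.bool, device=dist.device)
    neg = dist.masked_fill(eye, float('inf')).min(dim=1).values  # hardest negatives
    return F.relu(pos - neg + margin).mean()
```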