Cross-Modality Earth Mover's Distance for Visible Thermal Person Re-Identification
- URL: http://arxiv.org/abs/2203.01675v1
- Date: Thu, 3 Mar 2022 12:26:59 GMT
- Title: Cross-Modality Earth Mover's Distance for Visible Thermal Person Re-Identification
- Authors: Yongguo Ling, Zhun Zhong, Donglin Cao, Zhiming Luo, Yaojin Lin, Shaozi Li, Nicu Sebe
- Abstract summary: Visible thermal person re-identification (VT-ReID) suffers from the inter-modality discrepancy and intra-identity variations.
We propose the Cross-Modality Earth Mover's Distance (CM-EMD), which alleviates the impact of intra-identity variations during modality alignment.
- Score: 82.01051164653583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visible thermal person re-identification (VT-ReID) suffers from the
inter-modality discrepancy and intra-identity variations. Distribution
alignment is a popular solution for VT-ReID; however, it is usually hampered by
the influence of intra-identity variations. In this paper, we propose the
Cross-Modality Earth Mover's Distance (CM-EMD), which alleviates the impact of
intra-identity variations during modality alignment. CM-EMD selects an optimal
transport strategy that assigns higher weights to pairs with smaller
intra-identity variation. In this manner, the model focuses on reducing the
inter-modality discrepancy while paying less attention to intra-identity
variations, leading to a more effective modality alignment. Moreover, we
introduce two techniques to strengthen the benefits of CM-EMD. First,
Cross-Modality Discrimination Learning (CM-DL) is designed to overcome the
discrimination degradation caused by modality alignment. By reducing the ratio
between intra-identity and inter-identity variances, CM-DL guides the model to
learn more discriminative representations. Second, we construct a
Multi-Granularity Structure (MGS) that allows modalities to be aligned at both
coarse- and fine-grained levels with the proposed CM-EMD. Extensive experiments
show the benefits of the proposed CM-EMD and its auxiliary techniques (CM-DL
and MGS). Our method achieves state-of-the-art performance on two VT-ReID
benchmarks.
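To make the two objectives in the abstract concrete, below is a minimal, hypothetical PyTorch sketch. The function names, the entropic Sinkhorn solver, the cosine-distance cost, and the uniform marginals are all our illustrative assumptions, not the authors' released implementation: an EMD-style loss weights visible-thermal feature pairs by an optimal transport plan, so pairs with smaller intra-identity variation (lower cost) dominate the alignment, while a variance-ratio loss captures the CM-DL idea of shrinking intra-identity variance relative to inter-identity variance.

```python
# Hypothetical sketch of CM-EMD-style alignment and a CM-DL-style
# variance-ratio objective. Solver choice, cost function, and marginals
# are illustrative assumptions, not the paper's exact construction.
import torch
import torch.nn.functional as F


def sinkhorn_plan(cost, eps=0.05, n_iters=50):
    """Entropic-regularized optimal transport plan for a cost matrix."""
    m, n = cost.shape
    K = torch.exp(-cost / eps)             # Gibbs kernel
    r = torch.full((m,), 1.0 / m)          # uniform source marginal
    c = torch.full((n,), 1.0 / n)          # uniform target marginal
    u = torch.ones(m)
    v = torch.ones(n)
    for _ in range(n_iters):               # alternating Sinkhorn updates
        u = r / (K @ v)
        v = c / (K.t() @ u)
    return u.unsqueeze(1) * K * v.unsqueeze(0)   # transport plan T


def cm_emd_loss(vis_feats, thm_feats):
    """EMD-style alignment of visible/thermal features for one identity.

    Pairs with smaller intra-identity variation (lower cost) receive larger
    transport weights, so the loss concentrates on the inter-modality gap.
    """
    vis = F.normalize(vis_feats, dim=1)
    thm = F.normalize(thm_feats, dim=1)
    cost = 1.0 - vis @ thm.t()             # pairwise cosine distance
    with torch.no_grad():                  # plan acts as fixed weights
        plan = sinkhorn_plan(cost)
    return (plan * cost).sum()


def variance_ratio_loss(feats, labels, eps=1e-6):
    """CM-DL-style objective: shrink intra-identity variance relative to
    inter-identity variance to keep representations discriminative."""
    ids = labels.unique()
    centers = torch.stack([feats[labels == y].mean(0) for y in ids])
    intra = torch.stack(
        [((feats[labels == y] - centers[i]) ** 2).sum(1).mean()
         for i, y in enumerate(ids)]).mean()
    inter = ((centers - centers.mean(0)) ** 2).sum(1).mean()
    return intra / (inter + eps)
```

In a full model, such losses could be applied once to global features and again to part-level features to mirror the multi-granularity structure (MGS), though the paper's exact construction may differ.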
Related papers
- Modality Unifying Network for Visible-Infrared Person Re-Identification [24.186989535051623]
Visible-infrared person re-identification (VI-ReID) is a challenging task due to large cross-modality discrepancies and intra-class variations.
Existing methods mainly focus on learning modality-shared representations by embedding different modalities into the same feature space.
We propose a novel Modality Unifying Network (MUN) to explore a robust auxiliary modality for VI-ReID.
arXiv Detail & Related papers (2023-09-12T14:22:22Z) - Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical
Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models while being much smaller (approximately 0.8M parameters); 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z) - Learning Progressive Modality-shared Transformers for Effective
Visible-Infrared Person Re-identification [27.75907274034702]
We propose a novel deep learning framework named Progressive Modality-shared Transformer (PMT) for effective VI-ReID.
To reduce the negative effect of modality gaps, we first take the gray-scale images as an auxiliary modality and propose a progressive learning strategy.
To cope with the problem of large intra-class differences and small inter-class differences, we propose a Discriminative Center Loss.
arXiv Detail & Related papers (2022-12-01T02:20:16Z) - Multi-Modal Mutual Information Maximization: A Novel Approach for
Unsupervised Deep Cross-Modal Hashing [73.29587731448345]
We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH).
We learn informative representations that can preserve both intra- and inter-modal similarities.
The proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
arXiv Detail & Related papers (2021-12-13T08:58:03Z) - MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared
Person Re-Identification [35.97494894205023]
The RGB-infrared cross-modality person re-identification (ReID) task aims to match images of the same identity between the visible and infrared modalities.
Existing methods mainly use a two-stream architecture to eliminate the discrepancy between the two modalities in the final common feature space.
We present a novel multi-feature space joint optimization (MSO) network, which can learn modality-sharable features in both the single-modality space and the common space.
arXiv Detail & Related papers (2021-10-21T16:45:23Z) - Cross-Modality Brain Tumor Segmentation via Bidirectional
Global-to-Local Unsupervised Domain Adaptation [61.01704175938995]
In this paper, we propose a novel Bidirectional Global-to-Local (BiGL) adaptation framework under a UDA scheme.
Specifically, a bidirectional image synthesis and segmentation module is proposed to segment the brain tumor.
The proposed method outperforms several state-of-the-art unsupervised domain adaptation methods by a large margin.
arXiv Detail & Related papers (2021-05-17T10:11:45Z) - Margin Preserving Self-paced Contrastive Learning Towards Domain
Adaptation for Medical Image Segmentation [51.93711960601973]
We propose a novel margin-preserving self-paced contrastive learning (MPSCL) model for cross-modal medical image segmentation.
With the guidance of progressively refined semantic prototypes, a novel margin preserving contrastive loss is proposed to boost the discriminability of embedded representation space.
Experiments on cross-modal cardiac segmentation tasks demonstrate that MPSCL significantly improves semantic segmentation performance.
arXiv Detail & Related papers (2021-03-15T15:23:10Z) - Dual Gaussian-based Variational Subspace Disentanglement for
Visible-Infrared Person Re-Identification [19.481092102536827]
Visible-infrared person re-identification (VI-ReID) is a challenging and essential task in night-time intelligent surveillance systems.
We present a dual Gaussian-based variational auto-encoder (DG-VAE) to disentangle an identity-discriminable and an identity-ambiguous cross-modality feature subspace.
Our method outperforms state-of-the-art methods on two VI-ReID datasets.
arXiv Detail & Related papers (2020-08-06T08:43:35Z) - Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person
Re-Identification [208.1227090864602]
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem.
Existing VI-ReID methods tend to learn global representations, which have limited discriminability and weak robustness to noisy images.
We propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID.
arXiv Detail & Related papers (2020-07-18T03:08:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.