Learning by Aligning: Visible-Infrared Person Re-identification using
Cross-Modal Correspondences
- URL: http://arxiv.org/abs/2108.07422v1
- Date: Tue, 17 Aug 2021 03:38:51 GMT
- Title: Learning by Aligning: Visible-Infrared Person Re-identification using
Cross-Modal Correspondences
- Authors: Hyunjong Park, Sanghoon Lee, Junghyup Lee, Bumsub Ham
- Abstract summary: Two main challenges in VI-reID are intra-class variations across person images, and cross-modal discrepancies between visible and infrared images.
We introduce a novel feature learning framework that addresses these problems in a unified way.
- Score: 42.16002082436691
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of visible-infrared person re-identification
(VI-reID), that is, retrieving a set of person images, captured by visible or
infrared cameras, in a cross-modal setting. Two main challenges in VI-reID are
intra-class variations across person images, and cross-modal discrepancies
between visible and infrared images. Assuming that the person images are
roughly aligned, previous approaches attempt to learn coarse image- or rigid
part-level person representations that are discriminative and generalizable
across different modalities. However, the person images, typically cropped by
off-the-shelf object detectors, are not necessarily well-aligned, which
hinders discriminative person representation learning. In this paper, we
introduce a novel feature learning framework that addresses these problems in a
unified way. To this end, we propose to exploit dense correspondences between
cross-modal person images. This allows us to address the cross-modal
discrepancies at the pixel level, suppressing modality-related features from
person representations more effectively. This also encourages pixel-wise associations
between cross-modal local features, further facilitating discriminative feature
learning for VI-reID. Extensive experiments and analyses on standard VI-reID
benchmarks demonstrate the effectiveness of our approach, which significantly
outperforms the state of the art.
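To make the idea concrete, the sketch below shows one plausible form of dense cross-modal correspondence: pixel-wise cosine similarities between visible and infrared feature maps define soft matches, and infrared features are soft-warped onto the visible layout so a pixel-level consistency loss can be applied. This is a minimal PyTorch sketch written from the abstract alone; the function name, temperature, and MSE loss are assumptions, not the authors' implementation.

```python
# Minimal sketch of dense cross-modal correspondence matching
# (illustrative only; not the authors' released implementation).
import torch
import torch.nn.functional as F

def soft_warp(feat_v, feat_i, temperature=0.05):
    """Soft-warp infrared features onto visible pixel locations.

    feat_v, feat_i: (B, C, H, W) feature maps from a shared backbone.
    Returns the warped infrared map and a pixel-level consistency loss.
    """
    B, C, H, W = feat_v.shape
    v = F.normalize(feat_v.flatten(2), dim=1)  # (B, C, HW)
    i = F.normalize(feat_i.flatten(2), dim=1)  # (B, C, HW)

    # Cosine similarity between every visible/infrared pixel pair: (B, HW, HW)
    sim = torch.einsum('bcm,bcn->bmn', v, i)

    # Soft correspondences: each visible pixel attends to all infrared pixels.
    attn = F.softmax(sim / temperature, dim=-1)

    # Warp infrared features into the visible layout.
    warped = torch.einsum('bmn,bcn->bcm', attn, feat_i.flatten(2))
    warped = warped.view(B, C, H, W)

    # Pixel-level loss between the visible map and its warped infrared match.
    loss = F.mse_loss(feat_v, warped)
    return warped, loss

# Random tensors standing in for backbone outputs of a visible/infrared pair.
fv, fi = torch.randn(2, 256, 24, 12), torch.randn(2, 256, 24, 12)
_, align_loss = soft_warp(fv, fi)
print(align_loss.item())
```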
Related papers
- Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation [90.71613903956451]
Text-to-image retrieval is a fundamental task in multimedia processing.
We propose an autoregressive voken generation method, named AVG.
We show that AVG achieves superior results in both effectiveness and efficiency.
arXiv Detail & Related papers (2024-07-24T13:39:51Z)
- Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning [71.14084801851381]
Change captioning aims to succinctly describe the semantic change between a pair of similar images.
Most existing methods directly capture the difference between them, which risks yielding error-prone difference features.
We propose a distractors-immune representation learning network that correlates the corresponding channels of two image representations.
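As a rough illustration of correlating the corresponding channels of two image representations, the hypothetical sketch below computes a per-channel cosine similarity between the paired feature maps and uses it to gate the channels. The gating direction and all names are assumptions made for illustration, not the paper's architecture.

```python
# Hypothetical sketch of channel-wise correlation between the features
# of a "before"/"after" image pair (not the paper's actual design).
import torch
import torch.nn.functional as F

def channel_correlation_gate(feat_a, feat_b):
    """feat_a, feat_b: (B, C, H, W) features of a similar image pair.

    Channels that agree across the pair likely encode shared (distractor)
    content; gating by (1 - correlation) emphasises change-related channels.
    """
    B, C, _, _ = feat_a.shape
    a = F.normalize(feat_a.flatten(2), dim=-1)  # (B, C, HW)
    b = F.normalize(feat_b.flatten(2), dim=-1)
    corr = (a * b).sum(dim=-1)                  # per-channel cosine, (B, C)
    gate = (1.0 - corr).clamp(min=0).view(B, C, 1, 1)
    return feat_a * gate, feat_b * gate

fa, fb = torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16)
ga, gb = channel_correlation_gate(fa, fb)
print(ga.shape, gb.shape)
```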
arXiv Detail & Related papers (2024-07-16T13:00:33Z)
- Dynamic Identity-Guided Attention Network for Visible-Infrared Person Re-identification [17.285526655788274]
Visible-infrared person re-identification (VI-ReID) aims to match people with the same identity between visible and infrared modalities.
Existing methods generally try to bridge the cross-modal differences at the image or feature level.
We introduce a dynamic identity-guided attention network (DIAN) to mine identity-guided and modality-consistent embeddings.
arXiv Detail & Related papers (2024-05-21T12:04:56Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
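The batch-level correlation can be sketched as self-attention in which each image's descriptor attends to every other descriptor in the batch. The module below is an assumption-based illustration; the dimension, head count, and residual design are not taken from the paper.

```python
# Minimal sketch of cross-image correlation via self-attention over a
# batch of image descriptors (not the CricaVPR implementation).
import torch
import torch.nn as nn

class CrossImageAttention(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, desc):
        """desc: (B, D) per-image descriptors; the batch is treated as one
        token sequence so every image attends to every other image."""
        tokens = desc.unsqueeze(0)                  # (1, B, D)
        out, _ = self.attn(tokens, tokens, tokens)  # correlate across images
        return self.norm(tokens + out).squeeze(0)   # residual + norm, (B, D)

model = CrossImageAttention()
refined = model(torch.randn(16, 512))
print(refined.shape)  # torch.Size([16, 512])
```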
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Learning Commonality, Divergence and Variety for Unsupervised Visible-Infrared Person Re-identification [32.537029197752915]
Unsupervised visible-infrared person re-identification (USVI-ReID) aims to match specified people in infrared images to visible images without annotations, and vice versa.
Most existing methods address the USVI-ReID using cluster-based contrastive learning, which simply employs the cluster center as a representation of a person.
We propose a Progressive Contrastive Learning with Hard and Dynamic Prototypes method for USVI-ReID.
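The cluster-center baseline criticized above can be written as a prototype-level InfoNCE loss. The snippet below is a generic sketch of that baseline (the temperature and shapes are illustrative), not the proposed hard/dynamic-prototype method.

```python
# Generic cluster-based contrastive baseline: each pseudo-labelled sample
# is pulled toward its cluster center (written from the one-line summary).
import torch
import torch.nn.functional as F

def cluster_contrastive_loss(feats, pseudo_labels, centers, tau=0.05):
    """feats: (B, D) L2-normalised features; pseudo_labels: (B,) cluster ids
    from e.g. DBSCAN; centers: (K, D) L2-normalised cluster centers."""
    logits = feats @ centers.t() / tau  # (B, K) similarity to prototypes
    return F.cross_entropy(logits, pseudo_labels)

feats = F.normalize(torch.randn(8, 128), dim=1)
centers = F.normalize(torch.randn(10, 128), dim=1)
labels = torch.randint(0, 10, (8,))
print(cluster_contrastive_loss(feats, labels, centers).item())
```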
arXiv Detail & Related papers (2024-02-29T10:37:49Z)
- Cross-Modality Perturbation Synergy Attack for Person Re-identification [66.48494594909123]
The main challenge in cross-modality ReID lies in effectively dealing with visual differences between different modalities.
Existing attack methods have primarily focused on the characteristics of the visible image modality.
This study proposes a universal perturbation attack specifically designed for cross-modality ReID.
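A universal perturbation of this kind is typically learned by optimizing a single image-sized perturbation that degrades feature similarity across many inputs. The sketch below shows that generic recipe with a stand-in model and data; the paper's cross-modality synergy objective is more elaborate and is not reproduced here.

```python
# Generic universal-perturbation recipe against a ReID-style feature
# extractor (illustration of the general idea only).
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in feature extractor and data; a real attack would use a trained
# ReID model and gallery images.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
images = torch.rand(64, 3, 32, 32)

delta = torch.zeros(1, 3, 32, 32, requires_grad=True)  # one shared perturbation
opt = torch.optim.Adam([delta], lr=0.01)
eps = 8 / 255

for step in range(50):
    batch = images[torch.randint(0, 64, (16,))]
    clean = F.normalize(model(batch).detach(), dim=1)
    adv = F.normalize(model((batch + delta).clamp(0, 1)), dim=1)
    loss = (clean * adv).sum(dim=1).mean()  # minimise feature similarity
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        delta.clamp_(-eps, eps)             # keep the perturbation bounded
```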
arXiv Detail & Related papers (2024-01-18T15:56:23Z)
- VI-Diff: Unpaired Visible-Infrared Translation Diffusion Model for Single Modality Labeled Visible-Infrared Person Re-identification [14.749167141971952]
Cross-modality data annotation is costly and error-prone for Visible-Infrared person re-identification.
We propose VI-Diff, a diffusion model that effectively addresses the task of Visible-Infrared person image translation.
Our approach can be a promising solution to the VI-ReID task with single-modality labeled data and serves as a good starting point for future study.
arXiv Detail & Related papers (2023-10-06T09:42:12Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Towards Homogeneous Modality Learning and Multi-Granularity Information Exploration for Visible-Infrared Person Re-Identification [16.22986967958162]
Visible-infrared person re-identification (VI-ReID) is a challenging and essential task, which aims to retrieve a set of person images over visible and infrared camera views.
Previous methods attempt to apply generative adversarial networks (GANs) to generate modality-consistent data.
In this work, we address the cross-modality matching problem with Aligned Grayscale Modality (AGM), a unified dark-line spectrum that reformulates visible-infrared dual-mode learning as a gray-gray single-mode learning problem.
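The gray-gray reformulation can be illustrated by mapping visible RGB images into a shared grayscale space, as below. This shows only the grayscale step (AGM's infrared-side alignment is not sketched), and the BT.601 weights are a common choice, not necessarily the paper's.

```python
# Sketch of the grayscale-unification step behind AGM (illustrative).
import torch

def rgb_to_gray(images):
    """images: (B, 3, H, W) visible images in [0, 1].
    Returns (B, 3, H, W) grayscale images with ITU-R BT.601 weights,
    replicated to three channels so the same backbone can be reused."""
    w = torch.tensor([0.299, 0.587, 0.114], device=images.device)
    gray = (images * w.view(1, 3, 1, 1)).sum(dim=1, keepdim=True)
    return gray.expand(-1, 3, -1, -1)

visible = torch.rand(4, 3, 288, 144)
print(rgb_to_gray(visible).shape)  # torch.Size([4, 3, 288, 144])
```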
arXiv Detail & Related papers (2022-04-11T03:03:19Z)
- Heterogeneous Visible-Thermal and Visible-Infrared Face Recognition using Unit-Class Loss and Cross-Modality Discriminator [0.43748379918040853]
We propose an end-to-end framework for cross-modal face recognition.
A novel Unit-Class Loss is proposed for preserving identity information while discarding modality information.
The proposed network can be used to extract modality-independent vector representations or a matching-pair classification for test images.
arXiv Detail & Related papers (2021-11-29T06:14:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.