VI-Diff: Unpaired Visible-Infrared Translation Diffusion Model for
Single Modality Labeled Visible-Infrared Person Re-identification
- URL: http://arxiv.org/abs/2310.04122v1
- Date: Fri, 6 Oct 2023 09:42:12 GMT
- Title: VI-Diff: Unpaired Visible-Infrared Translation Diffusion Model for
Single Modality Labeled Visible-Infrared Person Re-identification
- Authors: Han Huang, Yan Huang, Liang Wang
- Abstract summary: Cross-modality data annotation is costly and error-prone for Visible-Infrared person re-identification.
We propose VI-Diff, a diffusion model that effectively addresses the task of Visible-Infrared person image translation.
Our approach can be a promising solution to the VI-ReID task with single-modality labeled data and serves as a good starting point for future study.
- Score: 14.749167141971952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visible-Infrared person re-identification (VI-ReID) in real-world scenarios
poses a significant challenge due to the high cost of cross-modality data
annotation. Different sensing cameras, such as RGB/IR cameras for good/poor
lighting conditions, make it costly and error-prone to identify the same person
across modalities. To overcome this, we explore the use of single-modality
labeled data for the VI-ReID task, which is more cost-effective and practical.
By labeling pedestrians in only one modality (e.g., visible images) and
retrieving in another modality (e.g., infrared images), we aim to create a
training set containing both originally labeled and modality-translated data
using unpaired image-to-image translation techniques. In this paper, we propose
VI-Diff, a diffusion model that effectively addresses the task of
Visible-Infrared person image translation. Through comprehensive experiments,
we demonstrate that VI-Diff outperforms existing diffusion and GAN models,
making it a promising solution for the VI-ReID task with single-modality
labeled data and a good starting point for future study. Code will be
available.
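The training-set construction the abstract describes can be summarized in a few lines. Below is a minimal sketch, assuming a trained VI-Diff-style visible-to-infrared translator; the function name `translate_v2i` and the data layout are illustrative placeholders, not the paper's actual API.

```python
# Minimal sketch of the single-modality-labeled pipeline: label pedestrians
# in the visible modality only, then use an unpaired translator to add
# infrared counterparts that inherit the same identity labels.
# `translate_v2i` is a hypothetical stand-in for a trained VI-Diff model.
from typing import Callable, List, Tuple

def build_mixed_training_set(
    labeled_visible: List[Tuple[object, int]],   # (visible image, person ID)
    translate_v2i: Callable[[object], object],   # unpaired V -> IR translator
) -> List[Tuple[object, int, str]]:
    training_set = []
    for image, person_id in labeled_visible:
        training_set.append((image, person_id, "visible"))                  # original labeled data
        training_set.append((translate_v2i(image), person_id, "infrared"))  # modality-translated data
    return training_set
```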
Related papers
- Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector [97.92369017531038]
We build a new laRge-scale Adversarial images dataset with Diverse hArmful Responses (RADAR).
We then develop a novel iN-time Embedding-based AdveRSarial Image DEtection (NEARSIDE) method, which exploits a single vector distilled from the hidden states of Visual Language Models (VLMs) to detect adversarial images against benign ones in the input.
arXiv Detail & Related papers (2024-10-30T10:33:10Z)
- Mutual Information Guided Optimal Transport for Unsupervised Visible-Infrared Person Re-identification [39.70083261306122]
Unsupervised visible-infrared person re-identification (USVI-ReID) is a challenging retrieval task that aims to retrieve cross-modality pedestrian images without using any label information.
In this paper, we first deduce an optimization objective for unsupervised VI-ReID based on the mutual information between the model's cross-modality input and output.
Guided by this objective, we design a loop iterative training strategy that alternates between model training and cross-modality matching.
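As a rough illustration of such an alternating loop, here is a sketch in which matching is a generic nearest-neighbor step over modality cluster centroids; `extract` and `train_round` are hypothetical placeholders, and the paper's mutual-information objective is not reproduced.

```python
# Rough illustration of a loop iterative strategy that alternates between
# cross-modality matching and model training. The matching step here is a
# generic nearest-neighbor stand-in, not the paper's exact algorithm.
import numpy as np
from typing import Callable

def cross_modality_match(v_feats: np.ndarray, i_feats: np.ndarray) -> np.ndarray:
    """Pair each visible cluster with its most similar infrared cluster
    by cosine similarity (a stand-in for the paper's matching step)."""
    v = v_feats / np.linalg.norm(v_feats, axis=1, keepdims=True)
    i = i_feats / np.linalg.norm(i_feats, axis=1, keepdims=True)
    return (v @ i.T).argmax(axis=1)   # matched infrared index per visible cluster

def loop_iterative_training(
    extract: Callable[..., np.ndarray],   # model -> cluster features (hypothetical)
    train_round: Callable[..., None],     # one round of model training (hypothetical)
    model, visible, infrared, n_rounds: int = 10,
):
    for _ in range(n_rounds):
        # Step 1: match identities across modalities with current features.
        pairs = cross_modality_match(extract(model, visible), extract(model, infrared))
        # Step 2: retrain the model on the matched pseudo-labels.
        train_round(model, visible, infrared, pairs)
    return model
```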
arXiv Detail & Related papers (2024-07-17T17:32:07Z)
- Dynamic Identity-Guided Attention Network for Visible-Infrared Person Re-identification [17.285526655788274]
Visible-infrared person re-identification (VI-ReID) aims to match people with the same identity between visible and infrared modalities.
Existing methods generally try to bridge the cross-modal differences at the image or feature level.
We introduce a dynamic identity-guided attention network (DIAN) to mine identity-guided and modality-consistent embeddings.
arXiv Detail & Related papers (2024-05-21T12:04:56Z)
- Cross-Modality Perturbation Synergy Attack for Person Re-identification [66.48494594909123]
The main challenge in cross-modality ReID lies in effectively dealing with the visual differences between modalities.
Existing attack methods have primarily focused on the characteristics of the visible image modality.
This study proposes a universal perturbation attack specifically designed for cross-modality ReID.
arXiv Detail & Related papers (2024-01-18T15:56:23Z)
- Unsupervised Visible-Infrared Person ReID by Collaborative Learning with Neighbor-Guided Label Refinement [53.044703127757295]
Unsupervised learning visible-infrared person re-identification (USL-VI-ReID) aims at learning modality-invariant features from an unlabeled cross-modality dataset.
We propose a Dual Optimal Transport Label Assignment (DOTLA) framework to simultaneously assign the generated labels from one modality to its counterpart modality.
The proposed DOTLA mechanism formulates a mutually reinforcing and efficient solution to cross-modality data association, which effectively reduces the side effects of insufficient and noisy label associations.
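For concreteness, the following is a generic entropy-regularized (Sinkhorn) optimal-transport routine of the kind such a label-assignment step could build on, with uniform marginals; this is a textbook algorithm, not DOTLA's exact formulation.

```python
# Generic Sinkhorn routine for optimal-transport label assignment: softly
# transport pseudo-labels from one modality onto samples of the other.
import numpy as np

def sinkhorn(cost: np.ndarray, eps: float = 0.05, n_iters: int = 100) -> np.ndarray:
    """Entropy-regularized OT with uniform marginals; returns a soft
    assignment (transport plan) of shape (n_samples, n_labels)."""
    n, m = cost.shape
    K = np.exp(-cost / eps)            # Gibbs kernel
    r = np.full(n, 1.0 / n)            # uniform row marginal (samples)
    c = np.full(m, 1.0 / m)            # uniform column marginal (labels)
    v = np.ones(m)
    for _ in range(n_iters):
        u = r / (K @ v)                # row scaling update
        v = c / (K.T @ u)              # column scaling update
    return u[:, None] * K * v[None, :]

# Example: soft-assign 4 samples of one modality to 3 pseudo-labels
# generated in the other modality, then harden by argmax.
cost = np.random.rand(4, 3)            # e.g., feature distances to label centroids
labels = sinkhorn(cost).argmax(axis=1)
```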
arXiv Detail & Related papers (2023-05-22T04:40:30Z)
- Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification [26.71900654115498]
We propose a novel augmentation network in the embedding space, called the diverse embedding expansion network (DEEN).
The proposed DEEN can effectively generate diverse embeddings to learn informative feature representations.
We provide a low-light cross-modality (LLCM) dataset, which contains 46,767 bounding boxes of 1,064 identities captured by 9 RGB/IR cameras.
arXiv Detail & Related papers (2023-03-25T14:24:56Z)
- Learning Feature Recovery Transformer for Occluded Person Re-identification [71.18476220969647]
We propose a new approach called Feature Recovery Transformer (FRT) to address the two challenges simultaneously.
To reduce interference from noise during feature matching, we focus on visible regions that appear in both images and develop a visibility graph to calculate the similarity.
For the second challenge, based on the graph similarity, we propose a recovery transformer that, for each query image, exploits the feature sets of its $k$-nearest neighbors in the gallery to recover its complete features.
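The neighbor-based recovery step can be illustrated with a simple averaging stand-in for the learned recovery transformer, assuming L2-normalized feature vectors; the blending weight `alpha` is an illustrative assumption.

```python
# Sketch of k-nearest-neighbor feature recovery: complete a query's occluded
# features from its k nearest gallery neighbors. A simple averaged stand-in
# for FRT's learned recovery transformer, not the paper's actual model.
import numpy as np

def recover_features(query: np.ndarray, gallery: np.ndarray, k: int = 5,
                     alpha: float = 0.5) -> np.ndarray:
    """Blend the query feature with the mean of its k nearest gallery
    features (cosine similarity over L2-normalized vectors)."""
    sims = gallery @ query                        # cosine similarities
    topk = np.argsort(-sims)[:k]                  # indices of k nearest neighbors
    neighbor_mean = gallery[topk].mean(axis=0)    # aggregate neighbor features
    recovered = alpha * query + (1 - alpha) * neighbor_mean
    return recovered / np.linalg.norm(recovered)  # renormalize
```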
arXiv Detail & Related papers (2023-01-05T02:36:16Z)
- CycleTrans: Learning Neutral yet Discriminative Features for Visible-Infrared Person Re-Identification [79.84912525821255]
Visible-infrared person re-identification (VI-ReID) is a task of matching the same individuals across the visible and infrared modalities.
Existing VI-ReID methods mainly focus on learning general features across modalities, often at the expense of feature discriminability.
We present a novel cycle-construction-based network for neutral yet discriminative feature learning, termed CycleTrans.
arXiv Detail & Related papers (2022-08-21T08:41:40Z)
- Towards Homogeneous Modality Learning and Multi-Granularity Information Exploration for Visible-Infrared Person Re-Identification [16.22986967958162]
Visible-infrared person re-identification (VI-ReID) is a challenging and essential task, which aims to retrieve a set of person images over visible and infrared camera views.
Previous methods attempt to apply generative adversarial networks (GANs) to generate modality-consistent data.
In this work, we address the cross-modality matching problem with Aligned Grayscale Modality (AGM), a unified dark-line spectrum that reformulates visible-infrared dual-mode learning as a gray-gray single-mode learning problem.
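At its core, AGM maps both modalities into a shared grayscale space. Below is a minimal sketch of the visible-side conversion, using standard BT.601 luma weights; the paper's full alignment procedure also handles the infrared side and is not reproduced here.

```python
# Minimal sketch: convert visible RGB images to a grayscale representation
# so they can be trained alongside infrared images in one shared spectrum.
# Channel replication keeps standard 3-channel backbone inputs unchanged.
import numpy as np

def to_aligned_grayscale(rgb: np.ndarray) -> np.ndarray:
    """rgb: (H, W, 3) uint8 visible image -> (H, W, 3) grayscale image
    using the standard ITU-R BT.601 luma weights."""
    gray = rgb @ np.array([0.299, 0.587, 0.114])         # (H, W) luminance
    return np.repeat(gray[..., None], 3, axis=2).astype(np.uint8)
```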
arXiv Detail & Related papers (2022-04-11T03:03:19Z)
- On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification [66.58450185833479]
In this paper, we exploit Pose Estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework.
By jointly training these two tasks in a mutually beneficial manner, our model learns higher quality modality-shared and ID-related features.
Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves state-of-the-art methods by significant margins.
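Such auxiliary-task training can be sketched as a two-head network over a shared backbone; the module names and loss weighting below are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of auxiliary-task training: combine a ReID identity loss with a
# pose-estimation loss on a shared backbone. `lambda_pose` and the head
# designs are hypothetical placeholders.
import torch
import torch.nn as nn

class ReIDWithPoseAux(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int,
                 n_ids: int, n_keypoints: int):
        super().__init__()
        self.backbone = backbone                      # shared feature extractor
        self.id_head = nn.Linear(feat_dim, n_ids)     # identity classification head
        self.pose_head = nn.Linear(feat_dim, n_keypoints * 2)  # (x, y) per joint

    def forward(self, x):
        feats = self.backbone(x)
        return self.id_head(feats), self.pose_head(feats)

def joint_loss(id_logits, ids, pose_pred, pose_gt, lambda_pose=0.5):
    # Identity cross-entropy plus weighted pose regression loss.
    return (nn.functional.cross_entropy(id_logits, ids)
            + lambda_pose * nn.functional.mse_loss(pose_pred, pose_gt))
```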
arXiv Detail & Related papers (2022-01-11T09:44:00Z)
- Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences [42.16002082436691]
Two main challenges in VI-ReID are intra-class variations across person images and cross-modal discrepancies between visible and infrared images.
We introduce a novel feature learning framework that addresses these problems in a unified way.
arXiv Detail & Related papers (2021-08-17T03:38:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.