RGB-IR Cross-modality Person ReID based on Teacher-Student GAN Model
- URL: http://arxiv.org/abs/2007.07452v1
- Date: Wed, 15 Jul 2020 02:58:46 GMT
- Title: RGB-IR Cross-modality Person ReID based on Teacher-Student GAN Model
- Authors: Ziyue Zhang, Shuai Jiang, Congzhentao Huang, Yang Li and Richard Yi Da Xu
- Abstract summary: We propose a Teacher-Student GAN model (TS-GAN) to adapt to different domains and guide the ReID backbone to learn better ReID features.
Unlike other GAN based models, the proposed model only needs the backbone module at the test stage, making it more efficient and resource-saving.
- Score: 20.70796497371778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RGB-Infrared (RGB-IR) person re-identification (ReID) is a technology where
the system can automatically identify the same person appearing at different
parts of a video when visible light is unavailable. The critical challenge of
this task is the cross-modality gap between features of the RGB and IR
modalities. To solve this challenge, we propose a Teacher-Student GAN model
(TS-GAN) to adapt to different domains and guide the ReID backbone to learn
better ReID features. (1) In
order to get corresponding RGB-IR image pairs, the RGB-IR Generative
Adversarial Network (GAN) was used to generate IR images. (2) To kick-start
identity training, a ReID Teacher module was trained on IR person images and
was then used to guide its Student counterpart during training.
(3) Likewise, to better align features across domains and enhance the model's
ReID performance, three Teacher-Student loss functions were used. Unlike other GAN
based models, the proposed model only needs the backbone module at the test
stage, making it more efficient and resource-saving. To showcase our model's
capability, we conducted extensive experiments on the SYSU-MM01 RGB-IR ReID
benchmark and achieved performance superior to the state of the art, with 49.8%
Rank-1 accuracy and 47.4% mAP.
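As a rough illustration of the training flow described in the abstract, the sketch below shows one plausible way to wire a teacher-student update on top of a ReID backbone, assuming PyTorch. The generator, the backbone architecture, the number of identities, and the three loss terms (identity, feature imitation, logit distillation) are simplified placeholders, not the authors' exact implementation.

```python
# Hypothetical sketch of a Teacher-Student distillation step, loosely following
# the TS-GAN training described above. Modules, loss weights, and the RGB-to-IR
# generator are illustrative placeholders, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoder(nn.Module):
    """Stand-in for the ReID backbone (teacher or student)."""
    def __init__(self, feat_dim=256, num_ids=395):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.classifier = nn.Linear(feat_dim, num_ids)

    def forward(self, x):
        f = self.features(x)
        return f, self.classifier(f)

rgb2ir = nn.Sequential(  # placeholder for the RGB-IR GAN generator
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 3, 3, padding=1))
teacher = ConvEncoder().eval()   # assumed pre-trained on real IR images, frozen
student = ConvEncoder()          # the only module kept at test time

optimizer = torch.optim.Adam(student.parameters(), lr=3e-4)

def train_step(rgb, ir, labels, w_feat=1.0, w_kd=1.0):
    """One teacher-student update; the loss terms are assumptions, not the paper's."""
    with torch.no_grad():
        fake_ir = rgb2ir(rgb)            # generated IR counterpart of the RGB image
        t_feat, t_logits = teacher(ir)   # teacher guidance from real IR
    s_feat, s_logits = student(fake_ir)

    id_loss = F.cross_entropy(s_logits, labels)   # identity supervision
    feat_loss = F.mse_loss(s_feat, t_feat)        # feature imitation
    kd_loss = F.kl_div(F.log_softmax(s_logits, dim=1),
                       F.softmax(t_logits, dim=1), reduction="batchmean")
    loss = id_loss + w_feat * feat_loss + w_kd * kd_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# toy usage with random tensors standing in for paired RGB/IR person images
rgb = torch.randn(4, 3, 64, 32)
ir = torch.randn(4, 3, 64, 32)
labels = torch.randint(0, 395, (4,))
print(train_step(rgb, ir, labels))
```

Because the teacher, generator, and distillation losses are discarded after training, only the student backbone runs at test time, which is what makes the approach efficient at inference.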
Related papers
- UniRGB-IR: A Unified Framework for RGB-Infrared Semantic Tasks via Adapter Tuning [17.36726475620881]
We propose a general and efficient framework called UniRGB-IR to unify RGB-IR semantic tasks.
A novel adapter is developed to efficiently introduce richer RGB-IR features into the pre-trained foundation model.
Experimental results on various RGB-IR downstream tasks demonstrate that our method can achieve state-of-the-art performance.
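As a rough, assumption-based illustration of adapter tuning in this spirit (not UniRGB-IR's actual design), the snippet below shows a generic bottleneck adapter that injects auxiliary features into a frozen pre-trained block; all layer names and dimensions are placeholders.

```python
# Generic bottleneck-adapter sketch for injecting auxiliary (e.g. RGB-IR) features
# into a frozen pre-trained block; an illustrative assumption, not UniRGB-IR.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, tokens, extra):
        # fuse auxiliary features, then project through a small bottleneck
        return tokens + self.up(self.act(self.down(tokens + extra)))

dim = 384
frozen_block = nn.TransformerEncoderLayer(d_model=dim, nhead=6, batch_first=True)
for p in frozen_block.parameters():
    p.requires_grad_(False)          # the foundation model stays frozen
adapter = Adapter(dim)               # only the adapter is trained

tokens = torch.randn(2, 196, dim)    # visual tokens from the backbone
extra = torch.randn(2, 196, dim)     # auxiliary RGB-IR features (placeholder)
out = frozen_block(adapter(tokens, extra))
print(out.shape)                     # torch.Size([2, 196, 384])
```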
arXiv Detail & Related papers (2024-04-26T12:21:57Z)
- Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval [50.72924579220149]
Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification.
Current techniques rely on supervised learning for CIR models, using labeled triplets of reference image, text, and target image.
We propose a new semi-supervised CIR approach where we search for a reference and its related target images in auxiliary data.
arXiv Detail & Related papers (2024-04-23T21:00:22Z)
- Dynamic Enhancement Network for Partial Multi-modality Person Re-identification [52.70235136651996]
We design a novel dynamic enhancement network (DENet), which allows missing arbitrary modalities while maintaining the representation ability of multiple modalities.
Since the missing state might be changeable, we design a dynamic enhancement module, which dynamically enhances modality features according to the missing state in an adaptive manner.
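One way to picture such state-conditioned enhancement (purely an assumption, not DENet's actual module) is to route each sample through an enhancement branch chosen by its missing-modality state, as in the sketch below.

```python
# Illustrative sketch of enhancing modality features conditioned on a
# missing-modality state; a generic assumption, not DENet's design.
import torch
import torch.nn as nn

class DynamicEnhance(nn.Module):
    def __init__(self, dim=128, num_states=4):
        super().__init__()
        # one lightweight enhancement branch per possible missing state
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(num_states)])

    def forward(self, feat, state):
        # route each sample through the branch matching its missing state
        out = torch.empty_like(feat)
        for s in range(len(self.branches)):
            mask = state == s
            if mask.any():
                out[mask] = feat[mask] + self.branches[s](feat[mask])
        return out

feats = torch.randn(6, 128)            # fused multi-modality features
states = torch.randint(0, 4, (6,))     # which modalities are missing per sample
print(DynamicEnhance()(feats, states).shape)
```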
arXiv Detail & Related papers (2023-05-25T06:22:01Z)
- CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets [50.6643933702394]
We present a single-model self-supervised hybrid pre-training framework for RGB and depth modalities, termed CoMAE.
Our CoMAE presents a curriculum learning strategy to unify the two popular self-supervised representation learning algorithms: contrastive learning and masked image modeling.
arXiv Detail & Related papers (2023-02-13T07:09:45Z)
- Multimodal Data Augmentation for Visual-Infrared Person ReID with Corrupted Data [10.816003787786766]
We propose a specialized DA strategy for V-I person ReID models.
Our strategy diminishes the impact of corruption on the accuracy of deep person ReID models.
Results indicate that using our strategy, V-I ReID models can exploit both shared and individual modality knowledge.
arXiv Detail & Related papers (2022-11-22T00:29:55Z)
- Students taught by multimodal teachers are superior action recognizers [41.821485757189656]
The focal point of egocentric video understanding is modelling hand-object interactions.
Standard models -- CNNs, Vision Transformers, etc. -- which receive RGB frames as input perform well; however, their performance improves further when additional modalities such as object detections, optical flow, and audio are employed.
The goal of this work is to retain the performance of such multimodal approaches, while using only the RGB images as input at inference time.
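A minimal distillation sketch in the same spirit (an assumption, not the paper's code): a teacher that sees RGB plus an extra modality supervises a student that sees only RGB, so inference needs RGB alone.

```python
# Hypothetical sketch: distilling a multimodal teacher into an RGB-only student,
# so only RGB frames are needed at inference time. Architectures and the loss
# weighting are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 97  # placeholder number of action classes

teacher = nn.Sequential(nn.Flatten(), nn.Linear(2 * 3 * 32 * 32, num_classes)).eval()
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, num_classes))
opt = torch.optim.SGD(student.parameters(), lr=0.01)

rgb = torch.randn(8, 3, 32, 32)        # RGB frames
flow = torch.randn(8, 3, 32, 32)       # extra modality (e.g. optical flow)
labels = torch.randint(0, num_classes, (8,))

with torch.no_grad():
    t_logits = teacher(torch.cat([rgb, flow], dim=1))  # teacher sees both modalities

s_logits = student(rgb)                                # student sees RGB only
loss = F.cross_entropy(s_logits, labels) + \
       F.kl_div(F.log_softmax(s_logits, dim=1),
                F.softmax(t_logits, dim=1), reduction="batchmean")
loss.backward()
opt.step()
```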
arXiv Detail & Related papers (2022-10-09T19:37:17Z)
- Teacher-Student Adversarial Depth Hallucination to Improve Face Recognition [11.885178256393893]
We present the Teacher-Student Generative Adversarial Network (TS-GAN) to generate depth images from a single RGB image.
For our method to generalize well across unseen datasets, we design two components in the architecture, a teacher and a student.
The fully trained shared generator can then be used in runtime to hallucinate depth from RGB for downstream applications such as face recognition.
arXiv Detail & Related papers (2021-04-06T11:07:02Z)
- MOGAN: Morphologic-structure-aware Generative Learning from a Single Image [59.59698650663925]
Recently proposed generative models can complete training with only one image.
We introduce a MOrphologic-structure-aware Generative Adversarial Network named MOGAN that produces random samples with diverse appearances.
Our approach focuses on internal features including the maintenance of rational structures and variation on appearance.
arXiv Detail & Related papers (2021-03-04T12:45:23Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which enables the network to capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
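A toy sketch of a cross-modal auto-encoder pretext task in this spirit (layer sizes and the loss are assumptions, not the paper's design): the network encodes an RGB image and is asked to reconstruct the paired depth map, so no human annotations are required.

```python
# Toy cross-modal auto-encoder pretext task: reconstruct depth from RGB.
# Layer sizes and the loss are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

rgb = torch.randn(4, 3, 64, 64)      # unlabeled RGB input
depth = torch.randn(4, 1, 64, 64)    # paired depth map (no human labels needed)

pred_depth = decoder(encoder(rgb))   # cross-modal reconstruction
loss = F.l1_loss(pred_depth, depth)  # pretext objective
loss.backward()
opt.step()
```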
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- A Similarity Inference Metric for RGB-Infrared Cross-Modality Person Re-identification [66.49212581685127]
Cross-modality person re-identification (re-ID) is a challenging task due to the large discrepancy between IR and RGB modalities.
Existing methods typically address this challenge by aligning feature distributions or image styles across modalities.
This paper presents a novel similarity inference metric (SIM) that exploits the intra-modality sample similarities to circumvent the cross-modality discrepancy.
arXiv Detail & Related papers (2020-07-03T05:28:13Z)
- Cross-Spectrum Dual-Subspace Pairing for RGB-infrared Cross-Modality Person Re-Identification [15.475897856494583]
Conventional person re-identification can only handle RGB color images and fails in dark conditions.
RGB-infrared ReID (also known as Infrared-Visible ReID or Visible-Thermal ReID) has therefore been proposed.
In this paper, a novel multi-spectrum image generation method is proposed, and the generated samples are used to help the network find discriminative information.
arXiv Detail & Related papers (2020-02-29T09:01:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.