Video-based Visible-Infrared Person Re-Identification with Auxiliary Samples
- URL: http://arxiv.org/abs/2311.15571v1
- Date: Mon, 27 Nov 2023 06:45:22 GMT
- Title: Video-based Visible-Infrared Person Re-Identification with Auxiliary Samples
- Authors: Yunhao Du, Cheng Lei, Zhicheng Zhao, Yuan Dong, Fei Su
- Abstract summary: Visible-infrared person re-identification (VI-ReID) aims to match persons captured by visible and infrared cameras.
Previous methods focus on learning from cross-modality person images in different cameras.
We first contribute a large-scale VI-ReID dataset named BUPTCampus.
- Score: 21.781628451676205
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Visible-infrared person re-identification (VI-ReID) aims to match persons
captured by visible and infrared cameras, allowing person retrieval and
tracking in 24-hour surveillance systems. Previous methods focus on learning
from cross-modality person images in different cameras. However, temporal
information and single-camera samples tend to be neglected. To address these
gaps, we first contribute a large-scale VI-ReID dataset named
BUPTCampus. Different from most existing VI-ReID datasets, it 1) collects
tracklets instead of images to introduce rich temporal information, 2) contains
pixel-aligned cross-modality sample pairs for better modality-invariant
learning, 3) provides one auxiliary set to help enhance the optimization, in
which each identity only appears in a single camera. Based on our constructed
dataset, we present a two-stream framework as a baseline and apply a Generative
Adversarial Network (GAN) to narrow the gap between the two modalities. To
exploit the advantages introduced by the auxiliary set, we propose a
curriculum-learning-based strategy to jointly learn from both primary and
auxiliary sets.
Moreover, we design a novel temporal k-reciprocal re-ranking method to refine
the ranking list with fine-grained temporal correlation cues. Experimental
results demonstrate the effectiveness of the proposed methods. We also
reproduce nine state-of-the-art image-based and video-based VI-ReID methods on
BUPTCampus, and our methods substantially outperform them. The code and
dataset are available at: https://github.com/dyhBUPT/BUPTCampus.
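To make the re-ranking step concrete, below is a minimal sketch of classical k-reciprocal re-ranking (Zhong et al., CVPR 2017) extended with a toy temporal term. The `temporal_weight` factor is a hypothetical stand-in for the paper's fine-grained temporal correlation cues, not the authors' actual formulation.

```python
import numpy as np

def k_reciprocal_neighbors(dist, i, k):
    """Indices j such that i and j appear in each other's top-k."""
    forward = np.argsort(dist[i])[:k + 1]            # top-k of i (includes i itself)
    return {j for j in forward if i in np.argsort(dist[j])[:k + 1]}

def temporal_rerank(dist, timestamps, k=20, lam=0.3):
    """Blend the original distance with a Jaccard distance over
    k-reciprocal sets, softly modulated by temporal proximity."""
    n = dist.shape[0]
    sets = [k_reciprocal_neighbors(dist, i, k) for i in range(n)]
    jaccard = np.zeros_like(dist)
    for i in range(n):
        for j in range(n):
            jaccard[i, j] = 1.0 - len(sets[i] & sets[j]) / len(sets[i] | sets[j])
    # Hypothetical temporal cue: samples close in time get a mild bonus.
    dt = np.abs(timestamps[:, None] - timestamps[None, :])
    temporal_weight = np.exp(-dt / (dt.mean() + 1e-9))
    return lam * dist + (1 - lam) * jaccard * (2.0 - temporal_weight)

# Toy example: 5 samples with symmetric random distances and timestamps.
rng = np.random.default_rng(0)
d = rng.random((5, 5)); d = (d + d.T) / 2; np.fill_diagonal(d, 0)
print(temporal_rerank(d, rng.random(5) * 100, k=2))
```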
Related papers
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
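A minimal sketch of the batch-level attention idea described above: each image's descriptor becomes a token, and self-attention lets every image in the batch attend to every other. The layer sizes and single-block design are illustrative assumptions, not CricaVPR's actual architecture.

```python
import torch
import torch.nn as nn

class CrossImageAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats):            # feats: (batch, dim) per-image descriptors
        x = feats.unsqueeze(0)           # (1, batch, dim): the batch is the sequence
        out, _ = self.attn(x, x, x)      # every image attends to every other image
        return self.norm(feats + out.squeeze(0))  # residual refinement

descriptors = torch.randn(8, 256)        # e.g., 8 images in a batch
refined = CrossImageAttention()(descriptors)
print(refined.shape)                     # torch.Size([8, 256])
```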
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- AG-ReID.v2: Bridging Aerial and Ground Views for Person Re-identification [39.58286453178339]
Aerial-ground person re-identification (Re-ID) presents unique challenges in computer vision.
We introduce AG-ReID.v2, a dataset specifically designed for person Re-ID in mixed aerial and ground scenarios.
This dataset comprises 100,502 images of 1,615 unique individuals, each annotated with matching IDs and 15 soft attribute labels.
arXiv Detail & Related papers (2024-01-05T04:53:33Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and insufficient annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
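A minimal dual-encoder sketch in the spirit of this summary: one Transformer over image tokens, one over text tokens, trained so matched pairs score highest. Token shapes, pooling, and the InfoNCE-style loss are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def encoder(dim=256, layers=2):
    layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=layers)

img_enc, txt_enc = encoder(), encoder()

img_tokens = torch.randn(4, 49, 256)   # e.g., 7x7 patch features per image
txt_tokens = torch.randn(4, 16, 256)   # e.g., 16 word embeddings per caption

img_emb = F.normalize(img_enc(img_tokens).mean(dim=1), dim=-1)  # pooled image embedding
txt_emb = F.normalize(txt_enc(txt_tokens).mean(dim=1), dim=-1)  # pooled text embedding

logits = img_emb @ txt_emb.T / 0.07    # cosine similarities / temperature
labels = torch.arange(4)               # i-th image matches i-th caption
loss = F.cross_entropy(logits, labels) # symmetric text-to-image term omitted for brevity
print(loss.item())
```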
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification [91.56939957189505]
We propose a novel spatial-temporal complementary learning framework named Deeply-Coupled Convolution-Transformer (DCCT) for high-performance video-based person Re-ID.
Our framework attains better performance than most state-of-the-art methods.
arXiv Detail & Related papers (2023-04-27T12:16:44Z)
- Pose-Aided Video-based Person Re-Identification via Recurrent Graph Convolutional Network [41.861537712563816]
We propose to learn the discriminative pose feature beyond the appearance feature for video retrieval.
To learn the pose feature, we first detect the pedestrian pose in each frame through an off-the-shelf pose detector.
We then exploit a recurrent graph convolutional network (RGCN) to learn the node embeddings of the temporal pose graph.
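A rough sketch of that pipeline: per-frame keypoints become graph nodes, a graph convolution aggregates them along skeleton edges, and a GRU plays the recurrent role over the frame sequence. The trivial adjacency, sizes, and pooling are assumptions, not the paper's RGCN.

```python
import torch
import torch.nn as nn

J, T, D = 17, 8, 64                       # joints, frames, feature dim
A = torch.eye(J)                          # skeleton adjacency (self-loops only here;
                                          # a real model would add bone connections)

class PoseGCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(2, D)       # lift (x, y) keypoints to D dims
        self.gcn = nn.Linear(D, D)
        self.gru = nn.GRU(D, D, batch_first=True)

    def forward(self, poses):             # poses: (T, J, 2) from an off-the-shelf detector
        h = self.proj(poses)              # (T, J, D)
        h = torch.relu(A @ self.gcn(h))   # graph convolution: neighbor aggregation
        h = h.mean(dim=1)                 # pool joints -> (T, D) per-frame pose feature
        _, last = self.gru(h.unsqueeze(0))
        return last.squeeze()             # (D,) tracklet-level pose embedding

print(PoseGCN()(torch.randn(T, J, 2)).shape)   # torch.Size([64])
```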
arXiv Detail & Related papers (2022-09-23T13:20:33Z)
- Learning Modal-Invariant and Temporal-Memory for Video-based Visible-Infrared Person Re-Identification [46.49866514866999]
We primarily study video-based cross-modality person Re-ID.
We show that performance improves as the number of frames in a tracklet increases.
A novel method is proposed, which projects two modalities to a modal-invariant subspace.
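A minimal sketch of the modal-invariant projection idea: a single head shared by both modalities plus an alignment loss on paired samples. This construction is illustrative, not the paper's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

proj = nn.Linear(512, 128)               # one head shared by both modalities

rgb_feat = torch.randn(4, 512)           # backbone features from visible frames
ir_feat = torch.randn(4, 512)            # backbone features from infrared frames

z_rgb = F.normalize(proj(rgb_feat), dim=-1)
z_ir = F.normalize(proj(ir_feat), dim=-1)

# Pull paired cross-modality samples of the same identity together.
align_loss = (1 - (z_rgb * z_ir).sum(dim=-1)).mean()
print(align_loss.item())
```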
arXiv Detail & Related papers (2022-08-04T04:43:52Z)
- On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification [66.58450185833479]
In this paper, we exploit Pose Estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework.
By jointly training these two tasks in a mutually beneficial manner, our model learns higher quality modality-shared and ID-related features.
Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves over state-of-the-art methods by significant margins.
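A minimal multi-task sketch of that setup: a shared backbone feeds both an identity head and a pose head, and the two losses are combined so pose supervision shapes the shared features. The architecture and loss weight are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
id_head = nn.Linear(32, 100)             # 100 identities (illustrative)
pose_head = nn.Linear(32, 17 * 2)        # 17 keypoints (x, y), regressed jointly

imgs = torch.randn(8, 3, 256, 128)
ids = torch.randint(0, 100, (8,))
keypoints = torch.randn(8, 17 * 2)       # ground-truth poses from a pose annotator

feat = backbone(imgs)
loss = F.cross_entropy(id_head(feat), ids) \
     + 0.5 * F.mse_loss(pose_head(feat), keypoints)   # auxiliary pose loss
loss.backward()                          # both tasks update the shared backbone
print(loss.item())
```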
arXiv Detail & Related papers (2022-01-11T09:44:00Z)
- Camera-Tracklet-Aware Contrastive Learning for Unsupervised Vehicle Re-Identification [4.5471611558189124]
We propose camera-tracklet-aware contrastive learning (CTACL) using the multi-camera tracklet information without vehicle identity labels.
The proposed CTACL divides the unlabelled domain, i.e., the entire set of vehicle images, into multiple camera-level subsets and conducts contrastive learning within them.
We demonstrate the effectiveness of our approach on video-based and image-based vehicle Re-ID datasets.
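A simplified sketch of camera-level contrastive learning as described: frames are grouped by camera, and within each camera, frames from the same tracklet act as positives; no identity labels are needed. The loss form and grouping are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

feats = F.normalize(torch.randn(12, 128), dim=-1)    # frame embeddings
camera = torch.tensor([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
tracklet = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])  # no ID labels used

loss = 0.0
for cam in camera.unique():              # contrastive learning runs per camera
    m = camera == cam
    f, t = feats[m], tracklet[m]
    sim = f @ f.T / 0.1                  # similarities within this camera
    sim.fill_diagonal_(float('-inf'))    # exclude self-pairs
    pos = (t[:, None] == t[None, :]) & ~torch.eye(len(f), dtype=torch.bool)
    loss = loss - sim.log_softmax(dim=1)[pos].mean()  # pull same-tracklet frames together
print((loss / camera.unique().numel()).item())
```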
arXiv Detail & Related papers (2021-09-14T02:12:54Z)
- Graph Convolution for Re-ranking in Person Re-identification [40.9727538382413]
We propose a graph-based re-ranking method to improve learned features while still keeping Euclidean distance as the similarity metric.
A simple yet effective method is proposed to generate a profile vector for each tracklet in videos, which helps extend our method to video re-ID.
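A minimal sketch of the two ideas in this summary: a tracklet profile vector as a simple aggregate of frame features, and one step of graph propagation that refines features so plain Euclidean distance can still rank them. The kNN graph and propagation weights are illustrative assumptions.

```python
import torch

def profile_vector(frame_feats):          # (num_frames, dim) -> (dim,)
    return frame_feats.mean(dim=0)        # simplest aggregate; the paper's may differ

tracklets = [torch.randn(t, 64) for t in (5, 8, 3, 6)]
profiles = torch.stack([profile_vector(f) for f in tracklets])   # (4, 64)

# Build a kNN graph over profiles and smooth each feature with its neighbors.
dist = torch.cdist(profiles, profiles)    # Euclidean distances
knn = dist.argsort(dim=1)[:, 1:3]         # 2 nearest neighbors (excluding self)
refined = profiles.clone()
for i in range(len(profiles)):
    refined[i] = 0.5 * profiles[i] + 0.5 * profiles[knn[i]].mean(dim=0)

print(torch.cdist(refined, refined))      # still ranked with Euclidean distance
```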
arXiv Detail & Related papers (2021-07-05T18:40:43Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representations by modeling cross-scale spatial-temporal correlations.
The proposed framework, CTL, utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and the physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Camera-aware Proxies for Unsupervised Person Re-Identification [60.26031011794513]
This paper tackles the purely unsupervised person re-identification (Re-ID) problem that requires no annotations.
We propose to split each cluster into multiple proxies, where each proxy represents the instances coming from the same camera.
Based on the camera-aware proxies, we design both intra- and inter-camera contrastive learning components for our Re-ID model.
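A condensed sketch of that proxy construction: each pseudo-label cluster is split per camera, every (cluster, camera) pair gets one proxy (here, the mean embedding), and samples are pulled toward their own camera-aware proxy. The loss is an illustrative simplification.

```python
import torch
import torch.nn.functional as F

feats = F.normalize(torch.randn(10, 128), dim=-1)
cluster = torch.tensor([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])   # from unsupervised clustering
camera = torch.tensor([0, 0, 1, 1, 0, 1, 1, 0, 0, 1])

# One proxy per (cluster, camera) pair: the mean embedding of its members.
keys = sorted({(int(c), int(k)) for c, k in zip(cluster, camera)})
proxies = torch.stack([feats[(cluster == c) & (camera == k)].mean(dim=0)
                       for c, k in keys])
proxy_of = {ck: i for i, ck in enumerate(keys)}

# Intra-camera contrastive step: pull each sample to its own proxy.
targets = torch.tensor([proxy_of[(int(c), int(k))] for c, k in zip(cluster, camera)])
logits = feats @ F.normalize(proxies, dim=-1).T / 0.07
print(F.cross_entropy(logits, targets).item())
```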
arXiv Detail & Related papers (2020-12-19T12:37:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.