Unleashing the Potential of Tracklets for Unsupervised Video Person Re-Identification
- URL: http://arxiv.org/abs/2406.14261v1
- Date: Thu, 20 Jun 2024 12:30:12 GMT
- Title: Unleashing the Potential of Tracklets for Unsupervised Video Person Re-Identification
- Authors: Nanxing Meng, Qizao Wang, Bin Li, Xiangyang Xue,
- Abstract summary: We propose the Self-Supervised Refined Clustering (SSR-C) framework to promote unsupervised video person re-identification.
Our proposed SSR-C for unsupervised video person re-identification achieves state-of-the-art results and is comparable to advanced supervised methods.
- Score: 40.83058938096914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With rich temporal-spatial information, video-based person re-identification methods have shown broad prospects. Although tracklets can be easily obtained with ready-made tracking models, annotating identities is still expensive and impractical. Therefore, some video-based methods propose using only a few identity annotations or camera labels to facilitate feature learning. They also simply average the frame features of each tracklet, overlooking unexpected variations and inherent identity consistency within tracklets. In this paper, we propose the Self-Supervised Refined Clustering (SSR-C) framework without relying on any annotation or auxiliary information to promote unsupervised video person re-identification. Specifically, we first propose the Noise-Filtered Tracklet Partition (NFTP) module to reduce the feature bias of tracklets caused by noisy tracking results, and sequentially partition the noise-filtered tracklets into "sub-tracklets". Then, we cluster and further merge sub-tracklets using the self-supervised signal from tracklet partition, which is enhanced through a progressive strategy to generate reliable pseudo labels, facilitating intra-class cross-tracklet aggregation. Moreover, we propose the Class Smoothing Classification (CSC) loss to efficiently promote model learning. Extensive experiments on the MARS and DukeMTMC-VideoReID datasets demonstrate that our proposed SSR-C for unsupervised video person re-identification achieves state-of-the-art results and is comparable to advanced supervised methods.
Related papers
- Weakly Supervised Video Individual CountingWeakly Supervised Video
Individual Counting [126.75545291243142]
Video Individual Counting aims to predict the number of unique individuals in a single video.
We introduce a weakly supervised VIC task, wherein trajectory labels are not provided.
In doing so, we devise an end-to-end trainable soft contrastive loss to drive the network to distinguish inflow, outflow, and the remaining.
arXiv Detail & Related papers (2023-12-10T16:12:13Z) - A Free Lunch to Person Re-identification: Learning from Automatically
Generated Noisy Tracklets [52.30547023041587]
unsupervised video-based re-identification (re-ID) methods have been proposed to solve the problem of high labor cost required to annotate re-ID datasets.
But their performance is still far lower than the supervised counterparts.
In this paper, we propose to tackle this problem by learning re-ID models from automatically generated person tracklets.
arXiv Detail & Related papers (2022-04-02T16:18:13Z) - Semi-TCL: Semi-Supervised Track Contrastive Representation Learning [40.31083437957288]
We design a new instance-to-track matching objective to learn appearance embedding.
It compares a candidate detection to the embedding of the tracks persisted in the tracker.
We implement this learning objective in a unified form following the spirit of constrastive loss.
arXiv Detail & Related papers (2021-07-06T05:23:30Z) - Video-based Person Re-identification without Bells and Whistles [49.51670583977911]
Video-based person re-identification (Re-ID) aims at matching the video tracklets with cropped video frames for identifying the pedestrians under different cameras.
There exists severe spatial and temporal misalignment for those cropped tracklets due to the imperfect detection and tracking results generated with obsolete methods.
We present a simple re-Detect and Link (DL) module which can effectively reduce those unexpected noise through applying the deep learning-based detection and tracking on the cropped tracklets.
arXiv Detail & Related papers (2021-05-22T10:17:38Z) - Unsupervised Noisy Tracklet Person Re-identification [100.85530419892333]
We present a novel selective tracklet learning (STL) approach that can train discriminative person re-id models from unlabelled tracklet data.
This avoids the tedious and costly process of exhaustively labelling person image/tracklet true matching pairs across camera views.
Our method is particularly more robust against arbitrary noisy data of raw tracklets therefore scalable to learning discriminative models from unconstrained tracking data.
arXiv Detail & Related papers (2021-01-16T07:31:00Z) - Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking)
We make the following contributions: (i) we propose to improve the existing self-supervised approach, with a simple, yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drifts caused by spatial-temporal discontinuity; (iv) we demonstrate state-of-the-art results among the self-supervised approaches on DAVIS-2017 and YouTube
arXiv Detail & Related papers (2020-06-22T17:55:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.