Self-Supervised Pre-Training for Transformer-Based Person
Re-Identification
- URL: http://arxiv.org/abs/2111.12084v1
- Date: Tue, 23 Nov 2021 18:59:08 GMT
- Title: Self-Supervised Pre-Training for Transformer-Based Person
Re-Identification
- Authors: Hao Luo, Pichao Wang, Yi Xu, Feng Ding, Yanxin Zhou, Fan Wang, Hao Li,
Rong Jin
- Abstract summary: Transformer-based supervised pre-training achieves great performance in person re-identification (ReID)
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
- Score: 54.55281692768765
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based supervised pre-training achieves great performance in
person re-identification (ReID). However, due to the domain gap between
ImageNet and ReID datasets, it usually needs a larger pre-training dataset
(e.g. ImageNet-21K) to boost the performance because of the strong data fitting
ability of the transformer. To address this challenge, this work aims to
mitigate the gap between the pre-training and ReID datasets from the
perspective of data and model structure, respectively. We first investigate
self-supervised learning (SSL) methods with Vision Transformer (ViT) pretrained
on unlabelled person images (the LUPerson dataset), and empirically find it
significantly surpasses ImageNet supervised pre-training models on ReID tasks.
To further reduce the domain gap and accelerate the pre-training, the
Catastrophic Forgetting Score (CFS) is proposed to evaluate the gap between
pre-training and fine-tuning data. Based on CFS, a subset is selected via
sampling relevant data close to the downstream ReID data and filtering
irrelevant data from the pre-training dataset. For the model structure, a
ReID-specific module named IBN-based convolution stem (ICS) is proposed to
bridge the domain gap by learning more invariant features. Extensive
experiments have been conducted to fine-tune the pre-training models under
supervised learning, unsupervised domain adaptation (UDA), and unsupervised
learning (USL) settings. We successfully downscale the LUPerson dataset to 50%
with no performance degradation. Finally, we achieve state-of-the-art
performance on Market-1501 and MSMT17. For example, our ViT-S/16 achieves
91.3%/89.9%/89.6% mAP on Market-1501 for supervised/UDA/USL ReID. Code and
models will be released at https://github.com/michuanhaohao/TransReID-SSL.
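As a concrete illustration of the CFS-based subset selection described in the abstract, the sketch below scores each pre-training image by the cosine similarity of its feature to the nearest centroid of the downstream ReID features and keeps the highest-scoring half. This is a minimal sketch under assumed choices (the proxy feature extractor, the simple k-means clustering, and the 50% keep ratio are illustrative), not the paper's exact CFS recipe.

```python
# Hypothetical sketch of CFS-style data filtering: score each pre-training
# image by similarity to the nearest downstream centroid, keep the top half.
import torch
import torch.nn.functional as F

def cfs_scores(pretrain_feats: torch.Tensor, downstream_feats: torch.Tensor,
               num_centroids: int = 256, iters: int = 20) -> torch.Tensor:
    """pretrain_feats: (N, D); downstream_feats: (M, D); returns (N,) scores."""
    d = F.normalize(downstream_feats, dim=1)
    # Simple k-means on the downstream features to obtain centroids.
    centroids = d[torch.randperm(d.size(0))[:num_centroids]].clone()
    for _ in range(iters):
        assign = (d @ centroids.t()).argmax(dim=1)
        for k in range(num_centroids):
            mask = assign == k
            if mask.any():
                centroids[k] = F.normalize(d[mask].mean(dim=0), dim=0)
    p = F.normalize(pretrain_feats, dim=1)
    # Score = cosine similarity to the closest downstream centroid.
    return (p @ centroids.t()).max(dim=1).values

def select_subset(scores: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Indices of the pre-training samples with the highest scores."""
    k = int(keep_ratio * scores.numel())
    return scores.topk(k).indices
```

Samples returned by select_subset would form the reduced pre-training set, in the spirit of the 50% LUPerson subset mentioned in the abstract.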
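The IBN-based convolution stem (ICS) can be pictured as a small convolutional patch embedder in which each stage normalizes part of its channels with InstanceNorm and the rest with BatchNorm. The sketch below is a hypothetical reading of that idea; the channel widths, depth, and 16x total stride are assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of an IBN-based convolution stem replacing ViT's
# patchify step: stride-2 3x3 convs, each followed by an IBN layer that
# splits channels between InstanceNorm and BatchNorm.
import torch
import torch.nn as nn

class IBN(nn.Module):
    """InstanceNorm on the first half of the channels, BatchNorm on the rest."""
    def __init__(self, channels: int):
        super().__init__()
        self.half = channels // 2
        self.inorm = nn.InstanceNorm2d(self.half, affine=True)
        self.bnorm = nn.BatchNorm2d(channels - self.half)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = torch.split(x, [self.half, x.size(1) - self.half], dim=1)
        return torch.cat([self.inorm(a), self.bnorm(b)], dim=1)

class ConvStem(nn.Module):
    """Four stride-2 convs (total stride 16) projected to the transformer width."""
    def __init__(self, in_chans: int = 3, embed_dim: int = 384):
        super().__init__()
        widths = [64, 128, 256, 512]  # illustrative channel widths
        layers, c_in = [], in_chans
        for c_out in widths:
            layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1, bias=False),
                       IBN(c_out), nn.ReLU(inplace=True)]
            c_in = c_out
        self.stem = nn.Sequential(*layers)
        self.proj = nn.Conv2d(c_in, embed_dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(self.stem(x))            # (B, embed_dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)    # (B, num_patches, embed_dim)

# Example: ConvStem()(torch.randn(2, 3, 256, 128)) -> tokens of shape (2, 128, 384)
```

The intuition, following the abstract, is that the instance-normalized channels discourage the stem from encoding dataset-specific appearance statistics, which helps bridge the domain gap between pre-training and ReID data.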
Related papers
- Data Filtering Networks [67.827994353269]
We study the problem of learning a data filtering network (DFN) for filtering a large uncurated dataset.
Our key finding is that the quality of a network for filtering is distinct from its performance on downstream tasks.
Based on our insights, we construct new data filtering networks that induce state-of-the-art image-text datasets.
arXiv Detail & Related papers (2023-09-29T17:37:29Z)
- In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene Classification [5.323049242720532]
Self-supervised learning has emerged as a promising approach for remote sensing image classification.
We present a study of different self-supervised pre-training strategies and evaluate their effect across 14 downstream datasets.
arXiv Detail & Related papers (2023-07-04T10:57:52Z)
- AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset [25.935496432142976]
It is a long-term vision for the Autonomous Driving (AD) community that perception models can learn from a large-scale point cloud dataset.
We formulate the point-cloud pre-training task as a semi-supervised problem, which leverages the few-shot labeled and massive unlabeled point-cloud data.
We achieve significant performance gains on a series of downstream perception benchmarks, including nuScenes and KITTI, under different baseline models.
arXiv Detail & Related papers (2023-06-01T12:32:52Z)
- The Role of Pre-training Data in Transfer Learning [20.768366728182997]
We investigate the impact of pre-training data distribution on the few-shot and full fine-tuning performance.
We find that the choice of the pre-training data source is essential for the few-shot transfer, but its role decreases as more data is made available for fine-tuning.
arXiv Detail & Related papers (2023-02-27T09:10:08Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address distribution shift between training and test data by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z)
- Efficient Conditional Pre-training for Transfer Learning [71.01129334495553]
We propose efficient filtering methods to select relevant subsets from the pre-training dataset.
We validate our techniques by pre-training on ImageNet in both the unsupervised and supervised settings.
We improve standard ImageNet pre-training by 1-3% by tuning available models on our subsets and pre-training on a dataset filtered from a larger scale dataset.
arXiv Detail & Related papers (2020-11-20T06:16:15Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep training on the enlarged dataset tractable, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.