SEPT: Towards Scalable and Efficient Visual Pre-Training
- URL: http://arxiv.org/abs/2212.05473v1
- Date: Sun, 11 Dec 2022 11:02:11 GMT
- Title: SEPT: Towards Scalable and Efficient Visual Pre-Training
- Authors: Yiqi Lin, Huabin Zheng, Huaping Zhong, Jinjing Zhu, Weijia Li, Conghui He, Lin Wang
- Abstract summary: Self-supervised pre-training has shown great potential in leveraging large-scale unlabeled data to improve downstream task performance.
We build a task-specific self-supervised pre-training framework based on the simple hypothesis that pre-training on unlabeled samples whose distribution is similar to that of the target task can bring substantial performance gains.
- Score: 11.345844145289524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, the self-supervised pre-training paradigm has shown great potential
in leveraging large-scale unlabeled data to improve downstream task
performance. However, increasing the scale of unlabeled pre-training data in
real-world scenarios requires prohibitive computational costs and faces the
challenge of uncurated samples. To address these issues, we build a
task-specific self-supervised pre-training framework from a data selection
perspective, based on the simple hypothesis that pre-training on unlabeled
samples whose distribution is similar to that of the target task can bring
substantial performance gains. Building on this hypothesis, we propose SEPT,
the first framework for Scalable and Efficient visual Pre-Training, which
introduces a retrieval pipeline for data selection. SEPT first leverages a
self-supervised pre-trained model to extract features of the entire unlabeled
dataset and initialize the retrieval pipeline. Then, for a specific target
task, SEPT retrieves the most similar unlabeled samples for each target
instance based on feature similarity. Finally, SEPT pre-trains the target model
on the selected unlabeled samples in a self-supervised manner and fine-tunes it
on the target data. By decoupling the scale of pre-training from the upstream
data available for a target task, SEPT achieves high scalability of the
upstream dataset and high efficiency of pre-training, while retaining
flexibility in the model architecture. Results on various downstream tasks
demonstrate that SEPT can achieve competitive or even better performance than
ImageNet pre-training while reducing the number of training samples by one
order of magnitude, without resorting to any extra annotations.
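
The retrieval step described above can be made concrete with a short sketch. The following is a minimal illustration of task-specific data selection, assuming a frozen self-supervised feature extractor has already produced features for both the unlabeled pool and the target instances; the function names, the `top_k` parameter, and the brute-force cosine-similarity search (rather than an approximate-nearest-neighbour index) are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Row-normalize features so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)

def select_pretraining_subset(
    unlabeled_features: np.ndarray,   # (N, d) features of the unlabeled pool,
                                      # extracted once by a frozen SSL model
    target_features: np.ndarray,      # (M, d) features of the target-task instances
    top_k: int = 50,                  # neighbours per target instance (assumed value)
) -> np.ndarray:
    """Return indices of unlabeled samples most similar to the target task.

    Sketch of a SEPT-style retrieval step: for every target instance,
    retrieve its top-k nearest unlabeled samples by cosine similarity and
    take the union of all retrieved indices as the pre-training subset.
    """
    pool = l2_normalize(unlabeled_features)
    queries = l2_normalize(target_features)

    selected = set()
    # Brute-force similarity search for clarity; at real upstream scale one
    # would typically use an ANN index (e.g. FAISS), which is an assumption here.
    for q in queries:
        sims = pool @ q                          # (N,) cosine similarities
        nearest = np.argpartition(-sims, top_k)[:top_k]
        selected.update(nearest.tolist())
    return np.fromiter(selected, dtype=np.int64)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool_feats = rng.standard_normal((10_000, 128)).astype(np.float32)
    task_feats = rng.standard_normal((200, 128)).astype(np.float32)
    subset = select_pretraining_subset(pool_feats, task_feats, top_k=50)
    # The selected subset would then be used for self-supervised pre-training
    # of the target model, followed by fine-tuning on the labeled target data.
    print(f"selected {subset.size} of {pool_feats.shape[0]} unlabeled samples")
```

Taking the union of per-instance neighbours keeps the selected subset's size proportional to the target task rather than to the upstream pool, which is what decouples pre-training cost from upstream dataset scale.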
Related papers
- Better with Less: A Data-Active Perspective on Pre-Training Graph Neural
Networks [39.71761440499148]
Pre-training on graph neural networks (GNNs) aims to learn transferable knowledge for downstream tasks with unlabeled data.
We propose a better-with-less framework for graph pre-training: fewer, but carefully chosen data are fed into a GNN model.
Experiment results show that the proposed APT is able to obtain an efficient pre-training model with fewer training data and better downstream performance.
arXiv Detail & Related papers (2023-11-02T07:09:59Z)
- AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset [25.935496432142976]
It is a long-term vision of the Autonomous Driving (AD) community that perception models can learn from a large-scale point cloud dataset.
We formulate the point-cloud pre-training task as a semi-supervised problem, which leverages the few-shot labeled and massive unlabeled point-cloud data.
We achieve significant performance gains on a series of downstream perception benchmarks, including nuScenes and KITTI, under different baseline models.
arXiv Detail & Related papers (2023-06-01T12:32:52Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can reach final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Efficient Conditional Pre-training for Transfer Learning [71.01129334495553]
We propose efficient filtering methods to select relevant subsets from the pre-training dataset.
We validate our techniques by pre-training on ImageNet in both the unsupervised and supervised settings.
We improve standard ImageNet pre-training by 1-3% by tuning available models on our subsets and by pre-training on a dataset filtered from a larger-scale dataset.
arXiv Detail & Related papers (2020-11-20T06:16:15Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To reduce the cost of training on this enlarged dataset, we apply a dataset distillation strategy to compress it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection [86.0580214485104]
We propose a general and efficient pre-training paradigm, Montage pre-training, for object detection.
Montage pre-training needs only the target detection dataset while taking only 1/4 of the computational resources of the widely adopted ImageNet pre-training.
The efficiency and effectiveness of Montage pre-training are validated by extensive experiments on the MS-COCO dataset.
arXiv Detail & Related papers (2020-04-25T16:09:46Z)