SEPT: Towards Scalable and Efficient Visual Pre-Training
- URL: http://arxiv.org/abs/2212.05473v1
- Date: Sun, 11 Dec 2022 11:02:11 GMT
- Title: SEPT: Towards Scalable and Efficient Visual Pre-Training
- Authors: Yiqi Lin, Huabin Zheng, Huaping Zhong, Jinjing Zhu, Weijia Li, Conghui He, Lin Wang
- Abstract summary: Self-supervised pre-training has shown great potential in leveraging large-scale unlabeled data to improve downstream task performance.
We build a task-specific self-supervised pre-training framework based on the simple hypothesis that pre-training on unlabeled samples whose distribution is similar to that of the target task can bring substantial performance gains.
- Score: 11.345844145289524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, the self-supervised pre-training paradigm has shown great potential
in leveraging large-scale unlabeled data to improve downstream task
performance. However, increasing the scale of unlabeled pre-training data in
real-world scenarios requires prohibitive computational costs and faces the
challenge of uncurated samples. To address these issues, we build a
task-specific self-supervised pre-training framework from a data selection
perspective, based on the simple hypothesis that pre-training on unlabeled
samples whose distribution is similar to that of the target task can bring
substantial performance gains. Building on this hypothesis, we propose SEPT,
the first framework for Scalable and Efficient visual Pre-Training, which
introduces a retrieval pipeline for data selection. SEPT first leverages a
self-supervised pre-trained model to extract features of the entire unlabeled
dataset and initialize the retrieval pipeline. Then, for a specific target
task, SEPT retrieves the most similar unlabeled samples for each target
instance based on feature similarity. Finally, SEPT pre-trains the target model
on the selected unlabeled samples in a self-supervised manner and fine-tunes it
on the target data. By decoupling the scale of pre-training from the upstream
data available for a target task, SEPT achieves high scalability of the
upstream dataset and high efficiency of pre-training, while retaining
flexibility in the model architecture. Results on various downstream tasks
demonstrate that SEPT can achieve competitive or even better performance than
ImageNet pre-training while reducing the number of training samples by one
order of magnitude, without resorting to any extra annotations.
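
The retrieval step described above can be made concrete with a short sketch. The following is a minimal illustration of task-specific data selection, assuming a frozen self-supervised feature extractor has already produced features for both the unlabeled pool and the target instances; the function names, the `top_k` parameter, and the brute-force cosine-similarity search (rather than an approximate-nearest-neighbour index) are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Row-normalize features so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)

def select_pretraining_subset(
    unlabeled_features: np.ndarray,   # (N, d) features of the unlabeled pool,
                                      # extracted once by a frozen SSL model
    target_features: np.ndarray,      # (M, d) features of the target-task instances
    top_k: int = 50,                  # neighbours per target instance (assumed value)
) -> np.ndarray:
    """Return indices of unlabeled samples most similar to the target task.

    Sketch of a SEPT-style retrieval step: for every target instance,
    retrieve its top-k nearest unlabeled samples by cosine similarity and
    take the union of all retrieved indices as the pre-training subset.
    """
    pool = l2_normalize(unlabeled_features)
    queries = l2_normalize(target_features)

    selected = set()
    # Brute-force similarity search for clarity; at real upstream scale one
    # would typically use an ANN index (e.g. FAISS), which is an assumption here.
    for q in queries:
        sims = pool @ q                          # (N,) cosine similarities
        nearest = np.argpartition(-sims, top_k)[:top_k]
        selected.update(nearest.tolist())
    return np.fromiter(selected, dtype=np.int64)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool_feats = rng.standard_normal((10_000, 128)).astype(np.float32)
    task_feats = rng.standard_normal((200, 128)).astype(np.float32)
    subset = select_pretraining_subset(pool_feats, task_feats, top_k=50)
    # The selected subset would then be used for self-supervised pre-training
    # of the target model, followed by fine-tuning on the labeled target data.
    print(f"selected {subset.size} of {pool_feats.shape[0]} unlabeled samples")
```

Taking the union of per-instance neighbours keeps the selected subset's size proportional to the target task rather than to the upstream pool, which is what decouples pre-training cost from upstream dataset scale.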
Related papers
- Better with Less: A Data-Active Perspective on Pre-Training Graph Neural
Networks [39.71761440499148]
Pre-training on graph neural networks (GNNs) aims to learn transferable knowledge for downstream tasks with unlabeled data.
We propose a better-with-less framework for graph pre-training: fewer, but carefully chosen data are fed into a GNN model.
Experiment results show that the proposed APT is able to obtain an efficient pre-training model with fewer training data and better downstream performance.
arXiv Detail & Related papers (2023-11-02T07:09:59Z)
- AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset [25.935496432142976]
It is a long-term vision of the Autonomous Driving (AD) community that perception models can learn from a large-scale point cloud dataset.
We formulate the point-cloud pre-training task as a semi-supervised problem, which leverages the few-shot labeled and massive unlabeled point-cloud data.
We achieve significant performance gains on a series of downstream perception benchmarks, including nuScenes and KITTI, under different baseline models.
arXiv Detail & Related papers (2023-06-01T12:32:52Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can reach final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Efficient Conditional Pre-training for Transfer Learning [71.01129334495553]
We propose efficient filtering methods to select relevant subsets from the pre-training dataset.
We validate our techniques by pre-training on ImageNet in both the unsupervised and supervised settings.
We improve standard ImageNet pre-training by 1-3% by tuning available models on our subsets and by pre-training on a dataset filtered from a larger-scale dataset.
arXiv Detail & Related papers (2020-11-20T06:16:15Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To reduce the cost of training on this enlarged dataset, we apply a dataset distillation strategy to compress it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection [86.0580214485104]
We propose a general and efficient pre-training paradigm, Montage pre-training, for object detection.
Montage pre-training needs only the target detection dataset while taking only 1/4 of the computational resources of the widely adopted ImageNet pre-training.
The efficiency and effectiveness of Montage pre-training are validated by extensive experiments on the MS-COCO dataset.
arXiv Detail & Related papers (2020-04-25T16:09:46Z)