Efficient Conditional Pre-training for Transfer Learning
- URL: http://arxiv.org/abs/2011.10231v5
- Date: Thu, 18 Nov 2021 20:17:10 GMT
- Title: Efficient Conditional Pre-training for Transfer Learning
- Authors: Shuvam Chakraborty, Burak Uzkent, Kumar Ayush, Kumar Tanmay, Evan
Sheehan, Stefano Ermon
- Abstract summary: We propose efficient filtering methods to select relevant subsets from the pre-training dataset.
We validate our techniques by pre-training on ImageNet in both the unsupervised and supervised settings.
We improve standard ImageNet pre-training by 1-3% by tuning available models on our subsets and pre-training on a dataset filtered from a larger scale dataset.
- Score: 71.01129334495553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Almost all the state-of-the-art neural networks for computer vision tasks are
trained by (1) pre-training on a large-scale dataset and (2) finetuning on the
target dataset. This strategy helps reduce dependence on the target dataset and
improves convergence rate and generalization on the target task. Although
pre-training on large-scale datasets is very useful, its foremost disadvantage
is high training cost. To address this, we propose efficient filtering methods
to select relevant subsets from the pre-training dataset. Additionally, we
discover that lowering image resolutions in the pre-training step offers a
great trade-off between cost and performance. We validate our techniques by
pre-training on ImageNet in both the unsupervised and supervised settings and
finetuning on a diverse collection of target datasets and tasks. Our proposed
methods drastically reduce pre-training cost and provide strong performance
boosts. Finally, we improve standard ImageNet pre-training by 1-3% by tuning
available models on our subsets and pre-training on a dataset filtered from a
larger scale dataset.
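To make the filtering idea concrete, the sketch below is a minimal, assumption-based illustration (not the paper's released code): it scores each candidate pre-training image by its feature similarity to the target domain and keeps only the closest fraction. The encoder choice, the 64x64 resolution, and the keep_fraction budget are illustrative assumptions rather than values taken from the paper.
```python
# Minimal sketch (illustrative assumptions, not the paper's code): score candidate
# pre-training images by cosine similarity to the target-domain centroid and keep
# the most relevant fraction; the selected subset can then be pre-trained at a
# reduced resolution to further cut cost.
import torch
import torchvision.transforms as T
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen feature extractor; ResNet-18 is an arbitrary, lightweight choice here.
encoder = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()          # drop the classification head
encoder = encoder.eval().to(device)

# Hypothetical low-resolution transform for the cost/performance trade-off
# (64x64 is an illustrative value, not one prescribed by the abstract).
low_res_transform = T.Compose([T.Resize(64), T.CenterCrop(64), T.ToTensor()])

@torch.no_grad()
def embed(loader):
    """L2-normalized features for a loader that yields (images, labels) batches."""
    feats = []
    for images, _ in loader:
        feats.append(torch.nn.functional.normalize(encoder(images.to(device)), dim=1))
    return torch.cat(feats)

@torch.no_grad()
def select_relevant_subset(target_loader, pretrain_loader, keep_fraction=0.1):
    """Indices of pre-training samples closest to the target-domain centroid."""
    centroid = torch.nn.functional.normalize(
        embed(target_loader).mean(0, keepdim=True), dim=1)
    scores = (embed(pretrain_loader) @ centroid.T).squeeze(1)   # cosine similarity
    k = max(1, int(keep_fraction * scores.numel()))
    return scores.topk(k).indices                               # use with Subset/Sampler
```
The returned indices could be wrapped in torch.utils.data.Subset with low_res_transform applied to the pre-training dataset, so that the relevant subset is also trained at lower resolution, mirroring the cost/performance trade-off described in the abstract.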
Related papers
- Data Filtering Networks [67.827994353269]
We study the problem of learning a data filtering network (DFN) for the second step of dataset construction: filtering a large uncurated dataset.
Our key finding is that the quality of a network for filtering is distinct from its performance on downstream tasks.
Based on our insights, we construct new data filtering networks that induce state-of-the-art image-text datasets.
arXiv Detail & Related papers (2023-09-29T17:37:29Z)
- The Role of Pre-training Data in Transfer Learning [20.768366728182997]
We investigate the impact of the pre-training data distribution on few-shot and full fine-tuning performance.
We find that the choice of the pre-training data source is essential for the few-shot transfer, but its role decreases as more data is made available for fine-tuning.
arXiv Detail & Related papers (2023-02-27T09:10:08Z)
- SEPT: Towards Scalable and Efficient Visual Pre-Training [11.345844145289524]
Self-supervised pre-training has shown great potential in leveraging large-scale unlabeled data to improve downstream task performance.
We build a task-specific self-supervised pre-training framework based on the simple hypothesis that pre-training on unlabeled samples whose distribution is similar to the target task can bring substantial performance gains.
arXiv Detail & Related papers (2022-12-11T11:02:11Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Are Large-scale Datasets Necessary for Self-Supervised Pre-training? [29.49873710927313]
We consider a self-supervised pre-training scenario that only leverages the target task data.
Our study shows that denoising autoencoders, such as BEiT, are more robust to the type and size of the pre-training data.
On COCO, when pre-training solely using COCO images, the detection and instance segmentation performance surpasses the supervised ImageNet pre-training in a comparable setting.
arXiv Detail & Related papers (2021-12-20T18:41:32Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can achieve final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually requires a larger pre-training dataset to boost performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Rethinking Pre-training and Self-training [105.27954735761678]
We investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training.
Our study reveals the generality and flexibility of self-training with three additional insights.
For example, on the COCO object detection dataset, pre-training helps when we use one fifth of the labeled data, but hurts accuracy when we use all labeled data.
arXiv Detail & Related papers (2020-06-11T23:59:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.