Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
- URL: http://arxiv.org/abs/2112.10740v1
- Date: Mon, 20 Dec 2021 18:41:32 GMT
- Title: Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
- Authors: Alaaeldin El-Nouby, Gautier Izacard, Hugo Touvron, Ivan Laptev,
Hervé Jégou, Edouard Grave
- Abstract summary: We consider a self-supervised pre-training scenario that only leverages the target task data.
Our study shows that denoising autoencoders, such as BEiT, are more robust to the type and size of the pre-training data.
On COCO, when pre-training solely using COCO images, the detection and instance segmentation performance surpasses the supervised ImageNet pre-training in a comparable setting.
- Score: 29.49873710927313
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-training models on large-scale datasets, like ImageNet, is a standard
practice in computer vision. This paradigm is especially effective for tasks
with small training sets, for which high-capacity models tend to overfit. In
this work, we consider a self-supervised pre-training scenario that only
leverages the target task data. We consider datasets such as Stanford Cars,
Sketch or COCO, which are order(s) of magnitude smaller than ImageNet. Our
study shows that denoising autoencoders, such as BEiT or a variant that we
introduce in this paper, are more robust to the type and size of the
pre-training data than popular self-supervised methods trained by comparing
image embeddings. We obtain competitive performance compared to ImageNet
pre-training on a variety of classification datasets, from different domains.
On COCO, when pre-training solely using COCO images, the detection and instance
segmentation performance surpasses the supervised ImageNet pre-training in a
comparable setting.
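As a rough illustration of the denoising-autoencoder family of methods discussed above, the sketch below runs masked-patch pixel reconstruction with a tiny Vision Transformer on nothing but the target task's images. The architecture, masking ratio and pixel loss are illustrative assumptions, not the BEiT tokenizer objective or the paper's exact variant.

    import torch
    import torch.nn as nn

    class TinyMaskedAutoencoder(nn.Module):
        """Pixel-level masked-image-modeling toy model (illustrative only)."""
        def __init__(self, img_size=224, patch=16, dim=192, depth=4, heads=3):
            super().__init__()
            self.patch = patch
            self.num_patches = (img_size // patch) ** 2
            self.embed = nn.Linear(3 * patch * patch, dim)        # patchify + project
            self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
            self.pos = nn.Parameter(torch.zeros(1, self.num_patches, dim))
            layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, depth)
            self.decoder = nn.Linear(dim, 3 * patch * patch)      # reconstruct raw pixels

        def patchify(self, x):
            b, c, _, _ = x.shape
            p = self.patch
            x = x.unfold(2, p, p).unfold(3, p, p)                 # B, C, h, w, p, p
            return x.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)

        def forward(self, imgs, mask_ratio=0.6):
            patches = self.patchify(imgs)                         # B, N, 3*p*p
            tokens = self.embed(patches) + self.pos
            mask = torch.rand(tokens.shape[:2], device=imgs.device) < mask_ratio
            tokens = torch.where(mask[..., None], self.mask_token.expand_as(tokens), tokens)
            recon = self.decoder(self.encoder(tokens))
            return ((recon - patches) ** 2)[mask].mean()          # loss on masked patches only

    # Pre-train on the target task's images only (no ImageNet involved).
    model = TinyMaskedAutoencoder()
    opt = torch.optim.AdamW(model.parameters(), lr=1.5e-4)
    imgs = torch.randn(8, 3, 224, 224)        # stand-in batch of COCO / Stanford Cars images
    loss = model(imgs)
    loss.backward()
    opt.step()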
Related papers
- Enhancing pretraining efficiency for medical image segmentation via transferability metrics [0.0]
In medical image segmentation tasks, the scarcity of labeled training data poses a significant challenge.
We introduce a novel transferability metric, based on contrastive learning, that measures how robustly a pretrained model is able to represent the target data.
arXiv Detail & Related papers (2024-10-24T12:11:52Z) - Effective pruning of web-scale datasets based on complexity of concept
clusters [48.125618324485195]
We present a method for pruning large-scale multimodal datasets for training CLIP-style models on ImageNet.
We find that training on a smaller set of high-quality data can lead to higher performance with significantly lower training costs.
We achieve a new state-of-the-art ImageNet zero-shot accuracy and a competitive average zero-shot accuracy on 38 evaluation tasks.
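As a loose sketch of the concept-cluster idea, the snippet below clusters (assumed CLIP-style) embeddings with k-means and keeps a larger share of examples from spread-out clusters; the complexity score and quota rule are illustrative assumptions, not the paper's criterion.

    import numpy as np
    from sklearn.cluster import KMeans

    def prune_by_cluster_complexity(embeddings, keep_fraction=0.3, n_clusters=50, seed=0):
        """Keep a budgeted subset, allocating more of it to spread-out clusters."""
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(embeddings)
        dists = np.linalg.norm(embeddings - km.cluster_centers_[km.labels_], axis=1)
        budget = int(keep_fraction * len(embeddings))
        # Complexity proxy: mean distance to the centroid; allocate the budget proportionally.
        complexity = np.array([dists[km.labels_ == c].mean() for c in range(n_clusters)])
        quota = np.maximum(1, (budget * complexity / complexity.sum()).astype(int))
        keep = []
        for c in range(n_clusters):
            idx = np.where(km.labels_ == c)[0]
            # Within a cluster, prefer the harder (farther-from-centroid) examples.
            keep.extend(idx[np.argsort(-dists[idx])][: quota[c]])
        return np.array(sorted(keep))

    emb = np.random.randn(5000, 64).astype(np.float32)   # stand-in for CLIP image embeddings
    kept = prune_by_cluster_complexity(emb)
    print(f"kept {len(kept)} of {len(emb)} examples")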
arXiv Detail & Related papers (2024-01-09T14:32:24Z) - A Simple and Efficient Baseline for Data Attribution on Images [107.12337511216228]
Current state-of-the-art approaches require a large ensemble of as many as 300,000 models to accurately attribute model predictions.
In this work, we focus on a minimalist baseline, utilizing the feature space of a backbone pretrained via self-supervised learning to perform data attribution.
Our method is model-agnostic and scales easily to large datasets.
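A minimal version of such a feature-space baseline can be sketched as a nearest-neighbour lookup: score each training image by the cosine similarity of its frozen self-supervised features to the test image's features. The feature dimension and shapes below are assumptions, not the paper's exact estimator.

    import torch
    import torch.nn.functional as F

    def attribute(test_feat, train_feats, top_k=5):
        """Rank training examples by cosine similarity in a frozen SSL feature space."""
        test_feat = F.normalize(test_feat, dim=0)        # (D,)
        train_feats = F.normalize(train_feats, dim=1)    # (N, D)
        scores = train_feats @ test_feat                 # cosine similarity per training image
        return torch.topk(scores, top_k)                 # most "influential" training examples

    train_feats = torch.randn(10_000, 384)   # e.g. features from a self-supervised ViT-S (assumed)
    test_feat = torch.randn(384)
    values, indices = attribute(test_feat, train_feats)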
arXiv Detail & Related papers (2023-11-03T17:29:46Z) - Self-Supervised Pretraining for 2D Medical Image Segmentation [0.0]
Self-supervised learning offers a way to lower the need for manually annotated data by pretraining models for a specific domain on unlabelled data.
We find that self-supervised pretraining on natural images and target-domain-specific images leads to the fastest and most stable downstream convergence.
In low-data scenarios, supervised ImageNet pretraining achieves the best accuracy, requiring fewer than 100 annotated samples to approach the minimal error.
arXiv Detail & Related papers (2022-09-01T09:25:22Z) - Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach that leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
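The two supervision signals can be sketched roughly as follows, with stand-in linear modules in place of CLIP and the student network: confidence-filtered pseudo-labels from a frozen zero-shot teacher, plus a masked-view consistency term. Thresholds and losses are illustrative assumptions, not the published MUST recipe.

    import torch
    import torch.nn.functional as F

    teacher = torch.nn.Linear(3 * 32 * 32, 10)   # stand-in for a frozen CLIP zero-shot classifier
    student = torch.nn.Linear(3 * 32 * 32, 10)   # stand-in for the model being adapted

    imgs = torch.randn(16, 3, 32, 32)
    flat = imgs.flatten(1)

    # Supervision 1: confidence-filtered pseudo-labels from the frozen teacher.
    with torch.no_grad():
        probs = teacher(flat).softmax(dim=-1)
    conf, pseudo = probs.max(dim=-1)
    keep = conf > 0.5
    loss_pl = (F.cross_entropy(student(flat[keep]), pseudo[keep])
               if keep.any() else flat.new_zeros(()))

    # Supervision 2: the student's prediction on a masked view should match its own
    # prediction on the clean view (a simplified stand-in for masked token prediction).
    mask = (torch.rand_like(flat) < 0.5).float()
    with torch.no_grad():
        target = student(flat).softmax(dim=-1)
    loss_mask = F.kl_div(student(flat * mask).log_softmax(dim=-1), target,
                         reduction="batchmean")

    (loss_pl + loss_mask).backward()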
arXiv Detail & Related papers (2022-06-07T02:03:06Z) - On Efficient Transformer and Image Pre-training for Low-level Vision [74.22436001426517]
Pre-training has set numerous state-of-the-art results in high-level computer vision.
We present an in-depth study of image pre-training.
We find pre-training plays strikingly different roles in low-level tasks.
arXiv Detail & Related papers (2021-12-19T15:50:48Z) - ImageNet-21K Pretraining for the Masses [12.339884639594624]
ImageNet-1K serves as the primary dataset for pretraining deep learning models for computer vision tasks.
The ImageNet-21K dataset contains more images and classes.
This paper aims to make high-quality efficient pretraining on ImageNet-21K available for everyone.
arXiv Detail & Related papers (2021-04-22T10:10:14Z) - Self-Supervised Pretraining Improves Self-Supervised Pretraining [83.1423204498361]
Self-supervised pretraining requires expensive and lengthy computation, large amounts of data, and is sensitive to data augmentation.
This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model.
We show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.
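The core idea, continuing self-supervised pretraining from an already-pretrained backbone rather than from random weights, can be sketched as below; a torchvision ImageNet checkpoint and a SimSiam-style loss stand in for the base model and the SSL objective, both of which are assumptions rather than the HPT setup.

    import torch
    import torch.nn.functional as F
    from torchvision.models import resnet50, ResNet50_Weights

    # Base model: any already-pretrained backbone (downloads ImageNet weights here).
    backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
    backbone.fc = torch.nn.Identity()                    # drop the classification head
    projector = torch.nn.Linear(2048, 256)
    predictor = torch.nn.Linear(256, 256)
    params = (list(backbone.parameters()) + list(projector.parameters())
              + list(predictor.parameters()))
    opt = torch.optim.SGD(params, lr=0.05, momentum=0.9)

    # Continue self-supervised pretraining on target-domain images (two augmented views).
    view1 = torch.randn(8, 3, 224, 224)
    view2 = torch.randn(8, 3, 224, 224)
    z1, z2 = projector(backbone(view1)), projector(backbone(view2))
    p1, p2 = predictor(z1), predictor(z2)
    # Negative cosine loss with stop-gradient on the target branch (SimSiam-style).
    loss = -0.5 * (F.cosine_similarity(p1, z2.detach()).mean()
                   + F.cosine_similarity(p2, z1.detach()).mean())
    loss.backward()
    opt.step()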
arXiv Detail & Related papers (2021-03-23T17:37:51Z) - Efficient Visual Pretraining with Contrastive Detection [31.444554574326283]
We introduce a new self-supervised objective, contrastive detection, which tasks representations with identifying object-level features across augmentations.
This objective extracts a rich learning signal per image, leading to state-of-the-art transfer performance from ImageNet to COCO.
In particular, our strongest ImageNet-pretrained model performs on par with SEER, one of the largest self-supervised systems to date.
arXiv Detail & Related papers (2021-03-19T14:05:12Z) - Efficient Conditional Pre-training for Transfer Learning [71.01129334495553]
We propose efficient filtering methods to select relevant subsets from the pre-training dataset.
We validate our techniques by pre-training on ImageNet in both the unsupervised and supervised settings.
We improve standard ImageNet pre-training by 1-3% by tuning available models on our subsets and pre-training on a dataset filtered from a larger-scale dataset.
arXiv Detail & Related papers (2020-11-20T06:16:15Z) - Rethinking Pre-training and Self-training [105.27954735761678]
We investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training.
Our study reveals the generality and flexibility of self-training with three additional insights.
For example, on the COCO object detection dataset, pre-training helps when we use one-fifth of the labeled data, and hurts accuracy when we use all labeled data.
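A bare-bones self-training loop in the spirit of that comparison: fit a teacher on the labeled set, pseudo-label the additional unlabeled data, then train a student on the union. The toy model and confidence threshold below are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def make_model():
        return torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                                   torch.nn.Linear(64, 5))

    def fit(model, x, y, epochs=50, lr=1e-2):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
        return model

    x_lab, y_lab = torch.randn(200, 32), torch.randint(0, 5, (200,))
    x_unlab = torch.randn(1000, 32)                      # the "additional data"

    teacher = fit(make_model(), x_lab, y_lab)            # 1. train on labeled data
    with torch.no_grad():
        probs = teacher(x_unlab).softmax(dim=-1)
    conf, pseudo = probs.max(dim=-1)
    keep = conf > 0.8                                    # 2. keep confident pseudo-labels
    student = fit(make_model(),                          # 3. retrain on the union
                  torch.cat([x_lab, x_unlab[keep]]),
                  torch.cat([y_lab, pseudo[keep]]))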
arXiv Detail & Related papers (2020-06-11T23:59:16Z)