ImageNet-21K Pretraining for the Masses
- URL: http://arxiv.org/abs/2104.10972v1
- Date: Thu, 22 Apr 2021 10:10:14 GMT
- Title: ImageNet-21K Pretraining for the Masses
- Authors: Tal Ridnik, Emanuel Ben-Baruch, Asaf Noy, Lihi Zelnik-Manor
- Abstract summary: ImageNet-1K serves as the primary dataset for pretraining deep learning models for computer vision tasks.
The ImageNet-21K dataset contains more images and classes.
This paper aims to make high-quality efficient pretraining on ImageNet-21K available for everyone.
- Score: 12.339884639594624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: ImageNet-1K serves as the primary dataset for pretraining deep learning
models for computer vision tasks. The ImageNet-21K dataset, which contains more
images and classes, is used less frequently for pretraining, mainly due to its
complexity and an underestimation of its added value compared to standard
ImageNet-1K pretraining. This paper aims to close this gap and make
high-quality, efficient pretraining on ImageNet-21K available for everyone.
Via a dedicated preprocessing stage, utilizing WordNet hierarchies, and a novel
training scheme called semantic softmax, we show that various models, including
small mobile-oriented models, significantly benefit from ImageNet-21K
pretraining on numerous datasets and tasks. We also show that we outperform
previous ImageNet-21K pretraining schemes for prominent new models like ViT.
Our proposed pretraining pipeline is efficient, accessible, and leads to SoTA
reproducible results, from a publicly available dataset. The training code and
pretrained models are available at: https://github.com/Alibaba-MIIL/ImageNet21K
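The "semantic softmax" scheme is only named in the abstract, not specified. The sketch below illustrates the general idea of a hierarchy-aware loss, assuming the 21K labels have been partitioned into WordNet-derived semantic groups and that each sample carries a per-group target index (or -1 when its label has no ancestor in a group). The exact partitioning, target construction, and group weighting used by the authors are in the linked repository and are not reproduced here.

```python
import torch
import torch.nn.functional as F

def semantic_softmax_loss(logits, group_slices, group_targets):
    """Hedged sketch of a semantic-softmax-style loss (not the authors' exact code).

    logits:        (B, C) raw scores over all ImageNet-21K classes.
    group_slices:  list of (start, end) column ranges, one per WordNet-derived
                   semantic group of classes (assumed precomputed).
    group_targets: (B, G) LongTensor with the target index *within* each group,
                   or -1 when the sample's label has no ancestor in that group.
    """
    per_group_losses = []
    for g, (start, end) in enumerate(group_slices):
        group_logits = logits[:, start:end]    # softmax restricted to this semantic group
        targets = group_targets[:, g]
        valid = targets >= 0                   # skip samples with no label in this group
        if valid.any():
            per_group_losses.append(F.cross_entropy(group_logits[valid], targets[valid]))
    # simple unweighted average over groups; the paper/repo may weight groups differently
    return torch.stack(per_group_losses).mean()
```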
Related papers
- Effective pruning of web-scale datasets based on complexity of concept clusters [48.125618324485195]
We present a method for pruning large-scale multimodal datasets for training CLIP-style models on ImageNet.
We find that training on a smaller set of high-quality data can lead to higher performance with significantly lower training costs.
We achieve a new state-of-the-art ImageNet zero-shot accuracy and a competitive average zero-shot accuracy on 38 evaluation tasks.
arXiv Detail & Related papers (2024-01-09T14:32:24Z)
- Large-scale Dataset Pruning with Dynamic Uncertainty [28.60845105174658]
The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them.
In this paper, we investigate how to prune the large-scale datasets, and thus produce an informative subset for training sophisticated deep models with negligible performance drop.
arXiv Detail & Related papers (2023-06-08T13:14:35Z)
- The effectiveness of MAE pre-pretraining for billion-scale pretraining [65.98338857597935]
We introduce an additional pre-pretraining stage that is simple and uses the self-supervised MAE technique to initialize the model.
We measure the effectiveness of pre-pretraining on 10 different visual recognition tasks spanning image classification, video recognition, object detection, low-shot classification and zero-shot recognition.
arXiv Detail & Related papers (2023-03-23T17:56:12Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- Core Risk Minimization using Salient ImageNet [53.616101711801484]
We introduce the Salient ImageNet dataset with more than 1 million soft masks localizing core and spurious features for all 1000 ImageNet classes.
Using this dataset, we first evaluate the reliance of several ImageNet-pretrained models (42 in total) on spurious features.
Next, we introduce a new learning paradigm called Core Risk Minimization (CoRM) whose objective ensures that the model predicts a class using its core features.
arXiv Detail & Related papers (2022-03-28T01:53:34Z)
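The entry above only states that CoRM's objective pushes the model toward core features. As one hypothetical illustration (not the paper's actual CoRM formulation), the soft core masks could be used to suppress non-core regions before a standard cross-entropy loss:

```python
import torch
import torch.nn.functional as F

def core_masked_cross_entropy(model, images, core_masks, labels, noise_std=0.1):
    """Hypothetical use of soft core masks; NOT the paper's CoRM objective.

    images:     (B, 3, H, W) input batch.
    core_masks: (B, 1, H, W) soft masks in [0, 1], where 1 marks core-feature regions.
    """
    noise = torch.randn_like(images) * noise_std
    # keep core regions, replace the (potentially spurious) remainder with noise
    masked_images = images * core_masks + noise * (1.0 - core_masks)
    logits = model(masked_images)
    return F.cross_entropy(logits, labels)
```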
- Corrupted Image Modeling for Self-Supervised Visual Pre-Training [103.99311611776697]
We introduce Corrupted Image Modeling (CIM) for self-supervised visual pre-training.
CIM uses an auxiliary generator with a small trainable BEiT to corrupt the input image instead of using artificial mask tokens.
After pre-training, the enhancer can be used as a high-capacity visual encoder for downstream tasks.
arXiv Detail & Related papers (2022-02-07T17:59:04Z)
- Are Large-scale Datasets Necessary for Self-Supervised Pre-training? [29.49873710927313]
We consider a self-supervised pre-training scenario that only leverages the target task data.
Our study shows that denoising autoencoders, such as BEiT, are more robust to the type and size of the pre-training data.
On COCO, when pre-training solely on COCO images, detection and instance segmentation performance surpasses that of supervised ImageNet pre-training in a comparable setting.
arXiv Detail & Related papers (2021-12-20T18:41:32Z)
- Learning Transferable Visual Models From Natural Language Supervision [13.866297967166089]
Learning directly from raw text about images is a promising alternative.
We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn.
SOTA image representations are learned from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
arXiv Detail & Related papers (2021-02-26T19:04:58Z)
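The caption-matching pre-training task summarized above amounts to a symmetric contrastive loss over a batch of paired image and text embeddings. A minimal sketch follows; the encoders, learned temperature, and large-batch engineering of the actual CLIP implementation are omitted or replaced with simple assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_caption_matching_loss(image_embeds, text_embeds, temperature=0.07):
    """Minimal sketch of the 'which caption goes with which image' objective.

    image_embeds, text_embeds: (B, D) paired embeddings; row i of each comes from
    the same (image, caption) pair. Temperature is fixed here for simplicity,
    whereas CLIP learns it.
    """
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = image_embeds @ text_embeds.t() / temperature          # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)   # matches lie on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)       # image -> its caption
    loss_t2i = F.cross_entropy(logits.t(), targets)   # caption -> its image
    return 0.5 * (loss_i2t + loss_t2i)
```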
- Rethinking Pre-training and Self-training [105.27954735761678]
We investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training.
Our study reveals the generality and flexibility of self-training with three additional insights.
For example, on the COCO object detection dataset, pre-training helps when we use one fifth of the labeled data but hurts accuracy when we use all of the labeled data.
arXiv Detail & Related papers (2020-06-11T23:59:16Z)