EfficientTrain: Exploring Generalized Curriculum Learning for Training
Visual Backbones
- URL: http://arxiv.org/abs/2211.09703v3
- Date: Wed, 16 Aug 2023 15:16:43 GMT
- Title: EfficientTrain: Exploring Generalized Curriculum Learning for Training
Visual Backbones
- Authors: Yulin Wang, Yang Yue, Rui Lu, Tianjiao Liu, Zhao Zhong, Shiji Song,
Gao Huang
- Abstract summary: This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
- Score: 80.662250618795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The superior performance of modern deep networks usually comes with a costly
training procedure. This paper presents a new curriculum learning approach for
the efficient training of visual backbones (e.g., vision Transformers). Our
work is inspired by the inherent learning dynamics of deep networks: we
experimentally show that at an earlier training stage, the model mainly learns
to recognize some 'easier-to-learn' discriminative patterns within each
example, e.g., the lower-frequency components of images and the original
information before data augmentation. Driven by this phenomenon, we propose a
curriculum where the model always leverages all the training data at each
epoch, while the curriculum starts with only exposing the 'easier-to-learn'
patterns of each example, and introduces gradually more difficult patterns. To
implement this idea, we 1) introduce a cropping operation in the Fourier
spectrum of the inputs, which enables the model to learn from only the
lower-frequency components efficiently, 2) demonstrate that exposing the
features of original images amounts to adopting weaker data augmentation, and
3) integrate 1) and 2) and design a curriculum learning schedule with a
greedy-search algorithm. The resulting approach, EfficientTrain, is simple,
general, yet surprisingly effective. As an off-the-shelf method, it reduces the
wall-time training cost of a wide variety of popular models (e.g., ResNet,
ConvNeXt, DeiT, PVT, Swin, and CSWin) by >1.5x on ImageNet-1K/22K without
sacrificing accuracy. It is also effective for self-supervised learning (e.g.,
MAE). Code is available at https://github.com/LeapLabTHU/EfficientTrain.
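The low-frequency cropping of step 1) can be prototyped in a few lines of PyTorch. The sketch below is only an illustration of the idea, not the official implementation from the repository above; the function names (low_freq_crop, bandwidth_at) and the linear bandwidth schedule are assumptions, whereas the paper derives its actual schedule with the greedy-search algorithm mentioned in the abstract.

```python
# Minimal sketch of low-frequency cropping in the Fourier spectrum,
# assuming PyTorch >= 1.8 (torch.fft API). Illustrative only.
import torch


def low_freq_crop(images: torch.Tensor, bandwidth: int) -> torch.Tensor:
    """Keep only the centered `bandwidth` x `bandwidth` window of the 2-D
    Fourier spectrum of `images` ([N, C, H, W]) and return the reconstructed
    images of size [N, C, bandwidth, bandwidth]."""
    h, w = images.shape[-2:]
    # Move the zero-frequency (DC) component to the center of the spectrum.
    spectrum = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    top, left = (h - bandwidth) // 2, (w - bandwidth) // 2
    cropped = spectrum[..., top:top + bandwidth, left:left + bandwidth]
    # Undo the shift and invert the transform; rescale so pixel intensities
    # stay on the original scale (the default ifft2 norm divides by the
    # smaller crop size rather than by H * W).
    recon = torch.fft.ifft2(torch.fft.ifftshift(cropped, dim=(-2, -1)))
    return recon.real * (bandwidth * bandwidth) / (h * w)


def bandwidth_at(epoch: int, total_epochs: int,
                 b_min: int = 96, b_max: int = 224) -> int:
    """Hypothetical linear curriculum: start from a small frequency band and
    grow it to full resolution (the paper uses a greedy-searched schedule)."""
    frac = epoch / max(total_epochs - 1, 1)
    return int(round(b_min + frac * (b_max - b_min)))


if __name__ == "__main__":
    batch = torch.randn(8, 3, 224, 224)
    for epoch in (0, 150, 299):
        b = bandwidth_at(epoch, total_epochs=300)
        print(epoch, b, low_freq_crop(batch, b).shape)
```

Under these assumptions, early epochs would train on low_freq_crop(batch, bandwidth_at(epoch, total_epochs)), which also shrinks the input resolution and hence the per-iteration cost, while later epochs see the full-resolution images; pairing this with progressively stronger data augmentation corresponds to step 2) of the abstract.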
Related papers
- EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training [79.96741042766524]
We reformulate the training curriculum as a soft-selection function.
We show that exposing the contents of natural images can be readily achieved by adjusting the intensity of data augmentation.
The resulting method, EfficientTrain++, is simple, general, yet surprisingly effective.
arXiv Detail & Related papers (2024-05-14T17:00:43Z)
- One-Shot Image Restoration [0.0]
Experimental results demonstrate the applicability, robustness and computational efficiency of the proposed approach for supervised image deblurring and super-resolution.
Our results showcase significant improvements in the sample efficiency, generalization and time complexity of learning models.
arXiv Detail & Related papers (2024-04-26T14:03:23Z)
- Co-training $2^L$ Submodels for Visual Recognition [67.02999567435626]
Submodel co-training is a regularization method related to co-training, self-distillation and stochastic depth.
We show that submodel co-training is effective to train backbones for recognition tasks such as image classification and semantic segmentation.
arXiv Detail & Related papers (2022-12-09T14:38:09Z)
- Learning Rate Curriculum [75.98230528486401]
We propose a novel curriculum learning approach termed Learning Rate Curriculum (LeRaC).
LeRaC uses a different learning rate for each layer of a neural network to create a data-agnostic curriculum during the initial training epochs.
We compare our approach with Curriculum by Smoothing (CBS), a state-of-the-art data-agnostic curriculum learning approach.
arXiv Detail & Related papers (2022-05-18T18:57:36Z)
- Continual Contrastive Self-supervised Learning for Image Classification [10.070132585425938]
Self-supervised learning methods show tremendous potential for learning visual representations without any labeled data at scale.
To improve the visual representations learned by self-supervised learning, larger and more varied data are needed.
In this paper, we make the first attempt to implement continual contrastive self-supervised learning by proposing a rehearsal method.
arXiv Detail & Related papers (2021-07-05T03:53:42Z)
- Few-Cost Salient Object Detection with Adversarial-Paced Learning [95.0220555274653]
This paper proposes to learn an effective salient object detection model from manual annotations on only a few training images.
We name this task few-cost salient object detection and propose an adversarial-paced learning (APL)-based framework to facilitate the few-cost learning scenario.
arXiv Detail & Related papers (2021-04-05T14:15:49Z)
- Jigsaw Clustering for Unsupervised Visual Representation Learning [68.09280490213399]
We propose a new jigsaw clustering pretext task in this paper.
Our method makes use of both intra-image and inter-image information.
It is even comparable to contrastive learning methods when only half of the training batches are used.
arXiv Detail & Related papers (2021-04-01T08:09:26Z)
- Self-Supervised Training Enhances Online Continual Learning [37.91734641808391]
In continual learning, a system must incrementally learn from a non-stationary data stream without catastrophic forgetting.
Self-supervised pre-training could yield features that generalize better than those obtained with supervised learning.
Our best system achieves a 14.95% relative increase in top-1 accuracy on class incremental ImageNet over the prior state of the art for online continual learning.
arXiv Detail & Related papers (2021-03-25T17:45:27Z)
- Interleaving Learning, with Application to Neural Architecture Search [12.317568257671427]
We propose a novel machine learning framework referred to as interleaving learning (IL).
In our framework, a set of models collaboratively learn a data encoder in an interleaving fashion.
We apply interleaving learning to search neural architectures for image classification on CIFAR-10, CIFAR-100, and ImageNet.
arXiv Detail & Related papers (2021-03-12T00:54:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.