Automated Progressive Learning for Efficient Training of Vision
Transformers
- URL: http://arxiv.org/abs/2203.14509v1
- Date: Mon, 28 Mar 2022 05:37:08 GMT
- Title: Automated Progressive Learning for Efficient Training of Vision
Transformers
- Authors: Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun
Chang, Yi Yang
- Abstract summary: Vision Transformers (ViTs) have come with a voracious appetite for computing power, highlighting the urgent need to develop efficient training methods for ViTs.
Progressive learning, a training scheme where the model capacity grows progressively during training, has started showing its ability in efficient training.
In this paper, we take a practical step towards efficient training of ViTs by customizing and automating progressive learning.
- Score: 125.22744987949227
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advances in vision Transformers (ViTs) have come with a voracious
appetite for computing power, highlighting the urgent need to develop
efficient training methods for ViTs. Progressive learning, a training scheme
where the model capacity grows progressively during training, has started
showing its ability in efficient training. In this paper, we take a practical
step towards efficient training of ViTs by customizing and automating
progressive learning. First, we develop a strong manual baseline for
progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge
the gap brought by model growth. Then, we propose automated progressive
learning (AutoProg), an efficient training scheme that aims to achieve lossless
acceleration by automatically increasing the training workload on the fly; this
is achieved by adaptively deciding whether, where and how much the model should
grow during progressive learning. Specifically, we first relax the optimization
of the growth schedule to a sub-network architecture optimization problem, then
propose one-shot estimation of sub-network performance via an elastic
supernet. The search overhead is reduced to a minimum by recycling the
parameters of the supernet. Extensive experiments of efficient training on
ImageNet with two representative ViT models, DeiT and VOLO, demonstrate that
AutoProg can accelerate ViT training by up to 85.1% with no performance drop.
Code: https://github.com/changlin31/AutoProg
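The growth step the abstract describes, where new layers are bridged by a momentum copy of the current model, can be sketched in plain Python. Everything below (the EMA coefficient, the copy-from-top-layer rule, and all function names) is an illustrative assumption, not the authors' released implementation; see the linked repository for the real code.

```python
# Minimal, framework-free sketch of progressive depth growth with a
# MoGrow-style momentum copy. "Weights" are plain lists of floats so the
# growth mechanics are visible without any deep-learning framework.

MOMENTUM = 0.999  # assumed EMA coefficient for the momentum copy

def ema_update(ema_layer, layer, m=MOMENTUM):
    """Exponential-moving-average update of one layer's momentum copy."""
    return [m * e + (1.0 - m) * w for e, w in zip(ema_layer, layer)]

def grow_depth(layers, ema_layers, new_depth):
    """Grow the model to `new_depth` layers.

    New layers are initialized from the momentum copy of the topmost
    existing layer, so the grown model starts close to the smoothed
    behaviour of the small model (the gap-bridging idea behind MoGrow)."""
    grown = list(layers)
    while len(grown) < new_depth:
        grown.append(list(ema_layers[-1]))  # copy from momentum model
    return grown

# Toy usage: start with a 2-layer model and its momentum copy.
layers = [[0.1, 0.2], [0.3, 0.4]]
ema = [list(layer) for layer in layers]  # momentum copy starts equal

layers[1] = [0.35, 0.45]  # pretend one optimizer step updated the top layer
ema = [ema_update(e, w) for e, w in zip(ema, layers)]

grown = grow_depth(layers, ema, new_depth=4)
assert len(grown) == 4          # depth grew from 2 to 4
assert grown[2] == ema[-1]      # new layers come from the momentum copy
```

In a real training loop the EMA update would run every step, and the growth schedule (when to call `grow_depth`, and to what depth) is exactly what AutoProg searches for automatically.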
Related papers
- A General and Efficient Training for Transformer via Token Expansion [44.002355107931805]
Vision Transformers (ViTs) typically require an extremely large training cost.
Existing methods have attempted to accelerate the training of ViTs, but the acceleration typically comes at the cost of accuracy.
We propose a novel token growth scheme, Token Expansion (ToE), to achieve consistent training acceleration for ViTs.
arXiv Detail & Related papers (2024-03-31T12:44:24Z)
- Efficient Stagewise Pretraining via Progressive Subnetworks [55.65819977062729]
We propose an alternative framework, progressive subnetwork training, that maintains the full model throughout training, but only trains subnetworks within the model in each step.
RaPTr achieves better pre-training loss for BERT and UL2 language models while requiring 20-33% fewer FLOPs compared to standard training, and is competitive or better than other efficient training methods.
arXiv Detail & Related papers (2024-02-08T18:49:09Z)
- Local Masking Meets Progressive Freezing: Crafting Efficient Vision Transformers for Self-Supervised Learning [0.0]
We present an innovative approach to self-supervised learning for Vision Transformers (ViTs).
This method focuses on enhancing the efficiency and speed of initial layer training in ViTs.
Our approach employs a novel multi-scale reconstruction process that fosters efficient learning in initial layers.
arXiv Detail & Related papers (2023-12-02T11:10:09Z)
- Rethinking Closed-loop Training for Autonomous Driving [82.61418945804544]
We present the first empirical study which analyzes the effects of different training benchmark designs on the success of learning agents.
We propose trajectory value learning (TRAVL), an RL-based driving agent that performs planning with multistep look-ahead.
Our experiments show that TRAVL can learn much faster and produce safer maneuvers compared to all the baselines.
arXiv Detail & Related papers (2023-06-27T17:58:39Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- Auto-scaling Vision Transformers without Training [84.34662535276898]
We propose As-ViT, an auto-scaling framework for Vision Transformers (ViTs) without training.
As-ViT automatically discovers and scales up ViTs in an efficient and principled manner.
As a unified framework, As-ViT achieves strong performance on classification and detection.
arXiv Detail & Related papers (2022-02-24T06:30:55Z)
- EfficientNetV2: Smaller Models and Faster Training [91.77432224225221]
This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models.
We use a combination of training-aware neural architecture search and scaling, to jointly optimize training speed and parameter efficiency.
Our experiments show that EfficientNetV2 models train much faster than state-of-the-art models while being up to 6.8x smaller.
arXiv Detail & Related papers (2021-04-01T07:08:36Z)
- Accelerating Reinforcement Learning for Reaching using Continuous Curriculum Learning [6.703429330486276]
We focus on accelerating reinforcement learning (RL) training and improving the performance of multi-goal reaching tasks.
Specifically, we propose a precision-based continuous curriculum learning (PCCL) method in which the requirements are gradually adjusted during the training process.
This approach is tested on a Universal Robots UR5e in both simulation and real-world multi-goal reach experiments.
arXiv Detail & Related papers (2020-02-07T10:08:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.