Automated Progressive Learning for Efficient Training of Vision
Transformers
- URL: http://arxiv.org/abs/2203.14509v1
- Date: Mon, 28 Mar 2022 05:37:08 GMT
- Title: Automated Progressive Learning for Efficient Training of Vision
Transformers
- Authors: Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun
Chang, Yi Yang
- Abstract summary: Vision Transformers (ViTs) have come with a voracious appetite for computing power, highlighting the urgent need to develop efficient training methods for ViTs.
Progressive learning, a training scheme where the model capacity grows progressively during training, has begun to show its promise for efficient training.
In this paper, we take a practical step towards efficient training of ViTs by customizing and automating progressive learning.
- Score: 125.22744987949227
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advances in vision Transformers (ViTs) have come with a voracious
appetite for computing power, highlighting the urgent need to develop
efficient training methods for ViTs. Progressive learning, a training scheme
where the model capacity grows progressively during training, has begun to
show its promise for efficient training. In this paper, we take a practical
step towards efficient training of ViTs by customizing and automating
progressive learning. First, we develop a strong manual baseline for
progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge
the gap brought by model growth. Then, we propose automated progressive
learning (AutoProg), an efficient training scheme that aims to achieve lossless
acceleration by automatically increasing the training workload on-the-fly; this
is achieved by adaptively deciding whether, where, and how much the model should
grow during progressive learning. Specifically, we first relax the optimization
of the growth schedule to a sub-network architecture optimization problem, then
propose one-shot estimation of the sub-network performance via an elastic
supernet. The search overhead is kept minimal by recycling the parameters of
the supernet. Extensive efficient-training experiments on
ImageNet with two representative ViT models, DeiT and VOLO, demonstrate that
AutoProg can accelerate ViT training by up to 85.1% with no performance drop.
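The abstract does not spell out how MoGrow bridges the gap brought by model growth, so the sketch below shows one plausible reading: an exponential-moving-average ("momentum") copy of the model is maintained during training, and newly added blocks are initialized from it rather than from random weights. The function names, the EMA coefficient, and the choice to copy the last block are illustrative assumptions, not the paper's exact procedure.

import copy
import torch
import torch.nn as nn

def update_momentum_copy(model: nn.Module, ema_model: nn.Module, m: float = 0.999) -> None:
    # Keep an exponential-moving-average ("momentum") copy of the training weights.
    with torch.no_grad():
        for p, p_ema in zip(model.parameters(), ema_model.parameters()):
            p_ema.mul_(m).add_(p, alpha=1.0 - m)

def grow_depth_with_momentum(blocks: nn.ModuleList,
                             ema_blocks: nn.ModuleList,
                             num_new: int) -> nn.ModuleList:
    # Hypothetical growth step: append `num_new` transformer blocks initialized
    # from the momentum copy of the last existing block, so the grown model
    # starts close to the behaviour of the smaller model it replaces.
    grown = nn.ModuleList(list(blocks))
    for _ in range(num_new):
        grown.append(copy.deepcopy(ema_blocks[-1]))
    return grown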
Code: https://github.com/changlin31/AutoProg
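As a rough illustration of the "whether, where and how much to grow" decision, the sketch below ranks candidate sub-networks of an elastic supernet with a one-shot estimate on a held-out batch, reusing the shared supernet weights so the search itself adds little overhead. The candidate grid, the elastic forward signature (depth and token-count arguments), and the accuracy-based score are assumptions made for illustration; they are not taken from the released code.

import itertools
import torch

# Hypothetical search space for one growth stage: each candidate sub-network is
# described by (depth, number of tokens). The space actually used by AutoProg may differ.
CANDIDATES = list(itertools.product([8, 10, 12], [96, 144, 196]))

@torch.no_grad()
def estimate_subnet(supernet, depth, num_tokens, val_batch):
    # One-shot estimate: run the shared supernet weights restricted to the
    # candidate depth and token count, with no extra training of the candidate.
    images, labels = val_batch
    logits = supernet(images, depth=depth, num_tokens=num_tokens)  # assumed elastic forward
    return (logits.argmax(dim=-1) == labels).float().mean().item()

def choose_growth(supernet, val_batch, current=(8, 96)):
    # Decide whether and how much to grow by ranking all candidates that are at
    # least as large as the current configuration; keeping `current` is allowed,
    # which corresponds to deciding not to grow at this stage.
    feasible = [c for c in CANDIDATES if c[0] >= current[0] and c[1] >= current[1]]
    scores = {c: estimate_subnet(supernet, c[0], c[1], val_batch) for c in feasible}
    return max(scores, key=scores.get)

In an actual training loop, a routine like choose_growth would be invoked at each growth stage and the selected configuration trained until the next stage.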
Related papers
- T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design [79.7289790249621]
Our proposed method, T2V-Turbo-v2, introduces a significant advancement by integrating various supervision signals.
We highlight the crucial importance of tailoring datasets to specific learning objectives.
We demonstrate the potential of this approach by extracting motion guidance from the training datasets and incorporating it into the ODE solver.
arXiv Detail & Related papers (2024-10-08T04:30:06Z)
- Efficient Training of Large Vision Models via Advanced Automated Progressive Learning [96.71646528053651]
We present an advanced automated progressive learning (AutoProg) framework for efficient training of Large Vision Models (LVMs).
We introduce AutoProg-Zero, which enhances the AutoProg framework with a novel zero-shot unfreezing schedule search.
Experiments show that AutoProg accelerates ViT pre-training by up to 1.85x on ImageNet and accelerates fine-tuning of diffusion models by up to 2.86x, with comparable or even higher performance.
arXiv Detail & Related papers (2024-09-06T16:24:24Z)
- A General and Efficient Training for Transformer via Token Expansion [44.002355107931805]
Vision Transformers (ViTs) typically incur an extremely large training cost.
Existing methods have attempted to accelerate the training of ViTs, yet they typically either lack universality or come with a drop in accuracy.
We propose a novel token growth scheme, Token Expansion (termed ToE), to achieve consistent training acceleration for ViTs.
arXiv Detail & Related papers (2024-03-31T12:44:24Z)
- Efficient Stagewise Pretraining via Progressive Subnetworks [53.00045381931778]
The prevailing view suggests that stagewise dropping strategies, such as layer dropping, are ineffective when compared to stacking-based approaches.
This paper challenges this notion by demonstrating that, with proper design, dropping strategies can be competitive with, if not better than, stacking methods.
We propose an instantiation of this framework - Random Part Training (RAPTR) - that selects and trains only a random subnetwork at each step, progressively increasing the size in stages.
arXiv Detail & Related papers (2024-02-08T18:49:09Z)
- Local Masking Meets Progressive Freezing: Crafting Efficient Vision Transformers for Self-Supervised Learning [0.0]
We present an innovative approach to self-supervised learning for Vision Transformers (ViTs).
This method focuses on enhancing the efficiency and speed of initial layer training in ViTs.
Our approach employs a novel multi-scale reconstruction process that fosters efficient learning in initial layers.
arXiv Detail & Related papers (2023-12-02T11:10:09Z)
- Rethinking Closed-loop Training for Autonomous Driving [82.61418945804544]
We present the first empirical study which analyzes the effects of different training benchmark designs on the success of learning agents.
We propose trajectory value learning (TRAVL), an RL-based driving agent that performs planning with multistep look-ahead.
Our experiments show that TRAVL can learn much faster and produce safer maneuvers compared to all the baselines.
arXiv Detail & Related papers (2023-06-27T17:58:39Z)
- Auto-scaling Vision Transformers without Training [84.34662535276898]
We propose As-ViT, an auto-scaling framework for Vision Transformers (ViTs) without training.
As-ViT automatically discovers and scales up ViTs in an efficient and principled manner.
As a unified framework, As-ViT achieves strong performance on classification and detection.
arXiv Detail & Related papers (2022-02-24T06:30:55Z)