Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1
Accuracy in One Hour
- URL: http://arxiv.org/abs/2011.00071v2
- Date: Thu, 5 Nov 2020 02:17:22 GMT
- Title: Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1
Accuracy in One Hour
- Authors: Arissa Wongpanich, Hieu Pham, James Demmel, Mingxing Tan, Quoc Le,
Yang You, Sameer Kumar
- Abstract summary: We present techniques to scale up the training of EfficientNets on TPU-v3 Pods with 2048 cores.
We are able to train EfficientNet on ImageNet to an accuracy of 83% in 1 hour and 4 minutes.
- Score: 38.89981855438478
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: EfficientNets are a family of state-of-the-art image classification models
based on efficiently scaled convolutional neural networks. Currently,
EfficientNets can take on the order of days to train; for example, training an
EfficientNet-B0 model takes 23 hours on a Cloud TPU v2-8 node. In this paper,
we explore techniques to scale up the training of EfficientNets on TPU-v3 Pods
with 2048 cores, motivated by speedups that can be achieved when training at
such scales. We discuss optimizations required to scale training to a batch
size of 65536 on 1024 TPU-v3 cores, such as selecting large batch optimizers
and learning rate schedules as well as utilizing distributed evaluation and
batch normalization techniques. Additionally, we present timing and performance
benchmarks for EfficientNet models trained on the ImageNet dataset in order to
analyze the behavior of EfficientNets at scale. With our optimizations, we are
able to train EfficientNet on ImageNet to an accuracy of 83% in 1 hour and 4
minutes.
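The abstract above does not include code; as a rough illustration of the kind of large-batch recipe it describes, the sketch below pairs a LARS-style layer-wise optimizer step with a linearly scaled, warmed-up learning rate schedule (distributed evaluation and distributed batch normalization are not shown). The optimizer choice, the schedule shape, and all constants are assumptions for illustration, not the authors' exact configuration.

    # Minimal sketch (not the paper's code): LARS-style update and a linearly
    # scaled learning rate with warmup, as commonly used for very large batches.
    import numpy as np

    def scaled_lr(step, base_lr=0.1, base_batch=256, batch_size=65536,
                  warmup_steps=1000, total_steps=6000):
        """Linear-scaling rule with warmup, then cosine decay (assumed schedule)."""
        peak_lr = base_lr * batch_size / base_batch
        if step < warmup_steps:
            return peak_lr * step / warmup_steps
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * peak_lr * (1.0 + np.cos(np.pi * progress))

    def lars_update(w, grad, velocity, lr,
                    trust_coef=1e-3, weight_decay=1e-5, beta=0.9):
        """One LARS step for a single layer: scale the lr by a layer-wise trust ratio."""
        g = grad + weight_decay * w
        w_norm, g_norm = np.linalg.norm(w), np.linalg.norm(g)
        trust = trust_coef * w_norm / g_norm if w_norm > 0 and g_norm > 0 else 1.0
        velocity = beta * velocity + lr * trust * g   # heavy-ball momentum
        return w - velocity, velocity

In practice the same schedule and per-layer update would be applied inside the distributed training loop; the linear-scaling rule simply grows the peak learning rate in proportion to the batch size relative to a small-batch baseline.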
Related papers
- Effective pruning of web-scale datasets based on complexity of concept
clusters [48.125618324485195]
We present a method for pruning large-scale multimodal datasets for training CLIP-style models on ImageNet.
We find that training on a smaller set of high-quality data can lead to higher performance with significantly lower training costs.
We achieve a new state-of-the-art ImageNet zero-shot accuracy and a competitive average zero-shot accuracy on 38 evaluation tasks.
arXiv Detail & Related papers (2024-01-09T14:32:24Z)
- FastHebb: Scaling Hebbian Training of Deep Neural Networks to ImageNet Level [7.410940271545853]
We present FastHebb, an efficient and scalable solution for Hebbian learning.
FastHebb outperforms previous solutions by up to 50 times in terms of training speed.
For the first time, we are able to bring Hebbian algorithms to ImageNet scale.
arXiv Detail & Related papers (2022-07-07T09:04:55Z)
- Optimization Planning for 3D ConvNets [123.43419144051703]
It is not trivial to optimally train 3D Convolutional Neural Networks (3D ConvNets) due to the high complexity and the many possible options of the training scheme.
We decompose the training path into a series of training "states" and specify the hyper-parameters, e.g., learning rate and length of input clips, in each state.
We perform dynamic programming over all candidate states to plan the optimal permutation of states, i.e., the optimization path; a toy sketch of this planning step appears after this list.
arXiv Detail & Related papers (2022-01-11T16:13:31Z)
- Post-training deep neural network pruning via layer-wise calibration [70.65691136625514]
We propose a data-free extension of the layer-wise calibration approach for computer vision models, based on automatically generated synthetic fractal images.
When using real data, we are able to get a ResNet50 model on ImageNet with a 65% sparsity rate in 8-bit precision in a post-training setting.
arXiv Detail & Related papers (2021-04-30T14:20:51Z)
- EfficientNetV2: Smaller Models and Faster Training [91.77432224225221]
This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models.
We use a combination of training-aware neural architecture search and scaling to jointly optimize training speed and parameter efficiency.
Our experiments show that EfficientNetV2 models train much faster than state-of-the-art models while being up to 6.8x smaller.
arXiv Detail & Related papers (2021-04-01T07:08:36Z)
- TResNet: High Performance GPU-Dedicated Architecture [6.654949459658242]
Many deep learning models developed in recent years reach higher ImageNet accuracy than ResNet50 with a lower or comparable FLOPs count.
In this work, we introduce a series of architecture modifications that aim to boost neural networks' accuracy, while retaining their GPU training and inference efficiency.
We introduce a new family of GPU-dedicated models, called TResNet, which achieve better accuracy and efficiency than previous ConvNets.
arXiv Detail & Related papers (2020-03-30T17:04:47Z)
- Fixing the train-test resolution discrepancy: FixEfficientNet [98.64315617109344]
This paper provides an analysis of the performance of the EfficientNet image classifiers with several recent training procedures.
The resulting network, called FixEfficientNet, significantly outperforms the initial architecture with the same number of parameters.
arXiv Detail & Related papers (2020-03-18T14:22:58Z)
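As referenced in the Optimization Planning for 3D ConvNets entry above, the sketch below illustrates the general idea of planning an optimization path by dynamic programming over candidate training "states". The states, the transition score, and the stage structure are stand-ins invented for illustration, not the paper's actual formulation, which would score states using measured validation performance.

    # Toy sketch: choose an ordering of training "states" (each fixing some
    # hyper-parameters) by dynamic programming over a hypothetical score.
    STATES = [
        {"lr": 0.1,   "clip_len": 8},
        {"lr": 0.01,  "clip_len": 16},
        {"lr": 0.001, "clip_len": 32},
    ]

    def transition_score(prev, nxt):
        """Hypothetical reward for training in state `nxt` after `prev`.
        In practice this would come from measured validation accuracy."""
        prev_lr = prev["lr"] if prev is not None else float("inf")
        bonus = 1.0 if nxt["lr"] <= prev_lr else -1.0   # prefer a decaying lr
        return bonus + 0.01 * nxt["clip_len"]           # prefer longer clips later

    def plan_path(states):
        """Pick the permutation of states with the best summed transition score."""
        n = len(states)
        memo = {}  # (used_mask, prev_index) -> (best_score, best_suffix)

        def solve(used, prev):
            if used == (1 << n) - 1:
                return 0.0, []
            if (used, prev) in memo:
                return memo[(used, prev)]
            best = (float("-inf"), [])
            for i in range(n):
                if used & (1 << i):
                    continue
                step = transition_score(None if prev is None else states[prev],
                                        states[i])
                rest, suffix = solve(used | (1 << i), i)
                if step + rest > best[0]:
                    best = (step + rest, [i] + suffix)
            memo[(used, prev)] = best
            return best

        return solve(0, None)

    score, order = plan_path(STATES)
    print("planned state order:", order, "score:", round(score, 3))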
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.