Continuous 16-bit Training: Accelerating 32-bit Pre-Trained Neural
Networks
- URL: http://arxiv.org/abs/2311.18587v2
- Date: Fri, 1 Dec 2023 02:51:32 GMT
- Title: Continuous 16-bit Training: Accelerating 32-bit Pre-Trained Neural
Networks
- Authors: Juyoung Yun
- Abstract summary: This study introduces a novel approach where we continue the training of pre-existing 32-bit models using 16-bit precision.
By adopting 16-bit precision for ongoing training, we are able to substantially decrease memory requirements and computational burden.
Our experiments show that this method maintains the high standards of accuracy set by the original 32-bit training while providing a much-needed boost in training speed.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In the field of deep learning, the prevalence of models initially trained
with 32-bit precision is a testament to its robustness and accuracy. However,
the continuous evolution of these models often demands further training, which
can be resource-intensive. This study introduces a novel approach where we
continue the training of these pre-existing 32-bit models using 16-bit
precision. This technique not only caters to the need for efficiency in
computational resources but also significantly improves the speed of additional
training phases. By adopting 16-bit precision for ongoing training, we are able
to substantially decrease memory requirements and computational burden, thereby
accelerating the training process in a resource-limited setting. Our
experiments show that this method maintains the high standards of accuracy set
by the original 32-bit training while providing a much-needed boost in training
speed. This approach is especially pertinent in today's context, where most
models are initially trained in 32-bit and require periodic updates and
refinements. The findings from our research suggest that this strategy of
16-bit continuation training can be a key solution for sustainable and
efficient deep learning, offering a practical way to enhance pre-trained models
rapidly and in a resource-conscious manner.
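The abstract gives no implementation details, so the following is only a minimal sketch of the idea in PyTorch, not the authors' code. The model choice (a torchvision ResNet-18 pre-trained in 32-bit), the optimizer, and the CUDA assumption are illustrative placeholders: the pre-trained weights are simply cast to 16-bit and training resumes at that precision.

```python
# Minimal sketch (not the authors' code): resume training of a 32-bit
# pre-trained model in 16-bit precision. Model, optimizer, and data
# pipeline are illustrative placeholders.
import torch
import torch.nn as nn
import torchvision

device = torch.device("cuda")  # assumes a GPU with FP16 support

# A model whose weights were trained in 32-bit precision.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# Cast all parameters and buffers to 16-bit before continuing training.
model = model.half().to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def continue_training(loader, epochs=1):
    """Continue training entirely in 16-bit precision."""
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            images = images.half().to(device)  # inputs must match the 16-bit weights
            labels = labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```

Because the weights, activations, gradients, and optimizer updates all stay in 16-bit here, this mirrors the standalone 16-bit continuation described in the abstract rather than conventional mixed-precision training.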
Related papers
- Always-Sparse Training by Growing Connections with Guided Stochastic
Exploration [46.4179239171213]
We propose an efficient always-sparse training algorithm with excellent scaling to larger and sparser models.
We evaluate our method on CIFAR-10/100 and ImageNet using VGG and ViT models, and compare it against a range of sparsification methods.
arXiv Detail & Related papers (2024-01-12T21:32:04Z)
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Standalone 16-bit Training: Missing Study for Hardware-Limited Deep Learning Practitioners [2.075190620803526]
Mixed-precision techniques use different numerical precisions during model training and inference to optimize resource usage (a generic mixed-precision sketch appears after this list).
For many practitioners with limited resources, the available options are restricted to 32-bit, 16-bit, or a combination of the two.
This study fills a critical gap, proving for the first time that standalone 16-bit precision neural networks match 32-bit and mixed-precision models in accuracy.
arXiv Detail & Related papers (2023-05-18T13:09:45Z)
- Top-Tuning: a study on transfer learning for an efficient alternative to fine tuning for image classification with fast kernel methods [12.325059377851485]
In this paper, we consider a simple transfer learning approach exploiting pre-trained convolutional features as input for a fast-to-train kernel method.
We show that the top-tuning approach provides accuracy comparable to fine-tuning, with training time one to two orders of magnitude smaller.
arXiv Detail & Related papers (2022-09-16T13:46:59Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- Self-Supervised Pretraining Improves Self-Supervised Pretraining [83.1423204498361]
Self-supervised pretraining requires expensive and lengthy computation and large amounts of data, and is sensitive to data augmentation.
This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model.
We show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.
arXiv Detail & Related papers (2021-03-23T17:37:51Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
- Revisiting BFloat16 Training [30.99618783594963]
State-of-the-art generic low-precision training algorithms use a mix of 16-bit and 32-bit precision.
Deep learning accelerators are forced to support both 16-bit and 32-bit floating-point units.
arXiv Detail & Related papers (2020-10-13T05:38:07Z)
- Multi-Precision Policy Enforced Training (MuPPET): A precision-switching strategy for quantised fixed-point training of CNNs [13.83645579871775]
Large-scale convolutional neural networks (CNNs) suffer from very long training times, spanning from hours to weeks.
This work pushes the boundary of quantised training by employing a multilevel approach that utilises multiple precisions.
MuPPET achieves the same accuracy as standard full-precision training with training-time speedup of up to 1.84× and an average speedup of 1.58× across the networks.
arXiv Detail & Related papers (2020-06-16T10:14:36Z)
- Subset Sampling For Progressive Neural Network Learning [106.12874293597754]
Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data.
We propose to speed up this process by exploiting subsets of training data at each incremental training step.
Experimental results in object, scene and face recognition problems demonstrate that the proposed approach speeds up the optimization procedure considerably.
arXiv Detail & Related papers (2020-02-17T18:57:33Z)
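For contrast with the standalone 16-bit continuation sketched after the abstract, below is a generic PyTorch mixed-precision sketch of the kind referenced in the Standalone 16-bit Training entry above. It is not taken from any of the listed papers, and the tiny model and shapes are placeholders: the weights stay in 32-bit, selected operations run in 16-bit, and loss scaling guards against gradient underflow.

```python
# Generic mixed-precision training sketch with PyTorch AMP (illustrative only).
import torch

device = torch.device("cuda")
model = torch.nn.Linear(128, 10).to(device)   # placeholder model, weights stay FP32
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()          # scales the loss to avoid FP16 underflow

def train_step(inputs, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # selected ops run in FP16
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()             # backward pass on the scaled loss
    scaler.step(optimizer)                    # unscale gradients, then update FP32 weights
    scaler.update()
    return loss.item()
```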