Compute, Time and Energy Characterization of Encoder-Decoder Networks
with Automatic Mixed Precision Training
- URL: http://arxiv.org/abs/2008.08062v1
- Date: Tue, 18 Aug 2020 17:44:24 GMT
- Title: Compute, Time and Energy Characterization of Encoder-Decoder Networks
with Automatic Mixed Precision Training
- Authors: Siddharth Samsi, Michael Jones, Mark M. Veillette
- Abstract summary: We show that it is possible to achieve a significant improvement in training time by leveraging mixed-precision training without sacrificing model performance.
We find that a 1549% increase in the number of trainable parameters for a network comes at a relatively smaller 63.22% increase in energy usage for a UNet with 4 encoding layers.
- Score: 6.761235154230549
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have shown great success in many diverse fields. The
training of these networks can take significant amounts of time, compute and
energy. As datasets get larger and models become more complex, the exploration
of model architectures becomes prohibitively expensive. In this paper, we examine
the compute, energy, and time costs of training a UNet-based deep neural network
for the problem of predicting short-term weather forecasts (called precipitation
nowcasting). By leveraging a combination of data-distributed and mixed-precision
training, we explore the design space for this problem. We also show that larger
models with better performance can come at only an incremental cost if appropriate
optimizations are used. We show that it is
possible to achieve a significant improvement in training time by leveraging
mixed-precision training without sacrificing model performance. Additionally,
we find that a 1549% increase in the number of trainable parameters for a
network comes at a relatively smaller 63.22% increase in energy usage for a
UNet with 4 encoding layers.
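The paper does not include code, but the setup it describes maps naturally onto PyTorch's automatic mixed precision (AMP) API. The sketch below is a minimal, hypothetical illustration of an AMP training step with torch.cuda.amp; the model, loss, and random input tensors are placeholders for the paper's UNet and radar data, and in the data-distributed setting described above the model would additionally be wrapped in torch.nn.parallel.DistributedDataParallel.

```python
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

# Placeholder model standing in for the paper's UNet (assumes a CUDA GPU).
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),
).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
scaler = GradScaler()  # scales the loss so FP16 gradients do not underflow

def train_step(inputs, targets):
    optimizer.zero_grad(set_to_none=True)
    with autocast():                     # forward pass runs in mixed precision
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()        # backward pass on the scaled loss
    scaler.step(optimizer)               # unscale gradients, then step
    scaler.update()
    return loss.item()

# Random tensors standing in for radar frames.
x = torch.randn(8, 1, 64, 64, device="cuda")
y = torch.randn(8, 1, 64, 64, device="cuda")
print(train_step(x, y))
```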
Related papers
- Always-Sparse Training by Growing Connections with Guided Stochastic
Exploration [46.4179239171213]
We propose an efficient always-sparse training algorithm with excellent scaling to larger and sparser models.
We evaluate our method on CIFAR-10/100 and ImageNet using VGG and ViT models, and compare it against a range of sparsification methods.
arXiv Detail & Related papers (2024-01-12T21:32:04Z)
- Analyzing and Improving the Training Dynamics of Diffusion Models [36.37845647984578]
We identify and rectify several causes for uneven and ineffective training in the popular ADM diffusion model architecture.
We find that systematic application of this philosophy eliminates the observed drifts and imbalances, resulting in considerably better networks at equal computational complexity.
arXiv Detail & Related papers (2023-12-05T11:55:47Z)
- Gradual Optimization Learning for Conformational Energy Minimization [69.36925478047682]
Gradual Optimization Learning Framework (GOLF) for energy minimization with neural networks significantly reduces the required additional data.
Our results demonstrate that the neural network trained with GOLF performs on par with the oracle on a benchmark of diverse drug-like molecules.
arXiv Detail & Related papers (2023-11-05T11:48:08Z)
- DNNAbacus: Toward Accurate Computational Cost Prediction for Deep Neural
Networks [0.9896984829010892]
This paper investigates the computational resource demands of 29 classical deep neural networks and builds accurate models for predicting computational costs.
We propose DNNAbacus, a lightweight prediction approach with a novel network structural matrix for network representation.
Our experimental results show that the mean relative error (MRE) is 0.9% with respect to time and 2.8% with respect to memory for 29 classic models, much lower than that of state-of-the-art works.
arXiv Detail & Related papers (2022-05-24T14:21:27Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at
Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- Low-Precision Training in Logarithmic Number System using Multiplicative
Weight Update [49.948082497688404]
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts.
One promising approach to reduce the energy costs is representing DNNs with low-precision numbers.
We jointly design a low-precision training framework involving a logarithmic number system (LNS) and a multiplicative weight update training method, termed LNS-Madam.
arXiv Detail & Related papers (2021-06-26T00:32:17Z)
- Improving Neural Networks for Time Series Forecasting using Data
Augmentation and AutoML [0.0]
This paper presents an easy-to-implement data augmentation method that significantly improves the performance of neural networks.
It shows that data augmentation, when paired with Automated Machine Learning techniques such as Neural Architecture Search, can help to find the best neural architecture for a given time series.
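As a concrete illustration of the kind of time-series augmentation that paper refers to, the snippet below sketches two common transforms (jittering and magnitude scaling); the function names and noise parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def jitter(series, sigma=0.03):
    # Add small Gaussian noise to every time step.
    return series + np.random.normal(0.0, sigma, size=series.shape)

def magnitude_scale(series, sigma=0.1):
    # Scale the whole series by a random factor close to 1.
    return series * np.random.normal(1.0, sigma)

def augment(batch, n_copies=4):
    # Expand a batch of series (shape: [n_series, length]) with noisy copies.
    copies = [batch] + [magnitude_scale(jitter(batch)) for _ in range(n_copies)]
    return np.concatenate(copies, axis=0)
```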
arXiv Detail & Related papers (2021-03-02T19:20:49Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and
Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
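The progressive-precision idea can be pictured as a schedule that raises the bit width as training advances, applied through simple fake quantization; the schedule and quantizer below are an illustrative reconstruction, not FracTrain's actual method.

```python
import torch

def bit_schedule(epoch, total_epochs, bits=(4, 6, 8, 32)):
    # Gradually move to higher precision over the course of training.
    stage = min(int(epoch / total_epochs * len(bits)), len(bits) - 1)
    return bits[stage]

def fake_quantize(x, num_bits):
    # Symmetric uniform fake quantization; acts as identity at >= 32 bits.
    if num_bits >= 32:
        return x
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale
```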
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
- Weight Update Skipping: Reducing Training Time for Artificial Neural
Networks [0.30458514384586394]
We propose a new training methodology for ANNs that exploits the observation that the improvement in accuracy shows temporal variations.
During such time windows, we keep updating the bias, which ensures the network still trains and avoids overfitting.
Such a training approach achieves virtually the same accuracy with considerably less computational cost and thus lower training time.
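A minimal sketch of that weight-freezing idea, assuming a PyTorch model whose weight tensors are frozen during the skip windows while biases remain trainable; the schedule shown is a placeholder, not the paper's detection criterion.

```python
from torch import nn

def set_weight_updates(model, enabled):
    # Freeze or unfreeze weight tensors; bias terms always stay trainable.
    for name, param in model.named_parameters():
        if name.endswith("weight"):
            param.requires_grad_(enabled)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
for epoch in range(20):
    skip_weights = (epoch % 4 != 0)  # placeholder schedule for skip windows
    set_weight_updates(model, enabled=not skip_weights)
    # ... run the usual forward/backward/optimizer.step() for this epoch ...
```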
arXiv Detail & Related papers (2020-12-05T15:12:10Z)
- Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)