How Low Can We Go: Trading Memory for Error in Low-Precision Training
- URL: http://arxiv.org/abs/2106.09686v2
- Date: Fri, 18 Jun 2021 04:55:09 GMT
- Title: How Low Can We Go: Trading Memory for Error in Low-Precision Training
- Authors: Chengrun Yang, Ziyang Wu, Jerry Chee, Christopher De Sa, Madeleine Udell
- Abstract summary: Low-precision arithmetic trains deep learning models using less energy, less memory and less time.
We pay a price for the savings: lower precision may yield larger round-off error and hence larger prediction error.
We borrow ideas from meta-learning to learn the tradeoff between memory and error.
- Score: 52.94003953419242
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Low-precision arithmetic trains deep learning models using less energy, less
memory and less time. However, we pay a price for the savings: lower precision
may yield larger round-off error and hence larger prediction error. As
applications proliferate, users must choose which precision to use to train a
new model, and chip manufacturers must decide which precisions to manufacture.
We view these precision choices as a hyperparameter tuning problem, and borrow
ideas from meta-learning to learn the tradeoff between memory and error. In
this paper, we introduce Pareto Estimation to Pick the Perfect Precision
(PEPPP). We use matrix factorization to find non-dominated configurations (the
Pareto frontier) with a limited number of network evaluations. For any given
memory budget, the precision that minimizes error is a point on this frontier.
Practitioners can use the frontier to trade memory for error and choose the
best precision for their goals.
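The idea of a Pareto frontier over (memory, error) pairs can be illustrated with a minimal sketch. It assumes every configuration's memory and error are already known, whereas the abstract describes using matrix factorization to estimate them from a limited number of evaluations; that estimation step is not shown here. The configuration names, the numbers, and the helper names `pareto_frontier` and `best_under_budget` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# (name, memory, error); units are arbitrary in this sketch
Config = Tuple[str, float, float]


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Return the non-dominated configurations, sorted by increasing memory.

    A configuration is kept only if its error is strictly lower than that of
    every configuration using less or equal memory.
    """
    frontier: List[Config] = []
    best_error = float("inf")
    # Sort by memory, breaking ties by error, then sweep once.
    for name, memory, error in sorted(configs, key=lambda c: (c[1], c[2])):
        if error < best_error:
            frontier.append((name, memory, error))
            best_error = error
    return frontier


def best_under_budget(frontier: List[Config], budget: float) -> Optional[Config]:
    """Pick the lowest-error frontier point whose memory fits the budget."""
    feasible = [c for c in frontier if c[1] <= budget]
    return min(feasible, key=lambda c: c[2]) if feasible else None


# Made-up measurements, for illustration only.
configs = [
    ("32-bit", 4.0, 0.08),
    ("16-bit-a", 2.0, 0.09),
    ("16-bit-b", 2.0, 0.12),
    ("12-bit", 1.5, 0.20),
    ("8-bit", 1.0, 0.15),
]

frontier = pareto_frontier(configs)
print([name for name, _, _ in frontier])
print(best_under_budget(frontier, budget=2.5))
```

In this toy data, "12-bit" is dominated by "8-bit" (more memory and more error) and "16-bit-b" by "16-bit-a", so neither appears on the frontier. For a given budget, the selected point is always one of the frontier points, which matches the statement in the abstract.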
Related papers
- Collage: Light-Weight Low-Precision Strategy for LLM Training [21.190363633580233]
We argue that low-precision floating points can perform well provided the error is properly compensated at the critical locations in the training process.
We propose Collage which utilizes multi-component float representation in low-precision to accurately perform operations with numerical errors accounted.
Our method works with commonly used low-precision such as half-precision ($16$-bit floating points) and can be naturally extended to work with even lower precision such as $8$-bit.
arXiv Detail & Related papers (2024-05-06T16:55:30Z)
- Guaranteed Approximation Bounds for Mixed-Precision Neural Operators [83.64404557466528]
We build on intuition that neural operator learning inherently induces an approximation error.
We show that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
arXiv Detail & Related papers (2023-07-27T17:42:06Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [92.55145016562867]
We propose a new family of unbiased estimators called WTA-CRS, for matrix production with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales [65.01417261415833]
We present an approach to predict the pre-training loss based on our observations that Maximal Update Parametrization (muP) enables accurate fitting of scaling laws.
With around 14% of the one-time pre-training cost, we can accurately forecast the loss for models up to 52B.
Our goal with nanoLM is to empower researchers with limited resources to reach meaningful conclusions on large models.
arXiv Detail & Related papers (2023-04-14T00:45:01Z)
- Training Normalizing Flows with the Precision-Recall Divergence [73.92251251511199]
We show that achieving a specified precision-recall trade-off corresponds to minimising $f$-divergences from a family we call the PR-divergences.
We propose a novel generative model that is able to train a normalizing flow to minimise any $f$-divergence, and in particular, achieve a given precision-recall trade-off.
arXiv Detail & Related papers (2023-02-01T17:46:47Z)
- Training with Mixed-Precision Floating-Point Assignments [8.5323697848377]
We generate precision assignments for convolutional neural networks that use less memory.
We evaluate our technique on image classification tasks by training convolutional networks on CIFAR-10, CIFAR-100, and ImageNet.
arXiv Detail & Related papers (2023-01-31T08:01:35Z)
- The case for 4-bit precision: k-bit Inference Scaling Laws [75.4335600212427]
Quantization methods reduce the number of bits required to represent each parameter in a model.
The final model size depends on both the number of parameters of the original model and the rate of compression.
We run more than 35,000 zero-shot experiments with 16-bit inputs and k-bit parameters to examine which quantization methods improve scaling for 3 to 8-bit precision.
arXiv Detail & Related papers (2022-12-19T18:48:33Z)
- Towards Explainable Bit Error Tolerance of Resistive RAM-Based Binarized Neural Networks [7.349786872131006]
Non-volatile memory, such as resistive RAM (RRAM), is an emerging energy-efficient storage.
Binary neural networks (BNNs) can tolerate a certain percentage of errors without a loss in accuracy.
The bit error tolerance (BET) in BNNs can be achieved by flipping the weight signs during training.
arXiv Detail & Related papers (2020-02-03T17:38:45Z)