How Low Can We Go: Trading Memory for Error in Low-Precision Training
- URL: http://arxiv.org/abs/2106.09686v2
- Date: Fri, 18 Jun 2021 04:55:09 GMT
- Title: How Low Can We Go: Trading Memory for Error in Low-Precision Training
- Authors: Chengrun Yang, Ziyang Wu, Jerry Chee, Christopher De Sa, Madeleine
Udell
- Abstract summary: Low-precision arithmetic trains deep learning models using less energy, less memory and less time.
We pay a price for the savings: lower precision may yield larger round-off error and hence larger prediction error.
We borrow ideas from meta-learning to learn the tradeoff between memory and error.
- Score: 52.94003953419242
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Low-precision arithmetic trains deep learning models using less energy, less
memory and less time. However, we pay a price for the savings: lower precision
may yield larger round-off error and hence larger prediction error. As
applications proliferate, users must choose which precision to use to train a
new model, and chip manufacturers must decide which precisions to manufacture.
We view these precision choices as a hyperparameter tuning problem, and borrow
ideas from meta-learning to learn the tradeoff between memory and error. In
this paper, we introduce Pareto Estimation to Pick the Perfect Precision
(PEPPP). We use matrix factorization to find non-dominated configurations (the
Pareto frontier) with a limited number of network evaluations. For any given
memory budget, the precision that minimizes error is a point on this frontier.
Practitioners can use the frontier to trade memory for error and choose the
best precision for their goals.
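To make the PEPPP recipe concrete, the sketch below completes a partially observed (precision configuration × task) error matrix with a simple rank-2 alternating-least-squares factorization and then extracts the non-dominated (memory, error) points. The toy data, the ALS solver, and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def complete_low_rank(E_obs, mask, rank=2, n_iters=200, lam=1e-2, seed=0):
    """Fill in missing error measurements with a rank-`rank` factorization (simple ALS)."""
    rng = np.random.default_rng(seed)
    n, m = E_obs.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iters):
        for i in range(n):                      # update configuration factors
            cols = mask[i]
            if cols.any():
                Vi = V[cols]
                U[i] = np.linalg.solve(Vi.T @ Vi + reg, Vi.T @ E_obs[i, cols])
        for j in range(m):                      # update task factors
            rows = mask[:, j]
            if rows.any():
                Uj = U[rows]
                V[j] = np.linalg.solve(Uj.T @ Uj + reg, Uj.T @ E_obs[rows, j])
    return U @ V.T

def pareto_frontier(memory, error):
    """Indices of non-dominated (memory, error) points."""
    order = np.lexsort((error, memory))         # sort by memory, break ties by error
    frontier, best_err = [], np.inf
    for idx in order:
        if error[idx] < best_err:
            frontier.append(int(idx))
            best_err = error[idx]
    return frontier

# Toy example: 8 precision configs, 5 tasks, only ~40% of evaluations actually run.
rng = np.random.default_rng(1)
true_err = 0.05 + 0.3 * rng.uniform(size=(8, 2)) @ rng.uniform(size=(2, 5))  # rank-2 ground truth
mask = rng.random((8, 5)) < 0.4
E_hat = complete_low_rank(np.where(mask, true_err, 0.0), mask)
memory = np.array([2, 4, 4, 6, 8, 8, 12, 16], dtype=float)  # e.g. average bits per value
frontier = pareto_frontier(memory, E_hat[:, 0])              # estimated frontier for task 0
print("Estimated non-dominated configs for task 0:", frontier)
```

For any given memory budget, a practitioner would then pick the frontier point with the lowest estimated error that fits the budget, as the abstract describes.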
Related papers
- Scaling Laws for Precision [73.24325358259753]
We devise "precision-aware" scaling laws for both training and inference.
For inference, we find that the degradation introduced by post-training quantization increases as models are trained on more data.
For training, our scaling laws allow us to predict the loss of a model whose different parts are kept in different precisions.
arXiv Detail & Related papers (2024-11-07T00:10:10Z)
- Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE).
RISE injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation.
Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH.
arXiv Detail & Related papers (2024-10-09T07:43:38Z)
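As a rough illustration of the error-injection idea summarized in the RISE entry above, the hedged sketch below corrupts one token of a correct solution to build a hard (chosen, rejected) preference pair. The perturbation rule, the whitespace tokenization, and the pair format are illustrative assumptions, not RISE's actual procedure or data format.

```python
import random

def inject_subtle_error(solution: str, rng: random.Random) -> str:
    """Perturb one numeric token (or flip a '+' to '-') to create a subtly wrong solution.
    These perturbations are illustrative stand-ins for RISE's predefined subtle errors."""
    tokens = solution.split()
    digit_positions = [i for i, t in enumerate(tokens) if t.isdigit()]
    if digit_positions:
        i = rng.choice(digit_positions)
        tokens[i] = str(int(tokens[i]) + rng.choice([-1, 1]))
    elif "+" in tokens:
        tokens[tokens.index("+")] = "-"
    return " ".join(tokens)

def build_preference_pair(prompt: str, correct_solution: str, seed: int = 0) -> dict:
    """Pair the correct solution (chosen) with an error-injected copy (rejected)."""
    rng = random.Random(seed)
    return {
        "prompt": prompt,
        "chosen": correct_solution,
        "rejected": inject_subtle_error(correct_solution, rng),
    }

pair = build_preference_pair(
    "Natalia sold 48 clips in April and half as many in May. How many clips in total?",
    "In May she sold 48 / 2 = 24 clips , so the total is 48 + 24 = 72 .",
)
print(pair["rejected"])
```

Such pairs would then be used in the preference-learning step that the summary reports improves GSM8K and MATH accuracy.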
- Collage: Light-Weight Low-Precision Strategy for LLM Training [21.190363633580233]
We argue that low-precision floating-point arithmetic can perform well provided the error is properly compensated at the critical locations in the training process.
We propose Collage, which utilizes a multi-component float representation in low precision to perform operations accurately, with numerical errors accounted for.
Our method works with commonly used low-precision formats such as half precision ($16$-bit floating point) and can be naturally extended to even lower precisions such as $8$-bit.
arXiv Detail & Related papers (2024-05-06T16:55:30Z)
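To make the error-compensation idea behind Collage concrete, here is a minimal sketch of Kahan-style compensated accumulation in float16. Collage's actual multi-component float representation and where it is applied during LLM training are more involved; this only illustrates how a second low-precision word can carry the round-off error that plain float16 accumulation discards.

```python
import numpy as np

def naive_sum_fp16(xs):
    """Accumulate in plain float16; small addends eventually get rounded away."""
    acc = np.float16(0.0)
    for x in xs:
        acc = np.float16(acc + x)
    return acc

def compensated_sum_fp16(xs):
    """Kahan-style accumulation: a second fp16 word carries the rounding error."""
    acc = np.float16(0.0)
    carry = np.float16(0.0)   # compensation term, analogous to an extra float component
    for x in xs:
        y = np.float16(x - carry)
        t = np.float16(acc + y)
        carry = np.float16((t - acc) - y)  # what was lost when adding y
        acc = t
    return acc

xs = [np.float16(1e-3)] * 10_000   # many tiny updates, as in gradient accumulation
print("exact     :", 1e-3 * 10_000)
print("naive fp16:", naive_sum_fp16(xs))
print("kahan fp16:", compensated_sum_fp16(xs))
```

Running this shows the plain float16 accumulator stalling far below the true sum, while the compensated version stays close to it; Collage applies a related multi-component representation at critical points of LLM training.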
- Guaranteed Approximation Bounds for Mixed-Precision Neural Operators [83.64404557466528]
We build on the intuition that neural operator learning inherently induces an approximation error.
We show that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
arXiv Detail & Related papers (2023-07-27T17:42:06Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
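The sketch below shows the plain column-row sampling (CRS) estimator that WTA-CRS builds on: approximate A @ B by sampling a few column-row pairs with probability proportional to their norms and rescaling so the estimate stays unbiased. The variable names are illustrative, and the winner-take-all selection that gives WTA-CRS its reduced variance is not reproduced here.

```python
import numpy as np

def crs_matmul(A, B, k, rng):
    """Unbiased estimate of A @ B from k sampled column-row pairs.

    Pair i is drawn with probability p_i proportional to ||A[:, i]|| * ||B[i, :]||,
    and each sampled outer product is rescaled by 1 / (k * p_i) to keep the
    estimator unbiased in expectation.
    """
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=k, replace=True, p=p)
    est = np.zeros((A.shape[0], B.shape[1]))
    for i in idx:
        est += np.outer(A[:, i], B[i, :]) / (k * p[i])
    return est

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 512))
B = rng.normal(size=(512, 32))
exact = A @ B
approx = crs_matmul(A, B, k=128, rng=rng)
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative Frobenius error with 128/512 pairs: {rel_err:.3f}")
```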
- Training Normalizing Flows with the Precision-Recall Divergence [73.92251251511199]
We show that achieving a specified precision-recall trade-off corresponds to minimising $f$-divergences from a family we call the PR-divergences.
We propose a novel generative model that is able to train a normalizing flow to minimise any $f$-divergence and, in particular, achieve a given precision-recall trade-off.
arXiv Detail & Related papers (2023-02-01T17:46:47Z)
- Training with Mixed-Precision Floating-Point Assignments [8.5323697848377]
We generate precision assignments for convolutional neural networks that use less memory.
We evaluate our technique on image classification tasks by training convolutional networks on CIFAR-10, CIFAR-100, and ImageNet.
arXiv Detail & Related papers (2023-01-31T08:01:35Z)
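As a small, hedged illustration of why per-layer precision assignments save memory, the sketch below compares the parameter footprint of a uniform fp32 assignment with a mixed assignment on a hypothetical four-layer network. The layer shapes and bit-widths are made up for illustration; they are not assignments produced by the paper's technique.

```python
# Hypothetical layer sizes (number of parameters) for a small convolutional network.
layers = {
    "conv1": 3 * 64 * 3 * 3,
    "conv2": 64 * 128 * 3 * 3,
    "conv3": 128 * 256 * 3 * 3,
    "fc":    256 * 10,
}

# One illustrative assignment: keep the first and last layers in fp32,
# store the large middle layers in fp16.
assignment_bits = {"conv1": 32, "conv2": 16, "conv3": 16, "fc": 32}

def param_mebibytes(layers: dict, bits: dict) -> float:
    """Parameter memory in MiB under a per-layer bit-width assignment."""
    return sum(count * bits[name] for name, count in layers.items()) / 8 / 2**20

print(f"uniform fp32: {param_mebibytes(layers, {name: 32 for name in layers}):.2f} MiB")
print(f"mixed       : {param_mebibytes(layers, assignment_bits):.2f} MiB")
```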
- Towards Explainable Bit Error Tolerance of Resistive RAM-Based Binarized Neural Networks [7.349786872131006]
Non-volatile memory, such as resistive RAM (RRAM), is an emerging energy-efficient storage technology.
Binary neural networks (BNNs) can tolerate a certain percentage of errors without a loss in accuracy.
The bit error tolerance (BET) in BNNs can be achieved by flipping the weight signs during training.
arXiv Detail & Related papers (2020-02-03T17:38:45Z)
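As a hedged sketch of the bit-error-tolerance idea in this last entry, the snippet below binarizes a weight matrix, flips a random fraction of the signs to mimic RRAM bit errors, and checks how much a binarized layer's output changes. The layer shape, flip rate, and agreement metric are illustrative assumptions, not the paper's training setup (which achieves tolerance by flipping weight signs during training).

```python
import numpy as np

def binarize(w):
    """Binarize real-valued weights to +/-1 (sign function, zeros mapped to +1)."""
    return np.where(w >= 0, 1.0, -1.0)

def inject_bit_errors(w_bin, flip_rate, rng):
    """Flip the sign of a random fraction of binary weights, mimicking RRAM bit errors."""
    flips = rng.random(w_bin.shape) < flip_rate
    return np.where(flips, -w_bin, w_bin)

def bnn_forward(x, W_real, flip_rate, rng):
    """One binarized linear layer evaluated under injected bit errors."""
    W_bin = inject_bit_errors(binarize(W_real), flip_rate, rng)
    return np.sign(x @ W_bin.T)

rng = np.random.default_rng(0)
W_real = rng.normal(size=(16, 64))       # latent real-valued weights kept for training
x = np.sign(rng.normal(size=(8, 64)))    # a batch of binary activations

clean = bnn_forward(x, W_real, flip_rate=0.0, rng=rng)
noisy = bnn_forward(x, W_real, flip_rate=0.05, rng=rng)
agreement = (clean == noisy).mean()
print(f"output agreement under 5% weight bit errors: {agreement:.2%}")
```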