How Low Can We Go: Trading Memory for Error in Low-Precision Training
- URL: http://arxiv.org/abs/2106.09686v2
- Date: Fri, 18 Jun 2021 04:55:09 GMT
- Title: How Low Can We Go: Trading Memory for Error in Low-Precision Training
- Authors: Chengrun Yang, Ziyang Wu, Jerry Chee, Christopher De Sa, Madeleine Udell
- Abstract summary: Low-precision arithmetic trains deep learning models using less energy, less memory and less time.
We pay a price for the savings: lower precision may yield larger round-off error and hence larger prediction error.
We borrow ideas from meta-learning to learn the tradeoff between memory and error.
- Score: 52.94003953419242
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Low-precision arithmetic trains deep learning models using less energy, less
memory and less time. However, we pay a price for the savings: lower precision
may yield larger round-off error and hence larger prediction error. As
applications proliferate, users must choose which precision to use to train a
new model, and chip manufacturers must decide which precisions to manufacture.
We view these precision choices as a hyperparameter tuning problem, and borrow
ideas from meta-learning to learn the tradeoff between memory and error. In
this paper, we introduce Pareto Estimation to Pick the Perfect Precision
(PEPPP). We use matrix factorization to find non-dominated configurations (the
Pareto frontier) with a limited number of network evaluations. For any given
memory budget, the precision that minimizes error is a point on this frontier.
Practitioners can use the frontier to trade memory for error and choose the
best precision for their goals.
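
To make the frontier step concrete, here is a minimal sketch: given measured (memory, error) pairs for a handful of precision configurations, keep the non-dominated ones and pick the lowest-error point under a memory budget. The configuration names and numbers are made up for illustration, and the sketch skips PEPPP's matrix-factorization step, which is what lets the method avoid measuring every configuration.

```python
# Hypothetical measurements: (memory in MB, top-1 error) for a few
# low-precision training configurations. PEPPP estimates most such entries
# via matrix factorization instead of evaluating every configuration;
# here all values are simply given.
configs = {
    "fp32":       (1000.0, 0.068),
    "fp16":       ( 500.0, 0.071),
    "bf16":       ( 500.0, 0.070),
    "fp8-hybrid": ( 300.0, 0.083),
    "int8":       ( 250.0, 0.095),
}

def pareto_frontier(points):
    """Keep configurations not dominated in both memory and error (lower is better)."""
    frontier = []
    for name, (mem, err) in points.items():
        dominated = any(
            m <= mem and e <= err and (m, e) != (mem, err)
            for m, e in points.values()
        )
        if not dominated:
            frontier.append((name, mem, err))
    return sorted(frontier, key=lambda t: t[1])

def best_under_budget(frontier, memory_budget):
    """Pick the lowest-error frontier point that fits the memory budget."""
    feasible = [p for p in frontier if p[1] <= memory_budget]
    return min(feasible, key=lambda p: p[2]) if feasible else None

frontier = pareto_frontier(configs)
print(frontier)
print(best_under_budget(frontier, memory_budget=400.0))
```

With a 400 MB budget, the sketch returns the lowest-error non-dominated configuration that fits, which is exactly how the paper describes reading a precision choice off the frontier.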
Related papers
- QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models [27.730213115659986]
Large Language Models (LLMs) are often quantized to lower precision to reduce memory cost and latency in inference.
Traditional fine-tuning methods require backpropagation, which is error-prone in low-precision settings.
We propose the Quantized Zeroth-Order (QuZO) framework, specifically designed for fine-tuning LLMs through low-precision forward passes.
arXiv Detail & Related papers (2025-02-17T22:20:31Z)
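
As background on the zeroth-order idea referenced above, the sketch below estimates gradients from pairs of quantized forward passes (a SPSA-style two-point estimator), so no backpropagation is needed. The fake-quantization helper, the toy least-squares objective, and all hyperparameters are illustrative assumptions, not QuZO's actual procedure.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Uniform symmetric fake-quantization (illustrative stand-in for a
    low-precision forward pass)."""
    scale = np.max(np.abs(x)) / (2 ** (num_bits - 1) - 1) + 1e-12
    return np.round(x / scale) * scale

def loss(w, X, y):
    """Toy objective evaluated with quantized weights in the forward pass."""
    return np.mean((X @ fake_quantize(w) - y) ** 2)

def zo_gradient(w, X, y, mu=1e-3, n_samples=8, rng=np.random.default_rng(0)):
    """SPSA-style two-point zeroth-order gradient estimate from forward passes only."""
    g = np.zeros_like(w)
    for _ in range(n_samples):
        u = rng.standard_normal(w.shape)
        g += (loss(w + mu * u, X, y) - loss(w - mu * u, X, y)) / (2 * mu) * u
    return g / n_samples

rng = np.random.default_rng(1)
X, w_true = rng.standard_normal((128, 16)), rng.standard_normal(16)
y = X @ w_true
w = np.zeros(16)
for step in range(200):
    w -= 0.01 * zo_gradient(w, X, y)   # update uses forward passes only, no backprop
print("final loss:", loss(w, X, y))
```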
- Rethinking Early Stopping: Refine, Then Calibrate [49.966899634962374]
We show that calibration error and refinement error are not minimized simultaneously during training.
We introduce a new metric for early stopping and hyperparameter tuning that makes it possible to minimize refinement error during training.
Our method integrates seamlessly with any architecture and consistently improves performance across diverse classification tasks.
arXiv Detail & Related papers (2025-01-31T15:03:54Z)
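
For context on the two error terms above, a standard decomposition for proper losses (stated here as background; the paper's exact metric may differ) splits the expected loss of a probabilistic classifier into a calibration term and a refinement term:

```latex
% Background decomposition for a proper loss \ell with associated divergence d
% (e.g. cross-entropy with the KL divergence); c(X) = E[Y \mid f(X)] denotes
% the calibrated version of the predictor f. Not the paper's exact metric.
\mathbb{E}\big[\ell(f(X), Y)\big]
  = \underbrace{\mathbb{E}\big[d\big(c(X),\, f(X)\big)\big]}_{\text{calibration error}}
  + \underbrace{\mathbb{E}\big[\ell\big(c(X),\, Y\big)\big]}_{\text{refinement error}}
```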
- Scaling Laws for Precision [73.24325358259753]
We devise "precision-aware" scaling laws for both training and inference.
For inference, we find that the degradation introduced by post-training quantization increases as models are trained on more data.
For training, our scaling laws allow us to predict the loss of a model with different parts in different precisions.
arXiv Detail & Related papers (2024-11-07T00:10:10Z)
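
As a rough illustration of what a "precision-aware" scaling law can look like (the form below is an assumption for exposition; the fitted functional form and constants are the paper's contribution and are not reproduced here), a Chinchilla-style fit can be extended by letting an effective parameter count shrink at low training precision:

```latex
% Illustrative sketch only: a Chinchilla-style loss fit in which the
% precision P enters through an effective parameter count N_eff(P).
% The constants A, B, E, \alpha, \beta, \gamma would be fit to data;
% the parameterization used in the paper may differ.
L(N, D, P) \;\approx\; E \;+\; \frac{A}{N_{\mathrm{eff}}(P)^{\alpha}} \;+\; \frac{B}{D^{\beta}},
\qquad N_{\mathrm{eff}}(P) \;=\; N\bigl(1 - e^{-P/\gamma}\bigr)
```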
- Collage: Light-Weight Low-Precision Strategy for LLM Training [21.190363633580233]
We argue that low-precision floating-point formats can perform well provided that the error is properly compensated at critical locations in the training process.
We propose Collage, which uses a multi-component float representation in low precision to perform operations accurately, with numerical errors accounted for.
Our method works with commonly used low-precision formats such as half precision ($16$-bit floating point) and extends naturally to even lower precisions such as $8$-bit.
arXiv Detail & Related papers (2024-05-06T16:55:30Z)
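
The multi-component float idea is in the same spirit as classic compensated arithmetic, where a second low-precision word carries the rounding error of the first. The sketch below contrasts naive float16 accumulation with Kahan-compensated float16 accumulation; it illustrates the error-compensation principle only and is not Collage's representation or training procedure.

```python
import numpy as np

def fp16_sum_naive(values):
    """Accumulate in float16; small addends get rounded away as the sum grows."""
    acc = np.float16(0.0)
    for v in values:
        acc = np.float16(acc + np.float16(v))
    return acc

def fp16_sum_compensated(values):
    """Kahan-style accumulation: a second fp16 word stores the lost low-order bits."""
    acc = np.float16(0.0)
    comp = np.float16(0.0)            # running compensation (the extra "component")
    for v in values:
        y = np.float16(np.float16(v) - comp)
        t = np.float16(acc + y)
        comp = np.float16(np.float16(t - acc) - y)
        acc = t
    return acc

values = np.full(10_000, 0.01, dtype=np.float64)   # true sum is 100.0
print("naive fp16:       ", fp16_sum_naive(values))
print("compensated fp16: ", fp16_sum_compensated(values))
print("float64 reference:", values.sum())
```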
- Guaranteed Approximation Bounds for Mixed-Precision Neural Operators [83.64404557466528]
We build on the intuition that neural operator learning inherently induces an approximation error.
We show that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
arXiv Detail & Related papers (2023-07-27T17:42:06Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for approximating matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
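
For background on sampling-based unbiased estimators of matrix products, the sketch below implements the classical column-row (CR) sampling estimator, whose expectation equals the exact product; the winner-take-all refinement and the transformer-tuning setting of WTA-CRS are not reproduced here.

```python
import numpy as np

def cr_sample_matmul(A, B, k, rng=np.random.default_rng(0)):
    """Unbiased column-row sampling estimate of A @ B.

    Samples k column/row index pairs with probability proportional to
    ||A[:, i]|| * ||B[i, :]|| and rescales so the estimate is unbiased.
    """
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=k, replace=True, p=p)
    est = np.zeros((A.shape[0], B.shape[1]))
    for i in idx:
        est += np.outer(A[:, i], B[i, :]) / (k * p[i])
    return est

rng = np.random.default_rng(1)
A, B = rng.standard_normal((64, 512)), rng.standard_normal((512, 32))
exact = A @ B
approx = cr_sample_matmul(A, B, k=128)
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```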
- Training Normalizing Flows with the Precision-Recall Divergence [73.92251251511199]
We show that achieving a specified precision-recall trade-off corresponds to minimising $f$-divergences from a family we call the PR-divergences.
We propose a novel generative model that is able to train a normalizing flow to minimise any $f$-divergence and, in particular, achieve a given precision-recall trade-off.
arXiv Detail & Related papers (2023-02-01T17:46:47Z)
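
Since the summary above refers to $f$-divergences, the general definition may be useful background (the specific PR-divergence generators are defined in the paper):

```latex
% Definition of an f-divergence between distributions P and Q, for a convex
% generator f with f(1) = 0. Different choices of f recover the KL divergence,
% total variation, etc.; the paper's PR-divergences correspond to particular
% generators encoding a precision-recall trade-off.
D_f(P \,\|\, Q) \;=\; \mathbb{E}_{x \sim Q}\!\left[ f\!\left( \frac{dP}{dQ}(x) \right) \right]
```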
- Training with Mixed-Precision Floating-Point Assignments [8.5323697848377]
We generate precision assignments for convolutional neural networks that use less memory.
We evaluate our technique on image classification tasks by training convolutional networks on CIFAR-10, CIFAR-100, and ImageNet.
arXiv Detail & Related papers (2023-01-31T08:01:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.