How Low Can We Go: Trading Memory for Error in Low-Precision Training
- URL: http://arxiv.org/abs/2106.09686v2
- Date: Fri, 18 Jun 2021 04:55:09 GMT
- Title: How Low Can We Go: Trading Memory for Error in Low-Precision Training
- Authors: Chengrun Yang, Ziyang Wu, Jerry Chee, Christopher De Sa, Madeleine Udell
- Abstract summary: Low-precision arithmetic trains deep learning models using less energy, less memory and less time.
We pay a price for the savings: lower precision may yield larger round-off error and hence larger prediction error.
We borrow ideas from meta-learning to learn the tradeoff between memory and error.
- Score: 52.94003953419242
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Low-precision arithmetic trains deep learning models using less energy, less
memory and less time. However, we pay a price for the savings: lower precision
may yield larger round-off error and hence larger prediction error. As
applications proliferate, users must choose which precision to use to train a
new model, and chip manufacturers must decide which precisions to manufacture.
We view these precision choices as a hyperparameter tuning problem, and borrow
ideas from meta-learning to learn the tradeoff between memory and error. In
this paper, we introduce Pareto Estimation to Pick the Perfect Precision
(PEPPP). We use matrix factorization to find non-dominated configurations (the
Pareto frontier) with a limited number of network evaluations. For any given
memory budget, the precision that minimizes error is a point on this frontier.
Practitioners can use the frontier to trade memory for error and choose the
best precision for their goals.
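
To make the frontier step concrete, here is a minimal sketch: given measured (memory, error) pairs for a handful of precision configurations, keep the non-dominated ones and pick the lowest-error point under a memory budget. The configuration names and numbers are made up for illustration, and the sketch skips PEPPP's matrix-factorization step, which is what lets the method avoid measuring every configuration.

```python
# Hypothetical measurements: (memory in MB, top-1 error) for a few
# low-precision training configurations. PEPPP estimates most such entries
# via matrix factorization instead of evaluating every configuration;
# here all values are simply given.
configs = {
    "fp32":       (1000.0, 0.068),
    "fp16":       ( 500.0, 0.071),
    "bf16":       ( 500.0, 0.070),
    "fp8-hybrid": ( 300.0, 0.083),
    "int8":       ( 250.0, 0.095),
}

def pareto_frontier(points):
    """Keep configurations not dominated in both memory and error (lower is better)."""
    frontier = []
    for name, (mem, err) in points.items():
        dominated = any(
            m <= mem and e <= err and (m, e) != (mem, err)
            for m, e in points.values()
        )
        if not dominated:
            frontier.append((name, mem, err))
    return sorted(frontier, key=lambda t: t[1])

def best_under_budget(frontier, memory_budget):
    """Pick the lowest-error frontier point that fits the memory budget."""
    feasible = [p for p in frontier if p[1] <= memory_budget]
    return min(feasible, key=lambda p: p[2]) if feasible else None

frontier = pareto_frontier(configs)
print(frontier)
print(best_under_budget(frontier, memory_budget=400.0))
```

With a 400 MB budget, the sketch returns the lowest-error non-dominated configuration that fits, which is exactly how the paper describes reading a precision choice off the frontier.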
Related papers
- QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models [27.730213115659986]
Large Language Models (LLMs) are often quantized to lower precision to reduce memory cost and latency in inference.
Traditional fine-tuning methods require backpropagation, which is error-prone in low-precision settings.
We propose the Quantized Zeroth-Order (QuZO) framework, specifically designed for fine-tuning LLMs through low-precision forward passes.
arXiv Detail & Related papers (2025-02-17T22:20:31Z)
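
As background on the zeroth-order idea referenced above, the sketch below estimates gradients from pairs of quantized forward passes (a SPSA-style two-point estimator), so no backpropagation is needed. The fake-quantization helper, the toy least-squares objective, and all hyperparameters are illustrative assumptions, not QuZO's actual procedure.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Uniform symmetric fake-quantization (illustrative stand-in for a
    low-precision forward pass)."""
    scale = np.max(np.abs(x)) / (2 ** (num_bits - 1) - 1) + 1e-12
    return np.round(x / scale) * scale

def loss(w, X, y):
    """Toy objective evaluated with quantized weights in the forward pass."""
    return np.mean((X @ fake_quantize(w) - y) ** 2)

def zo_gradient(w, X, y, mu=1e-3, n_samples=8, rng=np.random.default_rng(0)):
    """SPSA-style two-point zeroth-order gradient estimate from forward passes only."""
    g = np.zeros_like(w)
    for _ in range(n_samples):
        u = rng.standard_normal(w.shape)
        g += (loss(w + mu * u, X, y) - loss(w - mu * u, X, y)) / (2 * mu) * u
    return g / n_samples

rng = np.random.default_rng(1)
X, w_true = rng.standard_normal((128, 16)), rng.standard_normal(16)
y = X @ w_true
w = np.zeros(16)
for step in range(200):
    w -= 0.01 * zo_gradient(w, X, y)   # update uses forward passes only, no backprop
print("final loss:", loss(w, X, y))
```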
- Rethinking Early Stopping: Refine, Then Calibrate [49.966899634962374]
We show that calibration error and refinement error are not minimized simultaneously during training.
We introduce a new metric for early stopping and hyperparameter tuning that makes it possible to minimize refinement error during training.
Our method integrates seamlessly with any architecture and consistently improves performance across diverse classification tasks.
arXiv Detail & Related papers (2025-01-31T15:03:54Z)
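
For context on the two error terms above, a standard decomposition for proper losses (stated here as background; the paper's exact metric may differ) splits the expected loss of a probabilistic classifier into a calibration term and a refinement term:

```latex
% Background decomposition for a proper loss \ell with associated divergence d
% (e.g. cross-entropy with the KL divergence); c(X) = E[Y \mid f(X)] denotes
% the calibrated version of the predictor f. Not the paper's exact metric.
\mathbb{E}\big[\ell(f(X), Y)\big]
  = \underbrace{\mathbb{E}\big[d\big(c(X),\, f(X)\big)\big]}_{\text{calibration error}}
  + \underbrace{\mathbb{E}\big[\ell\big(c(X),\, Y\big)\big]}_{\text{refinement error}}
```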
- Scaling Laws for Precision [73.24325358259753]
We devise "precision-aware" scaling laws for both training and inference.
For inference, we find that the degradation introduced by post-training quantization increases as models are trained on more data.
For training, our scaling laws allow us to predict the loss of a model with different parts in different precisions.
arXiv Detail & Related papers (2024-11-07T00:10:10Z)
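
As a rough illustration of what a "precision-aware" scaling law can look like (the form below is an assumption for exposition; the fitted functional form and constants are the paper's contribution and are not reproduced here), a Chinchilla-style fit can be extended by letting an effective parameter count shrink at low training precision:

```latex
% Illustrative sketch only: a Chinchilla-style loss fit in which the
% precision P enters through an effective parameter count N_eff(P).
% The constants A, B, E, \alpha, \beta, \gamma would be fit to data;
% the parameterization used in the paper may differ.
L(N, D, P) \;\approx\; E \;+\; \frac{A}{N_{\mathrm{eff}}(P)^{\alpha}} \;+\; \frac{B}{D^{\beta}},
\qquad N_{\mathrm{eff}}(P) \;=\; N\bigl(1 - e^{-P/\gamma}\bigr)
```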
- Collage: Light-Weight Low-Precision Strategy for LLM Training [21.190363633580233]
We argue that low-precision floating-point formats can perform well provided that the error is properly compensated at critical locations in the training process.
We propose Collage, which uses a multi-component float representation in low precision to perform operations accurately, with numerical errors accounted for.
Our method works with commonly used low-precision formats such as half precision ($16$-bit floating point) and extends naturally to even lower precisions such as $8$-bit.
arXiv Detail & Related papers (2024-05-06T16:55:30Z)
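
The multi-component float idea is in the same spirit as classic compensated arithmetic, where a second low-precision word carries the rounding error of the first. The sketch below contrasts naive float16 accumulation with Kahan-compensated float16 accumulation; it illustrates the error-compensation principle only and is not Collage's representation or training procedure.

```python
import numpy as np

def fp16_sum_naive(values):
    """Accumulate in float16; small addends get rounded away as the sum grows."""
    acc = np.float16(0.0)
    for v in values:
        acc = np.float16(acc + np.float16(v))
    return acc

def fp16_sum_compensated(values):
    """Kahan-style accumulation: a second fp16 word stores the lost low-order bits."""
    acc = np.float16(0.0)
    comp = np.float16(0.0)            # running compensation (the extra "component")
    for v in values:
        y = np.float16(np.float16(v) - comp)
        t = np.float16(acc + y)
        comp = np.float16(np.float16(t - acc) - y)
        acc = t
    return acc

values = np.full(10_000, 0.01, dtype=np.float64)   # true sum is 100.0
print("naive fp16:       ", fp16_sum_naive(values))
print("compensated fp16: ", fp16_sum_compensated(values))
print("float64 reference:", values.sum())
```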
- Guaranteed Approximation Bounds for Mixed-Precision Neural Operators [83.64404557466528]
We build on the intuition that neural operator learning inherently induces an approximation error.
We show that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
arXiv Detail & Related papers (2023-07-27T17:42:06Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for approximating matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
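
For background on sampling-based unbiased estimators of matrix products, the sketch below implements the classical column-row (CR) sampling estimator, whose expectation equals the exact product; the winner-take-all refinement and the transformer-tuning setting of WTA-CRS are not reproduced here.

```python
import numpy as np

def cr_sample_matmul(A, B, k, rng=np.random.default_rng(0)):
    """Unbiased column-row sampling estimate of A @ B.

    Samples k column/row index pairs with probability proportional to
    ||A[:, i]|| * ||B[i, :]|| and rescales so the estimate is unbiased.
    """
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=k, replace=True, p=p)
    est = np.zeros((A.shape[0], B.shape[1]))
    for i in idx:
        est += np.outer(A[:, i], B[i, :]) / (k * p[i])
    return est

rng = np.random.default_rng(1)
A, B = rng.standard_normal((64, 512)), rng.standard_normal((512, 32))
exact = A @ B
approx = cr_sample_matmul(A, B, k=128)
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```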
- Training Normalizing Flows with the Precision-Recall Divergence [73.92251251511199]
We show that achieving a specified precision-recall trade-off corresponds to minimising $f$-divergences from a family we call the PR-divergences.
We propose a novel generative model that is able to train a normalizing flow to minimise any $f$-divergence and, in particular, achieve a given precision-recall trade-off.
arXiv Detail & Related papers (2023-02-01T17:46:47Z)
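
Since the summary above refers to $f$-divergences, the general definition may be useful background (the specific PR-divergence generators are defined in the paper):

```latex
% Definition of an f-divergence between distributions P and Q, for a convex
% generator f with f(1) = 0. Different choices of f recover the KL divergence,
% total variation, etc.; the paper's PR-divergences correspond to particular
% generators encoding a precision-recall trade-off.
D_f(P \,\|\, Q) \;=\; \mathbb{E}_{x \sim Q}\!\left[ f\!\left( \frac{dP}{dQ}(x) \right) \right]
```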
- Training with Mixed-Precision Floating-Point Assignments [8.5323697848377]
We generate precision assignments for convolutional neural networks that use less memory.
We evaluate our technique on image classification tasks by training convolutional networks on CIFAR-10, CIFAR-100, and ImageNet.
arXiv Detail & Related papers (2023-01-31T08:01:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.