Scaling Laws for Precision in High-Dimensional Linear Regression
- URL: http://arxiv.org/abs/2602.19241v2
- Date: Thu, 26 Feb 2026 08:08:52 GMT
- Title: Scaling Laws for Precision in High-Dimensional Linear Regression
- Authors: Dechen Zhang, Xuan Tang, Yingyu Liang, Difan Zou
- Abstract summary: We study scaling laws for low-precision training within a high-dimensional sketched linear regression framework. By analyzing multiplicative and additive quantization, we identify a critical dichotomy in their scaling behaviors. Our work provides a theoretical basis for optimizing training protocols under practical hardware constraints.
- Score: 38.87908801454087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-precision training is critical for optimizing the trade-off between model quality and training costs, necessitating the joint allocation of model size, dataset size, and numerical precision. While empirical scaling laws suggest that quantization impacts effective model and data capacities or acts as an additive error, the theoretical mechanisms governing these effects remain largely unexplored. In this work, we initiate a theoretical study of scaling laws for low-precision training within a high-dimensional sketched linear regression framework. By analyzing multiplicative (signal-dependent) and additive (signal-independent) quantization, we identify a critical dichotomy in their scaling behaviors. Our analysis reveals that while both schemes introduce an additive error and degrade the effective data size, they exhibit distinct effects on effective model size: multiplicative quantization maintains the full-precision model size, whereas additive quantization reduces the effective model size. Numerical experiments validate our theoretical findings. By rigorously characterizing the complex interplay among model scale, dataset size, and quantization error, our work provides a principled theoretical basis for optimizing training protocols under practical hardware constraints.
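To make the multiplicative/additive dichotomy concrete, here is a minimal simulation sketch, not the paper's construction: a Gaussian sketch of the features, least squares on the quantized sketched features, and the excess risk of the resulting estimator. The dimensions, noise scales, and the choice to quantize the sketched features are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 512, 128, 4096          # ambient dim, sketch dim, sample size (assumed values)
theta_star = rng.normal(size=d) / np.sqrt(d)

def quantize_mult(z, eps):
    """Multiplicative (signal-dependent) quantization: relative error of size eps."""
    return z * (1.0 + eps * rng.normal(size=z.shape))

def quantize_add(z, delta):
    """Additive (signal-independent) quantization: absolute error of size delta."""
    return z + delta * rng.normal(size=z.shape)

def sketched_excess_risk(quantizer):
    X = rng.normal(size=(n, d))
    y = X @ theta_star + 0.1 * rng.normal(size=n)
    S = rng.normal(size=(m, d)) / np.sqrt(m)       # Gaussian sketch
    Z = quantizer(X @ S.T)                         # quantized sketched features
    theta_hat = np.linalg.lstsq(Z, y, rcond=None)[0]
    # excess risk for isotropic Gaussian features: ||theta_star - S^T theta_hat||^2
    return np.sum((theta_star - S.T @ theta_hat) ** 2)

for level in (0.01, 0.05, 0.10):
    print(f"level={level:.2f}  "
          f"multiplicative={sketched_excess_risk(lambda z: quantize_mult(z, level)):.4f}  "
          f"additive={sketched_excess_risk(lambda z: quantize_add(z, level)):.4f}")
```

Quantization could equally be applied to labels, weights, or gradients; the sketched features were chosen only to keep the example compact.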
Related papers
- A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization [32.97211471008323]
We introduce the first theoretical framework for the convergence of adaptive optimizers, including Adam and Muon, under floating-point quantization of gradients, weights, and optimizer states. We show that both algorithms retain convergence rates close to their full-precision counterparts provided the mantissa length scales only logarithmically with the number of iterations. Our analysis further reveals that Adam is highly sensitive to weight and second-moment quantization due to its reliance on $\beta_2 \to 1$, while Muon requires weaker error control and is thus potentially more robust.
arXiv Detail & Related papers (2025-10-24T10:16:23Z)
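As a rough illustration of that setting, the sketch below applies reduced-mantissa floating-point rounding to the gradients, optimizer states, and weights of an Adam-style update on a toy quadratic loss. The `round_to_mantissa` helper, the 7-bit mantissa, and the toy objective are assumptions for the example, not the paper's model.

```python
import numpy as np

def round_to_mantissa(x, bits):
    """Round each value to `bits` mantissa bits, keeping the exponent exact
    (a simplified model of low-precision floating point)."""
    m, e = np.frexp(np.asarray(x, dtype=np.float64))   # x = m * 2**e with m in [0.5, 1)
    scale = 2.0 ** bits
    return np.ldexp(np.round(m * scale) / scale, e)

# Toy quantized Adam on the quadratic loss 0.5 * ||w||^2 (whose gradient is w itself).
rng = np.random.default_rng(0)
w = rng.normal(size=1000)
m1, v = np.zeros_like(w), np.zeros_like(w)
beta1, beta2, lr, eps, bits = 0.9, 0.999, 1e-2, 1e-8, 7

for t in range(1, 201):
    g = round_to_mantissa(w, bits)                       # quantized gradient
    m1 = round_to_mantissa(beta1 * m1 + (1 - beta1) * g, bits)
    v = round_to_mantissa(beta2 * v + (1 - beta2) * g * g, bits)
    m_hat, v_hat = m1 / (1 - beta1 ** t), v / (1 - beta2 ** t)
    w = round_to_mantissa(w - lr * m_hat / (np.sqrt(v_hat) + eps), bits)

print("final loss:", 0.5 * np.sum(w ** 2))
```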
- Training Dynamics Impact Post-Training Quantization Robustness [31.536101256063684]
Post-training quantization is widely adopted for efficient deployment of large language models. We conduct a comprehensive analysis of quantization degradation across open-source language model training trajectories up to 32B parameters and 15T training tokens.
arXiv Detail & Related papers (2025-10-07T17:59:07Z)
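The degradation tracked in such a study comes from rounding trained weights after the fact. Below is a minimal sketch of symmetric per-tensor int8 post-training quantization applied to a random stand-in weight matrix; the shapes and the per-tensor scheme are illustrative assumptions.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 post-training quantization."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)    # stand-in for a trained weight matrix
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale                    # dequantize
print("relative reconstruction error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```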
- ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models [102.4511331368587]
ARISE (Adaptive Resolution-aware Scaling Evaluation) is a novel metric designed to assess the test-time scaling effectiveness of large reasoning models. We conduct comprehensive experiments evaluating state-of-the-art reasoning models across diverse domains.
arXiv Detail & Related papers (2025-10-07T15:10:51Z)
- Unified Scaling Laws for Compressed Representations [69.72517034565467]
We investigate whether a unified scaling framework can accurately predict model performance when training occurs over various compressed representations. Our main finding, demonstrated both theoretically and empirically, is that a simple "capacity" metric governs this behavior. We extend our formulation to directly compare the accuracy potential of different compressed formats, and to derive better algorithms for training over sparse-quantized formats.
arXiv Detail & Related papers (2025-06-02T16:52:51Z)
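One way to picture such a capacity metric, purely as an assumption in the spirit of the abstract rather than the paper's formula, is as a multiplier on the effective parameter count inside a Chinchilla-style loss fit. The sketch below generates synthetic losses for a full-precision and a compressed format and recovers the compressed format's capacity; the constants and functional form are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical capacity-style scaling law (illustrative only):
#   L(N, D) = A / (cap * N)**alpha + B / D**beta + E,
# where cap = 1 for full precision and cap < 1 for a compressed format.
def loss_model(x, A, alpha, B, beta, E, cap_compressed):
    N, D, is_compressed = x
    cap = np.where(is_compressed > 0.5, cap_compressed, 1.0)
    return A / (cap * N) ** alpha + B / D ** beta + E

rng = np.random.default_rng(0)
N = np.tile([1e7, 3e7, 1e8, 3e8, 1e9], 6)
D = np.repeat([1e9, 3e9, 1e10], 10)
fmt = np.tile([0.0, 1.0], 15)                      # 0 = full-precision runs, 1 = compressed runs
true_params = (400.0, 0.34, 410.0, 0.28, 1.7, 0.6)
L = loss_model((N, D, fmt), *true_params) + 0.01 * rng.normal(size=N.shape)

popt, _ = curve_fit(loss_model, (N, D, fmt), L,
                    p0=[300, 0.3, 300, 0.3, 1.0, 0.9],
                    bounds=(1e-6, [1e4, 1.0, 1e4, 1.0, 10.0, 1.0]))
print("fitted capacity of compressed format:", popt[-1])
```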
- A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops [55.07063067759609]
High-quality data is essential for training large generative models, yet the vast reservoir of real data available online has become nearly depleted. Models increasingly generate their own data for further training, forming Self-consuming Training Loops (STLs). Some models degrade or even collapse, while others successfully avoid these failures, leaving a significant gap in theoretical understanding.
arXiv Detail & Related papers (2025-02-26T06:18:13Z)
- Effective Interplay between Sparsity and Quantization: From Theory to Practice [33.697590845745815]
We show how sparsity and quantization interact when combined. We show that even if applied in the correct order, the compounded errors from sparsity and quantization can significantly harm accuracy. Our findings extend to the efficient deployment of large models on resource-constrained compute platforms.
arXiv Detail & Related papers (2024-05-31T15:34:13Z)
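A toy comparison of the compounding described above: magnitude pruning and int8 quantization applied to a random stand-in weight matrix in either order. The 50% sparsity level, the per-tensor scheme, and the reconstruction-error metric are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)

def prune(w, sparsity=0.5):
    """Magnitude pruning: zero out the smallest-magnitude fraction of weights."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= thresh, w, 0.0)

def quantize(w):
    """Symmetric per-tensor int8 quantize-dequantize."""
    scale = np.max(np.abs(w)) / 127.0
    return np.clip(np.round(w / scale), -127, 127) * scale

err = lambda a, b: np.linalg.norm(a - b) / np.linalg.norm(a)
print("prune -> quantize:", err(w, quantize(prune(w))))
print("quantize -> prune:", err(w, prune(quantize(w))))
print("quantize only    :", err(w, quantize(w)))
print("prune only       :", err(w, prune(w)))
```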
- Effect of Weight Quantization on Learning Models by Typical Case Analysis [6.9060054915724]
The recent surge in data analysis scale has significantly increased computational resource requirements.
Quantization is vital for deploying large models on devices with limited computational resources.
arXiv Detail & Related papers (2024-01-30T18:58:46Z)
- Enhancing Dynamical System Modeling through Interpretable Machine Learning Augmentations: A Case Study in Cathodic Electrophoretic Deposition [0.8796261172196743]
We introduce a comprehensive data-driven framework aimed at enhancing the modeling of physical systems.
As a demonstrative application, we pursue the modeling of cathodic electrophoretic deposition (EPD), commonly known as e-coating.
arXiv Detail & Related papers (2024-01-16T14:58:21Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- Quantized Adaptive Subgradient Algorithms and Their Applications [39.103587572626026]
We propose quantized composite mirror descent adaptive subgradient (QCMD adagrad) and quantized regularized dual average adaptive subgradient (QRDA adagrad) for distributed training.
A quantized gradient-based adaptive learning rate matrix is constructed to achieve a balance between communication costs, accuracy, and model sparsity.
arXiv Detail & Related papers (2022-08-11T04:04:03Z)
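For context, here is a generic sketch of unbiased stochastic gradient quantization of the kind used to trade accuracy against communication cost in distributed training; it is not the QCMD/QRDA adagrad construction from the paper, and the 16-level scheme is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_quantize(g, levels=16):
    """Unbiased stochastic quantization of a gradient vector onto `levels`
    uniform magnitude levels per coordinate (a generic communication-saving scheme)."""
    norm = np.linalg.norm(g)
    if norm == 0:
        return g
    scaled = np.abs(g) / norm * levels
    lower = np.floor(scaled)
    prob_up = scaled - lower
    q = lower + (rng.random(g.shape) < prob_up)   # round up with probability prob_up
    return np.sign(g) * q * norm / levels

g = rng.normal(size=10000)
single = stochastic_quantize(g)
avg = np.mean([stochastic_quantize(g) for _ in range(1000)], axis=0)
print("one-shot relative error :", np.linalg.norm(single - g) / np.linalg.norm(g))
print("averaged relative error :", np.linalg.norm(avg - g) / np.linalg.norm(g))  # shrinks: unbiased
```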
- Precise Tradeoffs in Adversarial Training for Linear Regression [55.764306209771405]
We provide a precise and comprehensive understanding of the role of adversarial training in the context of linear regression with Gaussian features.
We precisely characterize the standard/robust accuracy and the corresponding tradeoff achieved by a contemporary mini-max adversarial training approach.
Our theory for adversarial training algorithms also facilitates the rigorous study of how a variety of factors (size and quality of training data, model overparametrization, etc.) affect the tradeoff between these two competing accuracies.
arXiv Detail & Related papers (2020-02-24T19:01:47Z)
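For intuition on the minimax objective, the sketch below runs gradient descent on l2-robust linear regression using the closed-form inner maximization, max over ||delta|| <= eps of (y - (x + delta)^T theta)^2 = (|y - x^T theta| + eps ||theta||)^2. The data model, perturbation budget, and plain gradient descent are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 2000, 50, 0.5                        # samples, features, l2 perturbation budget
theta_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ theta_star + 0.1 * rng.normal(size=n)

def robust_loss_grad(theta):
    """Adversarial squared loss with l2-bounded feature perturbations.
    The inner max has the closed form (|residual| + eps * ||theta||)^2."""
    r = y - X @ theta
    margin = np.abs(r) + eps * np.linalg.norm(theta)
    loss = np.mean(margin ** 2)
    grad = (-2 * X.T @ (margin * np.sign(r))
            + 2 * np.sum(margin) * eps * theta / (np.linalg.norm(theta) + 1e-12)) / n
    return loss, grad

theta = 0.01 * rng.normal(size=d)
for _ in range(500):
    _, grad = robust_loss_grad(theta)
    theta -= 0.05 * grad

robust_loss, _ = robust_loss_grad(theta)
standard_mse = np.mean((y - X @ theta) ** 2)
print(f"robust training loss {robust_loss:.4f}, standard MSE {standard_mse:.4f}")
```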