Learning under Quantization for High-Dimensional Linear Regression
- URL: http://arxiv.org/abs/2510.18259v1
- Date: Tue, 21 Oct 2025 03:30:11 GMT
- Title: Learning under Quantization for High-Dimensional Linear Regression
- Authors: Dechen Zhang, Junwei Su, Difan Zou
- Abstract summary: Low-bit quantization has emerged as an indispensable technique for enabling the efficient training of large-scale models. Despite its widespread empirical success, a rigorous theoretical understanding of its impact on learning performance remains notably absent. We present the first systematic theoretical study of this fundamental question, analyzing finite-step stochastic gradient descent (SGD) for high-dimensional linear regression.
- Score: 34.214978824165236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of low-bit quantization has emerged as an indispensable technique for enabling the efficient training of large-scale models. Despite its widespread empirical success, a rigorous theoretical understanding of its impact on learning performance remains notably absent, even in the simplest linear regression setting. We present the first systematic theoretical study of this fundamental question, analyzing finite-step stochastic gradient descent (SGD) for high-dimensional linear regression under a comprehensive range of quantization targets: data, labels, parameters, activations, and gradients. Our novel analytical framework establishes precise algorithm-dependent and data-dependent excess risk bounds that characterize how different quantization targets affect learning: parameter, activation, and gradient quantization amplify noise during training; data quantization distorts the data spectrum; and data and label quantization introduce additional approximation and quantization error. Crucially, we prove that for multiplicative quantization (with input-dependent quantization step), this spectral distortion can be eliminated, and for additive quantization (with constant quantization step), a beneficial scaling effect with batch size emerges. Furthermore, for common polynomial-decay data spectra, we quantitatively compare the risks of multiplicative and additive quantization, drawing a parallel to the comparison between FP and integer quantization methods. Our theory provides a powerful lens to characterize how quantization shapes the learning dynamics of optimization algorithms, paving the way to further explore learning theory under practical hardware constraints.
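To make the contrast between additive quantization (constant quantization step) and multiplicative quantization (input-dependent step) concrete, here is a minimal, self-contained sketch rather than the paper's actual construction or experiments: one-pass SGD on synthetic linear regression, with each stochastic gradient passed through one of the two quantizer types. All function names, dimensions, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

def additive_quantize(g, delta=0.05):
    """Additive quantizer: round onto a fixed grid with constant step delta."""
    return delta * np.round(g / delta)

def multiplicative_quantize(g, bits=4):
    """Multiplicative quantizer: the step scales with the input's magnitude
    (per-call dynamic scaling), so the relative rounding error stays bounded."""
    scale = np.max(np.abs(g)) + 1e-12
    delta = scale / (2 ** (bits - 1))
    return delta * np.round(g / delta)

def quantized_sgd(quantizer, d=100, n=5000, lr=0.01, noise=0.1, seed=0):
    """One-pass SGD for linear regression with quantized stochastic gradients."""
    rng = np.random.default_rng(seed)
    w_star = rng.normal(size=d) / np.sqrt(d)   # ground-truth parameters
    w = np.zeros(d)
    for _ in range(n):
        x = rng.normal(size=d)
        y = x @ w_star + noise * rng.normal()
        grad = (x @ w - y) * x                 # stochastic gradient of the squared loss
        w -= lr * quantizer(grad)              # update with the quantized gradient
    return float(np.sum((w - w_star) ** 2))    # parameter error as a risk proxy

print("additive      :", quantized_sgd(additive_quantize))
print("multiplicative:", quantized_sgd(multiplicative_quantize))
```

In this toy setting the multiplicative quantizer keeps the relative rounding error bounded regardless of gradient scale, while the additive quantizer injects a fixed-magnitude perturbation, loosely mirroring the FP-versus-integer comparison drawn in the abstract.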
Related papers
- Scaling Laws for Precision in High-Dimensional Linear Regression [38.87908801454087]
We study scaling laws for low-precision training within a high-dimensional sketched linear regression framework. By analyzing multiplicative and additive quantization, we identify a critical dichotomy in their scaling behaviors. Our work provides a theoretical basis for optimizing training protocols under practical hardware constraints.
arXiv Detail & Related papers (2026-02-22T15:51:29Z) - High-Dimensional Learning Dynamics of Quantized Models with Straight-Through Estimator [7.837881800517111]
Quantized neural network training optimizes a discrete, non-differentiable objective. The straight-through estimator (STE) enables backpropagation through surrogate gradients. We theoretically show that in the high-dimensional limit, STE dynamics converge to a deterministic ordinary differential equation.
arXiv Detail & Related papers (2025-10-12T16:43:46Z) - Training Dynamics Impact Post-Training Quantization Robustness [31.536101256063684]
Post-training quantization is widely adopted for efficient deployment of large language models. We conduct a comprehensive analysis of quantization degradation across open-source language model training trajectories up to 32B parameters and 15T training tokens.
arXiv Detail & Related papers (2025-10-07T17:59:07Z) - Beyond Discreteness: Finite-Sample Analysis of Straight-Through Estimator for Quantization [2.8948274245812327]
This work presents the first finite-sample analysis of the straight-through estimator (STE) in the context of neural network quantization. Specifically, by analyzing the quantization-aware training of a two-layer neural network with binary weights and activations, we derive the sample complexity bound in terms of the data dimensionality. In the presence of label noise, we uncover an intriguing recurrence property of the STE-gradient method, where the iterates repeatedly escape from and return to the optimal binary weights.
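As a rough illustration of the STE mechanism analyzed above, the following sketch uses a one-layer toy with binary weights rather than the paper's two-layer network, and every name and hyperparameter is an assumption for illustration: the forward pass uses the sign-quantized weights, while the backward pass treats the quantizer as the identity so the surrogate gradient updates the latent full-precision weights.

```python
import numpy as np

def sign_quantize(w):
    """Binary (sign) quantizer applied in the forward pass."""
    return np.sign(w)

def ste_step(w, x, y, lr):
    """One SGD step with the straight-through estimator: forward with quantized
    weights, backward as if the quantizer were the identity map."""
    w_q = sign_quantize(w)
    residual = x @ w_q - y      # forward pass uses the binarized weights
    grad = residual * x         # gradient w.r.t. w_q, passed straight through to w
    return w - lr * grad

rng = np.random.default_rng(0)
d = 20
w_star = np.sign(rng.normal(size=d))   # binary ground-truth weights (toy target)
w = 0.1 * rng.normal(size=d)           # latent full-precision weights
for _ in range(2000):
    x = rng.normal(size=d)
    y = x @ w_star                     # noiseless labels in this toy example
    w = ste_step(w, x, y, lr=0.01)
print("fraction of signs recovered:", np.mean(np.sign(w) == w_star))
```

With noiseless labels this toy iterate typically settles once all signs match; adding label noise is what produces the escape-and-return recurrence behaviour highlighted above.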
arXiv Detail & Related papers (2025-05-23T17:11:22Z) - QT-DoG: Quantization-aware Training for Domain Generalization [58.439816306817306]
We propose Quantization-aware Training for Domain Generalization (QT-DoG). We demonstrate that weight quantization effectively leads to flatter minima in the loss landscape. QT-DoG exploits quantization as an implicit regularizer by inducing noise in model weights.
arXiv Detail & Related papers (2024-10-08T13:21:48Z) - Efficient Learning for Linear Properties of Bounded-Gate Quantum Circuits [62.46800898243033]
Recent progress in quantum learning theory prompts a question: can linear properties of a large-qubit circuit be efficiently learned from measurement data generated by varying classical inputs? We prove that a sample complexity scaling linearly in $d$ is required to achieve a small prediction error, while the corresponding computational complexity may scale exponentially in $d$. We propose a kernel-based method leveraging classical shadows and truncated trigonometric expansions, enabling a controllable trade-off between prediction accuracy and computational overhead.
arXiv Detail & Related papers (2024-08-22T08:21:28Z) - Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners [51.32182730502002]
We introduce Singular-value Diagonal Expansion to refine weight distributions to achieve better quantization alignment. Our plug-and-play weight-quantization methods demonstrate substantial performance improvements over state-of-the-art approaches.
arXiv Detail & Related papers (2024-07-22T09:45:16Z) - Effect of Weight Quantization on Learning Models by Typical Case Analysis [6.9060054915724]
The recent surge in data analysis scale has significantly increased computational resource requirements.
Quantization is vital for deploying large models on devices with limited computational resources.
arXiv Detail & Related papers (2024-01-30T18:58:46Z) - Towards Accurate Post-training Quantization for Diffusion Models [73.19871905102545]
We propose an accurate data-free post-training quantization framework for diffusion models (ADP-DM) for efficient image generation.
Our method outperforms state-of-the-art post-training quantization of diffusion models by a sizable margin with similar computational cost.
arXiv Detail & Related papers (2023-05-30T04:00:35Z) - In-Hindsight Quantization Range Estimation for Quantized Training [5.65658124285176]
We propose a simple alternative to dynamic quantization, in-hindsight range estimation, that uses the quantization ranges estimated on previous iterations to quantize the present.
Our approach enables fast static quantization of gradients and activations while requiring only minimal hardware support from the neural network accelerator.
It is intended as a drop-in replacement for estimating quantization ranges and can be used in conjunction with other advances in quantized training.
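A minimal sketch of the in-hindsight idea, under assumptions: each tensor is quantized on a grid fixed by a range statistic recorded on earlier iterations, and that statistic is only refreshed after quantizing, so the grid is static within the step. The running-average update rule and all parameter values are illustrative assumptions, not necessarily the paper's exact estimator.

```python
import numpy as np

class InHindsightQuantizer:
    """Quantize with the range estimated from previous iterations (static within a step),
    then update the range estimate for use on the next iteration."""
    def __init__(self, bits=8, init_range=1.0, momentum=0.9):
        self.levels = 2 ** (bits - 1) - 1
        self.range = init_range      # range carried over from earlier iterations
        self.momentum = momentum

    def __call__(self, x):
        delta = self.range / self.levels
        x_q = delta * np.clip(np.round(x / delta), -self.levels, self.levels)
        # refresh the range estimate only after quantizing, for the next iteration
        self.range = self.momentum * self.range + (1 - self.momentum) * float(np.max(np.abs(x)))
        return x_q

quantizer = InHindsightQuantizer(bits=8)
rng = np.random.default_rng(0)
for step in range(5):
    grad = rng.normal(scale=1.0 + 0.1 * step, size=1000)
    grad_q = quantizer(grad)   # quantized with the previously estimated range
```

Because the grid is known before the tensor is produced, quantization can proceed without an on-the-fly range computation, which is consistent with the minimal-hardware-support motivation above.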
arXiv Detail & Related papers (2021-05-10T10:25:28Z) - Quantum Algorithms for Data Representation and Analysis [68.754953879193]
We provide quantum procedures that speed up the solution of eigenproblems for data representation in machine learning.
The power and practical use of these subroutines is shown through new quantum algorithms, sublinear in the input matrix's size, for principal component analysis, correspondence analysis, and latent semantic analysis.
Results show that the run-time parameters that do not depend on the input's size are reasonable and that the error on the computed model is small, allowing for competitive classification performances.
arXiv Detail & Related papers (2021-04-19T00:41:43Z) - Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks [93.4221402881609]
We present an in-depth, fine-grained ablation study of the effect of different weight initializations on the final distributions of weights and activations of different CNN architectures.
To the best of our knowledge, we are the first to perform such a low-level, in-depth quantitative analysis of weight initialization and its effect on quantized behaviour.
arXiv Detail & Related papers (2020-11-30T06:54:28Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of its noise in that success is still unclear.
We show that heavy tails commonly arise in the parameters as a consequence of multiplicative noise driven by minibatch variance.
A detailed analysis is conducted describing key factors, including step size and the data, and state-of-the-art neural network models exhibit consistent behaviour.
arXiv Detail & Related papers (2020-06-11T09:58:01Z) - Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
arXiv Detail & Related papers (2020-02-18T12:31:34Z)
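A hedged sketch of what a gradient $\ell_1$ penalty of this kind can look like in practice, with an assumed model, data, and penalty placement rather than the paper's code: the $\ell_1$ norm of the loss gradient with respect to the weights is added to the training objective via double backpropagation, making the loss first-order insensitive to the small weight perturbations introduced by later quantization.

```python
import torch
import torch.nn as nn

# Toy model and data; all shapes and hyperparameters are illustrative assumptions.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()
lam = 0.05  # strength of the gradient l1 penalty (assumed value)

x = torch.randn(128, 32)
y = torch.randint(0, 10, (128,))

loss = criterion(model(x), y)
params = [p for p in model.parameters() if p.requires_grad]
# Differentiable gradients of the loss w.r.t. the weights (needed for double backprop).
grads = torch.autograd.grad(loss, params, create_graph=True)
penalty = sum(g.abs().sum() for g in grads)

optimizer.zero_grad()
(loss + lam * penalty).backward()
optimizer.step()
```

One training step is shown; in practice the penalized loss would replace the plain loss throughout training so the resulting weights can later be quantized on demand to different bit-widths.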