Effect of Weight Quantization on Learning Models by Typical Case Analysis
- URL: http://arxiv.org/abs/2401.17269v1
- Date: Tue, 30 Jan 2024 18:58:46 GMT
- Title: Effect of Weight Quantization on Learning Models by Typical Case Analysis
- Authors: Shuhei Kashiwamura, Ayaka Sakata, Masaaki Imaizumi
- Abstract summary: The recent surge in data analysis scale has significantly increased computational resource requirements.
Quantization is vital for deploying large models on devices with limited computational resources.
- Score: 6.9060054915724
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper examines the quantization methods used in large-scale data
analysis models and their hyperparameter choices. The recent surge in data
analysis scale has significantly increased computational resource requirements.
To address this, quantizing model weights has become a prevalent practice in
data analysis applications such as deep learning. Quantization is particularly
vital for deploying large models on devices with limited computational
resources. However, the selection of quantization hyperparameters, like the
number of bits and value range for weight quantization, remains an
underexplored area. In this study, we employ the typical case analysis from
statistical physics, specifically the replica method, to explore the impact of
hyperparameters on the quantization of simple learning models. Our analysis
yields three key findings: (i) an unstable hyperparameter phase, known as
replica symmetry breaking, occurs with a small number of bits and a large
quantization width; (ii) there is an optimal quantization width that minimizes
error; and (iii) quantization delays the onset of overparameterization, helping
to mitigate overfitting as indicated by the double descent phenomenon. We also
discover that non-uniform quantization can enhance stability. Additionally, we
develop an approximate message-passing algorithm to validate our theoretical
results.
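For intuition only, here is a minimal sketch (not the paper's exact formulation) of uniform weight quantization with the two hyperparameters discussed in the abstract: the number of bits and the quantization width, i.e. the value range to which weights are clipped before being rounded onto a grid of 2^bits levels.

```python
import numpy as np

def uniform_quantize(w, num_bits=4, width=1.0):
    """Clip weights to [-width, width] and round onto a uniform grid of 2**num_bits levels."""
    levels = 2 ** num_bits
    step = 2 * width / (levels - 1)               # spacing between adjacent grid points
    w_clipped = np.clip(w, -width, width)         # the quantization width bounds the value range
    codes = np.round((w_clipped + width) / step)  # integer code in {0, ..., levels - 1}
    return codes * step - width                   # map codes back to real-valued grid points

# Example: quantize Gaussian weights and measure the distortion for several bit counts.
rng = np.random.default_rng(0)
w = rng.normal(size=10_000)
for b in (2, 4, 8):
    wq = uniform_quantize(w, num_bits=b, width=3.0)
    print(b, np.mean((w - wq) ** 2))
```

Shrinking the width clips more of the weight distribution, while widening it coarsens the grid at a fixed bit count; this trade-off is consistent with the existence of an error-minimizing quantization width reported in the abstract.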
Related papers
- Scaling Laws for Mixed quantization in Large Language Models [10.912306313183972]
Post-training quantization of Large Language Models (LLMs) has proven effective in reducing the computational requirements for running inference on these models.
In this study, we focus on a straightforward question: When aiming for a specific accuracy or perplexity target for low-precision quantization, how many high-precision numbers or calculations must be preserved as we scale LLMs to larger sizes?
arXiv Detail & Related papers (2024-10-09T09:45:01Z)
- How to Parameterize Asymmetric Quantization Ranges for Quantization-Aware Training [1.721868124457512]
This paper investigates three different parameterizations of asymmetric uniform quantization for quantization-aware training.
We propose best practices to stabilize and accelerate quantization-aware training with learnable asymmetric quantization ranges.
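The paper compares three parameterizations that this summary does not spell out. Purely as a reference point, the sketch below shows one standard scale/zero-point parameterization of an asymmetric range [qmin, qmax]; in quantization-aware training these bounds (or the derived scale and zero point) would be learnable.

```python
import numpy as np

def fake_quantize_asymmetric(x, qmin, qmax, num_bits=8):
    """Quantize-dequantize x with an asymmetric range [qmin, qmax] (scale/zero-point form)."""
    levels = 2 ** num_bits - 1
    scale = (qmax - qmin) / levels
    zero_point = np.round(-qmin / scale)
    codes = np.clip(np.round(x / scale) + zero_point, 0, levels)  # unsigned integer codes
    return scale * (codes - zero_point)                           # dequantized ("fake-quantized") values

x = np.array([-0.2, 0.0, 0.7, 1.3])
print(fake_quantize_asymmetric(x, qmin=-0.5, qmax=1.5, num_bits=4))
```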
arXiv Detail & Related papers (2024-04-25T06:58:16Z)
- Data-free Weight Compress and Denoise for Large Language Models [101.53420111286952]
We propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices.
We achieve a model pruning of 80% parameters while retaining 93.43% of the original performance without any calibration data.
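The joint, data-free procedure proposed in the paper is more elaborate than this, but the building block it relies on, a rank-k approximation of a parameter matrix, can be sketched as follows (the matrix here is random and purely illustrative).

```python
import numpy as np

def rank_k_approximation(W, k):
    """Best rank-k approximation of a weight matrix via truncated SVD (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 64))
W_k = rank_k_approximation(W, k=16)
print(np.linalg.norm(W - W_k) / np.linalg.norm(W))  # relative approximation error
```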
arXiv Detail & Related papers (2024-02-26T05:51:47Z)
- WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More [55.0856305773081]
Large Language Models (LLMs) face significant deployment challenges due to their substantial memory requirements and the computational demands of the auto-regressive text generation process.
This paper addresses these challenges by focusing on the quantization of LLMs, a technique that reduces memory consumption by converting model parameters and activations into low-bit integers.
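This is not WKVQuant's specific scheme, but as a reminder of the basic mechanism such methods refine, here is a minimal sketch of symmetric per-tensor quantization to signed low-bit integers, applied to a toy key/value-cache-shaped tensor.

```python
import numpy as np

def quantize_to_int(x, num_bits=8):
    """Symmetric per-tensor quantization to signed low-bit integers.

    Storing the integer codes (e.g. int8 instead of float32) plus one scale
    is what saves memory for weights and cached key/value tensors.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    codes = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return codes, scale

kv_cache = np.random.default_rng(0).normal(size=(4, 128, 64)).astype(np.float32)
codes, scale = quantize_to_int(kv_cache, num_bits=8)
dequantized = codes.astype(np.float32) * scale
print(np.abs(kv_cache - dequantized).max())  # worst-case round-trip error
```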
arXiv Detail & Related papers (2024-02-19T11:33:21Z)
- PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models [52.09865918265002]
We propose a novel "quantize before fine-tuning" framework, PreQuant.
PreQuant is compatible with various quantization strategies, with outlier-aware fine-tuning incorporated to correct the induced quantization error.
We demonstrate the effectiveness of PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5.
arXiv Detail & Related papers (2023-05-30T08:41:33Z)
- ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation [24.34969722921442]
Post-training quantization (PTQ) has emerged as a promising technique for mitigating memory consumption and computational costs in large language models (LLMs).
We conduct a comprehensive analysis of these factors by investigating the effects of PTQ on weight-only, activation-only, and weight-and-activation quantization.
We propose an optimized method called Low-Rank Compensation (LoRC) to enhance model quality recovery with a minimal increase in model size.
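As a rough illustration of the low-rank-compensation idea (the actual LoRC method may differ in its details), the sketch below quantizes a weight matrix and then approximates the residual quantization error with a small rank-k factorization that would be stored alongside the quantized weights.

```python
import numpy as np

def quantize(W, num_bits=4):
    """Simple symmetric uniform quantization (a stand-in for any PTQ scheme)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(W)) / qmax
    return np.round(W / scale).clip(-qmax - 1, qmax) * scale

def low_rank_compensation(W, W_q, k=8):
    """Approximate the quantization error W - W_q with a rank-k factorization U @ V."""
    U, s, Vt = np.linalg.svd(W - W_q, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k]

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 128))
W_q = quantize(W, num_bits=3)
U, V = low_rank_compensation(W, W_q, k=16)
print(np.linalg.norm(W - W_q), np.linalg.norm(W - (W_q + U @ V)))  # error before vs. after compensation
```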
arXiv Detail & Related papers (2023-03-15T01:27:15Z)
- Ternary Quantization: A Survey [12.90416661059601]
Inference time, model size, and accuracy are critical for deploying deep neural network models.
We review the evolution of ternary quantization and investigate the relationships among existing ternary quantization methods.
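The survey covers many variants; as a single illustrative example, one common threshold-based scheme (in the spirit of ternary weight networks) maps each weight to {-alpha, 0, +alpha}.

```python
import numpy as np

def ternary_quantize(w, delta_ratio=0.7):
    """Threshold-based ternary quantization: weights become {-alpha, 0, +alpha}.

    delta_ratio * mean(|w|) is a commonly used threshold heuristic; alpha is
    chosen to match the average magnitude of the weights that stay nonzero.
    """
    delta = delta_ratio * np.mean(np.abs(w))
    mask = np.abs(w) > delta                      # which weights remain nonzero
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask

w = np.random.default_rng(0).normal(size=1000)
print(np.unique(ternary_quantize(w)))            # three distinct values
```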
arXiv Detail & Related papers (2023-03-02T03:38:51Z)
- End-to-end resource analysis for quantum interior point methods and portfolio optimization [63.4863637315163]
We provide a complete quantum circuit-level description of the algorithm from problem input to problem output.
We report the number of logical qubits and the quantity/depth of non-Clifford T-gates needed to run the algorithm.
arXiv Detail & Related papers (2022-11-22T18:54:48Z)
- Mixed-Precision Inference Quantization: Radically Towards Faster inference speed, Lower Storage requirement, and Lower Loss [4.877532217193618]
Existing quantization techniques rely heavily on experience and "fine-tuning" skills.
This study provides a methodology for acquiring a mixed-precision quantization model with a lower loss than the full-precision model.
In particular, we will demonstrate that neural networks with massive identity mappings are resistant to the quantization method.
arXiv Detail & Related papers (2022-07-20T10:55:34Z)
- Quantum Algorithms for Data Representation and Analysis [68.754953879193]
We provide quantum procedures that speed up the solution of eigenproblems for data representation in machine learning.
The power and practical use of these subroutines are shown through new quantum algorithms, sublinear in the input matrix's size, for principal component analysis, correspondence analysis, and latent semantic analysis.
Results show that the run-time parameters that do not depend on the input's size are reasonable and that the error on the computed model is small, allowing for competitive classification performances.
arXiv Detail & Related papers (2021-04-19T00:41:43Z)
- Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of stochasticity in its success is still unclear.
We show that multiplicative noise, which commonly arises from variance in minibatch gradients, leads to heavy-tailed behaviour in the parameters.
A detailed analysis describes the dependence on key factors, including step size and data, with similar results exhibited on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)