Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models
- URL: http://arxiv.org/abs/2310.13315v1
- Date: Fri, 20 Oct 2023 07:09:56 GMT
- Title: Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models
- Authors: Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du,
Dacheng Tao
- Abstract summary: Quantization is a promising approach for reducing memory overhead and accelerating inference.
We propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs.
- Score: 88.80146574509195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantization is a promising approach for reducing memory overhead and
accelerating inference, especially in large pre-trained language model (PLM)
scenarios. Meanwhile, the lack of access to original training data, owing to
security and privacy concerns, has created a growing demand for zero-shot
quantization. Most cutting-edge zero-shot quantization methods 1) apply
primarily to computer vision tasks, and 2) neglect the overfitting problem in
the generative adversarial learning process, leading to sub-optimal
performance. Motivated by
this, we propose a novel zero-shot sharpness-aware quantization (ZSAQ)
framework for the zero-shot quantization of various PLMs. The key algorithm in
solving ZSAQ is the SAM-SGA optimization, which aims to improve the
quantization accuracy and model generalization via optimizing a minimax
problem. We theoretically prove the convergence rate for the minimax
optimization problem, and this result can be applied to other nonconvex-PL
minimax optimization frameworks. Extensive experiments on 11 tasks demonstrate
that our method brings consistent and significant performance gains on both
discriminative and generative PLMs, i.e., up to +6.98 average score.
Furthermore, we empirically validate that our method can effectively improve
the model generalization.
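To make the SAM-SGA step more concrete, here is a minimal PyTorch sketch of a sharpness-aware minimax update. Everything below is an assumption for illustration (the data generator `gen`, the full-precision teacher `fp_model`, the quantized student `q_model`, the distillation loss, and the radius `rho`); it shows the general shape of such an update, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

rho = 0.05  # sharpness radius; hypothetical value, not taken from the paper

def sam_sga_step(gen, q_model, fp_model, opt_q, opt_gen, z):
    """One minimax round: gradient ascent on the generator, SAM descent on the quantized student."""
    # --- maximization player: the generator ascends the distillation loss ---
    opt_gen.zero_grad()
    x = gen(z)
    loss_gen = -F.mse_loss(q_model(x), fp_model(x).detach())  # ascent via a negated loss
    loss_gen.backward()
    opt_gen.step()

    # --- minimization player: sharpness-aware (SAM) step on the quantized student ---
    x = gen(z).detach()
    opt_q.zero_grad()
    loss = F.mse_loss(q_model(x), fp_model(x).detach())
    loss.backward()

    # climb to the approximate worst case within an L2 ball of radius rho
    with torch.no_grad():
        grads = [p.grad for p in q_model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        eps = []
        for p in q_model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)

    # gradient at the perturbed point, then restore the weights and update
    opt_q.zero_grad()
    F.mse_loss(q_model(x), fp_model(x).detach()).backward()
    with torch.no_grad():
        for p, e in zip(q_model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    opt_q.step()
```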
Related papers
- RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [95.32315448601241]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE).
RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers.
Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
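As a rough illustration of combining a rotation with a straight-through estimator, consider the sketch below; the `fake_quantize` helper, the bit width, and the orthogonal matrix `r` are illustrative assumptions, not the RoSTE algorithm.

```python
import torch

def fake_quantize(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Symmetric uniform fake-quantization with a straight-through gradient."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.detach().abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    # STE: forward uses w_q, backward treats the rounding as the identity
    return w + (w_q - w).detach()

def rotated_ste_linear(x: torch.Tensor, w: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    """Rotate by a (hypothetical) orthogonal matrix r before quantizing, spreading out outliers."""
    w_rot = fake_quantize(w @ r)   # quantize rotated weights
    x_rot = fake_quantize(x @ r)   # quantize rotated activations
    # with r orthogonal and no quantization error, this equals x @ w.T
    return x_rot @ w_rot.T
```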
arXiv Detail & Related papers (2025-02-13T06:44:33Z) - Gradient Multi-Normalization for Stateless and Scalable LLM Training [16.037614012166063]
Training large language models (LLMs) typically relies on adaptive optimizers like Adam, which store additional state information to accelerate convergence but incur significant memory overhead.
Recent efforts, such as SWAN (Ma et al., 2024), address this by eliminating the need for optimizer states while achieving performance comparable to Adam via a multi-step preprocessing procedure applied to instantaneous gradients.
We introduce a novel framework for designing stateless optimizers that normalize gradients according to multiple norms. Experiments on pre-training LLaMA models with up to 1 billion parameters demonstrate a 3X speedup over Adam with significantly reduced memory requirements, outperforming other memory-efficient baselines.
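A toy sketch of the underlying idea, normalizing the instantaneous gradient under more than one norm without keeping any optimizer state; the particular norms and their order here are assumptions, not the procedure from the paper.

```python
import torch

def multi_normalized_update(param: torch.Tensor, grad: torch.Tensor, lr: float = 1e-3):
    g = grad
    # 1) row-wise RMS normalization (per output dimension) for matrix-shaped gradients
    if g.dim() == 2:
        g = g / (g.pow(2).mean(dim=1, keepdim=True).sqrt() + 1e-8)
    # 2) global L2 normalization so the overall step size is controlled by lr alone
    g = g / (g.norm() + 1e-8)
    # stateless: no momentum or second-moment buffers survive between calls
    param.data.add_(g, alpha=-lr)
```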
arXiv Detail & Related papers (2025-02-10T18:09:53Z) - ZOQO: Zero-Order Quantized Optimization [31.43307762723943]
We introduce a zero-order quantized optimization (ZOQO) method designed for training models with quantized parameters and operations.
Our approach leverages zero-order approximations of the gradient sign and adapts the learning process to maintain the parameters' quantization without the need for full-precision gradient calculations.
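This can be made concrete with a small sketch of a sign-based, forward-only step on a fixed quantization grid; `loss_fn`, the grid spacing `delta`, and the single-direction probe are illustrative assumptions rather than the exact ZOQO procedure.

```python
import torch

def zoqo_like_step(params, loss_fn, delta=2 ** -6):
    """params: tensors assumed to lie on a uniform grid of spacing `delta`."""
    with torch.no_grad():
        # random probe direction with entries in {-1, +1}
        u = [torch.randint(0, 2, p.shape, device=p.device).float() * 2 - 1 for p in params]
        for p, d in zip(params, u):            # probe at +delta along u
            p.add_(delta * d)
        loss_plus = float(loss_fn())
        for p, d in zip(params, u):            # probe at -delta along u
            p.sub_(2 * delta * d)
        loss_minus = float(loss_fn())
        for p, d in zip(params, u):            # restore original parameters
            p.add_(delta * d)
        # sign of the directional-derivative estimate decides the step direction
        sign = 1.0 if loss_plus > loss_minus else -1.0
        for p, d in zip(params, u):            # one grid step in the descending direction
            p.sub_(sign * delta * d)
```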
arXiv Detail & Related papers (2025-01-12T07:15:55Z) - AdaZeta: Adaptive Zeroth-Order Tensor-Train Adaption for Memory-Efficient Large Language Models Fine-Tuning [22.950914612765494]
Fine-tuning large language models (LLMs) has achieved remarkable performance across various natural language processing tasks.
Memory-efficient Zeroth-order (MeZO) methods attempt to fine-tune LLMs using only forward passes, thereby avoiding the need for a backpropagation graph.
We propose the Adaptive Zeroth-order Tensor-Train Adaption (AdaZeta) framework, specifically designed to improve the performance and convergence of zeroth-order (ZO) methods.
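For context, memory-efficient zeroth-order methods in this family estimate gradients from two forward passes and regenerate the random perturbation from a seed instead of storing it. The sketch below illustrates that generic trick; the parameter names and hyperparameters are assumptions, and it omits AdaZeta's tensor-train adapters and adaptive query schedule.

```python
import torch

def zo_step(adapter_params, loss_fn, eps=1e-3, lr=1e-4):
    """Two-point (SPSA-style) zeroth-order step using only forward passes."""
    seed = torch.randint(0, 2 ** 31 - 1, (1,)).item()

    def perturb(scale):
        torch.manual_seed(seed)                 # regenerate the same noise each time
        for p in adapter_params:
            p.data.add_(scale * eps * torch.randn_like(p))

    perturb(+1); loss_plus = float(loss_fn())   # evaluate at w + eps*z
    perturb(-2); loss_minus = float(loss_fn())  # evaluate at w - eps*z
    perturb(+1)                                 # restore the original parameters

    grad_scale = (loss_plus - loss_minus) / (2 * eps)
    torch.manual_seed(seed)
    for p in adapter_params:                    # descend along the same noise direction
        p.data.add_(torch.randn_like(p), alpha=-lr * grad_scale)
```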
arXiv Detail & Related papers (2024-06-26T04:33:13Z) - A Universal Class of Sharpness-Aware Minimization Algorithms [57.29207151446387]
We introduce a new class of sharpness measures, leading to new sharpness-aware objective functions.
We prove that these measures are universally expressive, allowing any function of the training loss Hessian matrix to be represented by appropriate hyperparameters.
arXiv Detail & Related papers (2024-06-06T01:52:09Z) - QuantEase: Optimization-based Quantization for Language Models [17.333778751252392]
This work introduces QuantEase, an optimization-based Post-Training Quantization (PTQ) framework that quantizes the layers of Large Language Models (LLMs).
Our coordinate descent (CD)-based approach features straightforward updates, relying solely on vector operations.
We also explore an outlier-aware variant that retains significant weights (outliers) at full precision.
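A toy illustration of layer-wise coordinate-descent quantization in this spirit, minimizing ||Xw - Xq||^2 over one quantized coordinate at a time; the initialization, grid, and update rule below are simplified assumptions, not the QuantEase algorithm itself.

```python
import torch

def cd_quantize_column(X, w, n_bits=4, n_sweeps=3):
    """Quantize one weight column w so that X @ q stays close to X @ w."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale   # nearest-rounding init
    r = X @ (w - q)                                                # residual, r = X(w - q)
    col_sq = (X * X).sum(dim=0)                                    # ||x_j||^2 for every column
    for _ in range(n_sweeps):
        for j in range(w.numel()):
            x_j = X[:, j]
            # unconstrained optimum for coordinate j, then project back onto the grid
            t = (x_j @ r) / (col_sq[j] + 1e-12)
            new_qj = torch.clamp(torch.round((q[j] + t) / scale), -qmax, qmax) * scale
            r = r - (new_qj - q[j]) * x_j    # keep the residual consistent with the new q
            q[j] = new_qj
    return q
```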
arXiv Detail & Related papers (2023-09-05T01:39:09Z) - Learning to Optimize Permutation Flow Shop Scheduling via Graph-based
Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately.
Our model uses only 37% as many network parameters as the baseline, and its solution gap towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z) - Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool for dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations that are required for QD optimization in these environments.
arXiv Detail & Related papers (2021-09-14T17:12:20Z) - Sharpness-Aware Minimization for Efficiently Improving Generalization [36.87818971067698]
We introduce a novel, effective procedure for simultaneously minimizing loss value and loss sharpness.
Sharpness-Aware Minimization (SAM) seeks parameters that lie in neighborhoods having uniformly low loss.
We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets.
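In symbols, the procedure summarized above solves a min-max problem and approximates the inner maximization to first order:

```latex
\min_{w}\ \max_{\|\epsilon\|_2 \le \rho} L(w+\epsilon),
\qquad
\hat{\epsilon}(w) \approx \rho\,\frac{\nabla_w L(w)}{\|\nabla_w L(w)\|_2},
\qquad
w \leftarrow w - \eta\,\nabla_w L(w)\big|_{w = w + \hat{\epsilon}(w)}.
```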
arXiv Detail & Related papers (2020-10-03T19:02:10Z) - Automatically Learning Compact Quality-aware Surrogates for Optimization
Problems [55.94450542785096]
Solving optimization problems with unknown parameters often requires learning a predictive model for those parameters and then solving the problem using the predicted values.
Recent work has shown that including the optimization problem as a layer in the model training pipeline yields predictions of the unobserved parameters that lead to higher-quality decisions.
We show that we can improve solution quality by learning a low-dimensional surrogate model of a large optimization problem.
arXiv Detail & Related papers (2020-06-18T19:11:54Z)