Solving Oscillation Problem in Post-Training Quantization Through a
Theoretical Perspective
- URL: http://arxiv.org/abs/2303.11906v2
- Date: Tue, 4 Apr 2023 08:04:19 GMT
- Title: Solving Oscillation Problem in Post-Training Quantization Through a
Theoretical Perspective
- Authors: Yuexiao Ma, Huixia Li, Xiawu Zheng, Xuefeng Xiao, Rui Wang, Shilei
Wen, Xin Pan, Fei Chao, Rongrong Ji
- Abstract summary: Post-training quantization (PTQ) is widely regarded as one of the most efficient compression methods practically.
We argue that oscillation is an overlooked problem in PTQ methods.
- Score: 74.48124653728422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Post-training quantization (PTQ) is widely regarded as one of the most
practically efficient compression methods, benefiting from its data privacy and
low computation costs. We argue that oscillation is an overlooked problem in PTQ
methods. In this paper, we take the initiative to explore this issue and present
a theoretical proof of why the problem is intrinsic to PTQ. We then solve it by
introducing a principled and generalized theoretical framework. In particular, we
first formulate oscillation in PTQ and prove that the problem is caused by the
difference in module capacity. To this end, we define module capacity (ModCap)
under both data-dependent and data-free scenarios, where the differentials
between adjacent modules are used to measure the degree of oscillation. The
problem is then solved by selecting the top-k differentials, whose corresponding
modules are jointly optimized and quantized. Extensive experiments demonstrate
that our method successfully reduces the performance drop and generalizes to
different neural networks and PTQ methods. For example, with 2/4-bit ResNet-50
quantization, our method surpasses the previous state-of-the-art method by 1.9%.
The gain becomes more significant on small-model quantization, e.g., surpassing
the BRECQ method by 6.61% on MobileNetV2*0.5.
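To make the top-k selection step concrete, here is a minimal Python sketch of the idea the abstract describes. The function name, the capacity scores, and the use of plain adjacent differences are illustrative assumptions, not the paper's actual ModCap implementation.

```python
import numpy as np

def select_modules_to_merge(capacities, k):
    """Given per-module capacity scores (ModCap), compute differentials
    between adjacent modules and return the indices of the top-k largest
    differentials -- the module pairs most likely to oscillate."""
    caps = np.asarray(capacities, dtype=float)
    diffs = np.abs(np.diff(caps))        # differential between adjacent modules
    topk = np.argsort(diffs)[::-1][:k]   # index i denotes the pair (module i, module i+1)
    return sorted(topk.tolist())

# Example: capacity scores for 8 consecutive modules (made-up numbers)
caps = [3.1, 3.0, 5.6, 2.2, 2.3, 2.1, 4.8, 4.7]
print(select_modules_to_merge(caps, k=2))  # pairs with the largest capacity gaps
```

In the paper's framework, the selected module pairs would then be jointly optimized and quantized; only the selection step is sketched here.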
Related papers
- Pushing the Limits of Large Language Model Quantization via the Linearity Theorem [71.3332971315821]
We present a "line theoremarity" establishing a direct relationship between the layer-wise $ell$ reconstruction error and the model perplexity increase due to quantization.
This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels.
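A minimal sketch of the data-free recipe this summary describes: rotate a weight group with a normalized Hadamard matrix, quantize on a uniform grid whose scale is chosen to minimize MSE, then rotate back. The crude scale search and all names are illustrative assumptions, not the actual HIGGS implementation.

```python
import numpy as np

def hadamard(n):
    """Normalized Hadamard matrix via Sylvester construction; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(H.shape[0])  # orthonormal rotation

def quantize_mse_grid(x, n_levels=16):
    """Quantize onto a uniform grid whose scale is picked by a simple
    1-D search to minimize MSE (a stand-in for an MSE-optimal grid)."""
    best = None
    for scale in np.linspace(0.1, 1.0, 50) * np.abs(x).max():
        step = 2 * scale / n_levels
        q = np.clip(np.round(x / step), -n_levels // 2, n_levels // 2 - 1)
        xq = q * step
        err = np.mean((x - xq) ** 2)
        if best is None or err < best[0]:
            best = (err, xq)
    return best[1]

rng = np.random.default_rng(0)
w = rng.normal(size=64)                   # one weight group of power-of-two size
H = hadamard(w.size)
w_rot = H @ w                             # Hadamard rotation -> near-Gaussian coordinates
w_hat = H.T @ quantize_mse_grid(w_rot)    # quantize, then rotate back
print(np.mean((w - w_hat) ** 2))          # reconstruction error of the group
```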
arXiv Detail & Related papers (2024-11-26T15:35:44Z)
- Efficient variational quantum eigensolver methodologies on quantum processors [4.192048933715544]
We implement the adaptive and TETRIS-adaptive variational quantum eigensolver (VQE) methods, together with entanglement forging, to reduce computational resource requirements.
Our results affirm the usefulness of VQE on noisy quantum hardware and pave the way for applying VQE-related methods to large molecules.
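For readers unfamiliar with VQE, the following toy Python sketch shows its core loop on a single-qubit Hamiltonian: a classical outer loop tunes an ansatz parameter to minimize the measured energy. It deliberately ignores the adaptive ansatz construction and entanglement forging the paper is actually about.

```python
import numpy as np

# Pauli matrices and a toy one-qubit Hamiltonian H = 0.5*Z + 0.3*X
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = 0.5 * Z + 0.3 * X

def energy(theta):
    """Expectation <psi(theta)|H|psi(theta)> for the one-parameter ansatz
    |psi> = cos(theta/2)|0> + sin(theta/2)|1>."""
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)
    return np.real(psi.conj() @ H @ psi)

# Crude classical optimizer: scan the single variational parameter
thetas = np.linspace(0, 2 * np.pi, 1000)
best = min(thetas, key=energy)
print(energy(best), np.linalg.eigvalsh(H)[0])  # matches the exact ground energy
```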
arXiv Detail & Related papers (2024-07-23T00:38:34Z)
- Efficient molecular conformation generation with quantum-inspired algorithm [4.625636280559916]
We propose the use of a quantum-inspired algorithm to solve the molecular unfolding (MU) problem.
The root-mean-square deviation between the conformation determined by our approach and density functional theory (DFT) is negligible.
Results indicate that quantum-inspired algorithms can be applied to solve practical problems even before quantum hardware becomes mature.
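As a small aside, the root-mean-square deviation used to compare a generated conformation against a DFT reference can be computed as below. The coordinates are made up, and no Kabsch superposition is applied; this is a generic metric, not the paper's pipeline.

```python
import numpy as np

def rmsd(a, b):
    """Root-mean-square deviation between two conformations given as
    (n_atoms, 3) coordinate arrays; assumes atoms are matched and aligned."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1)))

conf_a = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.2, 1.1, 0.0]])
conf_b = conf_a + 0.05  # a slightly perturbed conformation
print(rmsd(conf_a, conf_b))  # small value -> conformations nearly identical
```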
arXiv Detail & Related papers (2024-04-22T11:40:08Z)
- Improving Parameter Training for VQEs by Sequential Hamiltonian Assembly [4.646930308096446]
A central challenge in quantum machine learning is the design and training of parameterized quantum circuits (PQCs).
We propose Sequential Hamiltonian Assembly, which iteratively approximates the loss function using local components.
Our approach outperforms conventional parameter training by 29.99% and the empirical state of the art, Layerwise Learning, by 5.12% in mean accuracy.
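A toy Python sketch of the sequential-assembly idea: treat the loss as a sum of local components and optimize the parameter against a growing subset of terms, warm-starting each stage from the previous one. The cosine components and the finite-difference optimizer are stand-ins, not the paper's PQC setup.

```python
import numpy as np

def local_losses(theta):
    """Toy stand-in for local Hamiltonian components: each term is a
    shifted cosine; the full loss is their sum."""
    shifts = np.array([0.0, 0.7, 1.4, 2.1])
    return np.cos(theta - shifts)

def train(theta, n_terms, steps=200, lr=0.1, eps=1e-4):
    """Minimize the partial loss built from the first n_terms components
    via finite-difference gradient descent."""
    loss = lambda t: local_losses(t)[:n_terms].sum()
    for _ in range(steps):
        grad = (loss(theta + eps) - loss(theta - eps)) / (2 * eps)
        theta -= lr * grad
    return theta

theta = 0.1
for n in range(1, 5):          # assemble the Hamiltonian term by term
    theta = train(theta, n)    # warm start from the previous stage
print(theta, local_losses(theta).sum())
```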
arXiv Detail & Related papers (2023-12-09T11:47:32Z)
- Wasserstein Quantum Monte Carlo: A Novel Approach for Solving the Quantum Many-Body Schrödinger Equation [56.9919517199927]
"Wasserstein Quantum Monte Carlo" (WQMC) uses the gradient flow induced by the Wasserstein metric, rather than Fisher-Rao metric, and corresponds to transporting the probability mass, rather than teleporting it.
We demonstrate empirically that the dynamics of WQMC results in faster convergence to the ground state of molecular systems.
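For orientation, the two gradient flows of a functional $F[\rho]$ are usually written as below (standard optimal-transport notation, not taken from the paper): the Fisher-Rao flow re-weights mass in place ("teleporting"), while the Wasserstein flow moves it along a velocity field ("transporting").

```latex
\begin{align*}
\text{Fisher--Rao:}\quad
  \partial_t \rho_t &= -\rho_t\left(\frac{\delta F}{\delta\rho}[\rho_t]
    - \mathbb{E}_{\rho_t}\!\left[\frac{\delta F}{\delta\rho}[\rho_t]\right]\right) \\
\text{Wasserstein:}\quad
  \partial_t \rho_t &= \nabla\cdot\left(\rho_t\,\nabla\frac{\delta F}{\delta\rho}[\rho_t]\right)
\end{align*}
```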
arXiv Detail & Related papers (2023-07-06T17:54:08Z)
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
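The fixed-point calculation at the heart of a Deep Equilibrium model can be sketched in a few lines of Python. The naive forward iteration and the toy tanh layer below are illustrative stand-ins for the paper's derived solver, shown only to make the fixed-point formulation concrete.

```python
import numpy as np

def fixed_point_solve(f, z0, tol=1e-8, max_iter=500):
    """Naive forward iteration for the DEQ fixed point z* = f(z*);
    converges when f is a contraction."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Toy contractive layer: f(z) = tanh(W z + b); ||W|| < 1 ensures convergence
rng = np.random.default_rng(0)
W = 0.3 * rng.normal(size=(5, 5)) / np.sqrt(5)
b = rng.normal(size=5)
z_star = fixed_point_solve(lambda z: np.tanh(W @ z + b), np.zeros(5))
print(np.linalg.norm(z_star - np.tanh(W @ z_star + b)))  # ~0: a fixed point
```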
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
- Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance [53.45700148820669]
Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures.
Despite its effectiveness and convenience, the reliability of PTQ methods in the presence of extreme cases such as distribution shift and data noise remains largely unexplored.
This paper first investigates this problem on various commonly-used PTQ methods.
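A minimal sketch of the kind of reliability probe such a benchmark performs: evaluate one (quantized) model on several perturbed test sets and report mean versus worst-case accuracy. The stand-in model and perturbations below are made up for illustration.

```python
import numpy as np

def worst_case_accuracy(predict, test_sets):
    """Mean and worst-case accuracy of a model over a collection of
    shifted/noisy test sets -- a simple reliability probe."""
    accs = [np.mean(predict(x) == y) for x, y in test_sets]
    return float(np.mean(accs)), float(np.min(accs))

rng = np.random.default_rng(0)
predict = lambda x: (x.sum(axis=1) > 0).astype(int)  # stand-in "quantized model"
clean = rng.normal(size=(100, 8))
labels = (clean.sum(axis=1) > 0).astype(int)
shifted = clean + 0.5                                # a distribution shift
noisy = clean + rng.normal(scale=1.0, size=clean.shape)  # data noise
print(worst_case_accuracy(predict, [(clean, labels), (shifted, labels), (noisy, labels)]))
```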
arXiv Detail & Related papers (2023-03-23T02:55:50Z)
- Towards Efficient Post-training Quantization of Pre-trained Language Models [85.68317334241287]
We study post-training quantization (PTQ) of PLMs and propose module-wise reconstruction error minimization (MREM), an efficient solution to mitigate the training time, memory overhead, and data consumption that full quantization-aware training incurs.
Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.
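A minimal sketch of a module-wise reconstruction objective: per module, measure the MSE between the full-precision output and the quantized output on calibration data, which is what makes modules independently (and hence cheaply) optimizable. The uniform quantizer and all names are assumptions, not the MREM implementation.

```python
import numpy as np

def module_reconstruction_error(w_fp, w_q, x):
    """MSE between a full-precision module's output and its quantized
    counterpart's output on calibration data x -- the per-module
    objective minimized independently for each module."""
    return np.mean((x @ w_fp - x @ w_q) ** 2)

def quantize(w, n_bits=4):
    """Uniform symmetric quantization of a weight matrix."""
    scale = np.abs(w).max() / (2 ** (n_bits - 1) - 1)
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 16))   # calibration batch
w = rng.normal(size=(16, 16))    # one module's weights
print(module_reconstruction_error(w, quantize(w), x))
```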
arXiv Detail & Related papers (2021-09-30T12:50:06Z)
- Efficient experimental characterization of quantum processes via compressed sensing on an NMR quantum processor [4.291616110077346]
We employ the compressed sensing (CS) algorithm and a heavily reduced data set to experimentally perform true quantum process tomography (QPT) on an NMR quantum processor.
We obtain estimates of the process matrix $\chi$ corresponding to various two- and three-qubit quantum gates with high fidelity.
We also experimentally characterize the reduced dynamics of a two-qubit subsystem embedded in a three-qubit system.
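The compressed-sensing principle behind reduced-data tomography can be illustrated on a real-valued toy problem: recover a sparse vector from far fewer linear measurements than unknowns via iterative soft thresholding. This is a generic ISTA sketch, not the paper's estimator for the $\chi$ matrix.

```python
import numpy as np

def ista(A, y, lam=0.05, steps=500):
    """Iterative soft-thresholding: recover a sparse x from y = A x
    using fewer measurements than unknowns (the CS idea behind
    reduced-data process tomography)."""
    lr = 1.0 / np.linalg.norm(A, 2) ** 2   # step size from the spectral norm
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x = x - lr * A.T @ (A @ x - y)                            # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - lr * lam, 0.0)    # soft threshold
    return x

rng = np.random.default_rng(0)
n, m, k = 64, 24, 3                       # 64 unknowns, 24 measurements, 3 nonzeros
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_hat = ista(A, A @ x_true)
print(np.linalg.norm(x_hat - x_true))     # small: sparse signal recovered
```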
arXiv Detail & Related papers (2021-09-27T17:05:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.