Optimal Quantization for Matrix Multiplication
- URL: http://arxiv.org/abs/2410.13780v2
- Date: Fri, 17 Jan 2025 14:26:37 GMT
- Title: Optimal Quantization for Matrix Multiplication
- Authors: Or Ordentlich, Yury Polyanskiy
- Abstract summary: We build a universal quantizer based on nested lattices with an explicit guarantee of approximation error.
A practical low-complexity version of our quantizer achieves performance quite close to optimal.
- Score: 35.007966885532724
- Abstract: Recent work in the machine learning community has proposed multiple methods for performing lossy compression (quantization) of large matrices. This quantization is important for accelerating matrix multiplication (a main component of large language models), which is often bottlenecked by the speed of loading these matrices from memory. Unlike classical vector quantization and rate-distortion theory, the goal of these new compression algorithms is to approximate not the matrices themselves, but their matrix product. Specifically, given a pair of real matrices $A,B$, an encoder (compressor) is applied to each of them independently, producing descriptions with $R$ bits per entry. These representations are subsequently used by the decoder to estimate the matrix product $A^\top B$. In this work, we provide a non-asymptotic lower bound on the mean squared error of this approximation (as a function of rate $R$) for the case of matrices $A,B$ with iid Gaussian entries. Algorithmically, we construct a universal quantizer based on nested lattices with an explicit guarantee of approximation error for any (non-random) pair of matrices $A$, $B$ in terms of only the Frobenius norms $\|\bar{A}\|_F, \|\bar{B}\|_F$ and $\|\bar{A}^\top \bar{B}\|_F$, where $\bar{A},\bar{B}$ are versions of $A,B$ with zero-centered columns, respectively. For iid Gaussian matrices our quantizer achieves the lower bound and is, thus, asymptotically optimal. A practical low-complexity version of our quantizer achieves performance quite close to optimal. In addition, we derive the rate-distortion function for matrix multiplication of iid Gaussian matrices, which exhibits an interesting phase transition at $R\approx 0.906$ bits/entry.
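To make the setup concrete, the sketch below simulates rate-$R$ quantization of $A$ and $B$ followed by a decoder-side estimate of $A^\top B$. It uses a dithered scalar (integer-lattice) quantizer as a simple stand-in; the paper's actual construction uses nested lattices, and the function and parameter names here are illustrative assumptions, not the authors' implementation.

```python
# Toy illustration of rate-R matrix-product quantization. This is NOT the
# paper's nested-lattice construction; it uses the scalar integer lattice
# with subtractive dither as a minimal stand-in.
import numpy as np

rng = np.random.default_rng(0)

def dithered_quantize(M, R):
    """Quantize each entry of M to R bits with a dithered uniform
    (integer-lattice) quantizer; returns the decoded real matrix."""
    levels = 2 ** R
    lo, hi = M.min(), M.max()
    step = (hi - lo) / levels                  # cell size of the scaled lattice
    dither = rng.uniform(-0.5, 0.5, M.shape)   # shared dither, known to decoder
    q = np.round((M - lo) / step - dither)     # nearest lattice point
    q = np.clip(q, 0, levels - 1)
    return (q + dither) * step + lo            # decoder reconstruction

n, k, R = 256, 64, 4
A = rng.standard_normal((n, k))
B = rng.standard_normal((n, k))

Ahat, Bhat = dithered_quantize(A, R), dithered_quantize(B, R)
err = np.linalg.norm(Ahat.T @ Bhat - A.T @ B) / np.linalg.norm(A.T @ B)
print(f"relative Frobenius error of the product estimate at R={R} bits/entry: {err:.4f}")
```

The two matrices are encoded independently, matching the paper's setting; only the product of the reconstructions is evaluated.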
Related papers
- Private Low-Rank Approximation for Covariance Matrices, Dyson Brownian Motion, and Eigenvalue-Gap Bounds for Gaussian Perturbations [29.212403229351253]
We analyze a complex variant of the Gaussian mechanism and obtain upper bounds on the Frobenius norm of the difference between the matrix output by this mechanism and the best rank-$k$ approximation to $M$.
We show that the eigenvalues of the matrix $M$ perturbed by Gaussian noise have large gaps with high probability.
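As a rough illustration of the perturb-then-truncate pattern behind such mechanisms, the sketch below adds symmetric Gaussian noise to a covariance matrix and returns its best rank-$k$ approximation. The noise scale `sigma` is a placeholder; the paper's complex-valued variant of the Gaussian mechanism and its privacy calibration are not reproduced here.

```python
# Hedged sketch: perturb a covariance matrix with Gaussian noise, then
# take the best rank-k approximation of the noisy matrix via SVD.
import numpy as np

def private_rank_k(M, k, sigma, rng):
    n = M.shape[0]
    noise = rng.standard_normal((n, n)) * sigma
    noise = (noise + noise.T) / np.sqrt(2)      # symmetrize noise for a covariance M
    U, s, Vt = np.linalg.svd(M + noise)
    return (U[:, :k] * s[:k]) @ Vt[:k]          # best rank-k approx of the noisy matrix

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
M = X.T @ X / 200                               # empirical covariance (50 x 50)
Mk = private_rank_k(M, k=5, sigma=0.05, rng=rng)
print(np.linalg.norm(Mk - M))                   # Frobenius distance to M
```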
arXiv Detail & Related papers (2025-02-11T15:46:03Z) - Quantum Time-Space Tradeoffs for Matrix Problems [0.5524804393257919]
We consider the time and space required for quantum computers to solve a range of problems involving matrices.
For almost all matrices $A$, we prove that quantum circuits with at most $T$ input queries and $S$ qubits of memory require $T=\Omega(n^2/S)$.
Because many of our lower bounds match deterministic algorithms with the same time and space complexity, we show that quantum computers cannot provide any advantage for these problems with any space bound.
arXiv Detail & Related papers (2024-01-10T18:38:43Z) - Matrix Compression via Randomized Low Rank and Low Precision Factorization [47.902465710511485]
Modern matrices can involve billions of elements, making their storage and processing quite demanding in terms of computational resources and memory usage.
We propose an algorithm that exploits this structure to obtain a low rank decomposition of any matrix $\mathbf{A}$ as $\mathbf{L}\mathbf{R}$.
We empirically demonstrate the efficacy of our algorithm in image compression, nearest neighbor classification of image and text embeddings, and compressing the layers of LlaMa-$7$b.
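A minimal sketch of the low-rank-plus-low-precision idea follows, assuming a randomized range finder and a uniform entry quantizer; the rank and bit choices are illustrative, not the paper's algorithm.

```python
# Hedged sketch: factor A ~ L @ R with a randomized range finder, then
# store both factors at low precision. Bit/rank allocation is simplified.
import numpy as np

def round_to_bits(M, bits):
    """Uniformly quantize the entries of M to 2**bits levels."""
    lo, hi = M.min(), M.max()
    scale = (hi - lo) / (2 ** bits - 1)
    return np.round((M - lo) / scale) * scale + lo

def lowrank_lowprec(A, rank, bits, rng):
    S = rng.standard_normal((A.shape[1], rank))
    Q, _ = np.linalg.qr(A @ S)                  # orthonormal basis for range(A S)
    L, R = Q, Q.T @ A                           # A ~ L @ R
    return round_to_bits(L, bits), round_to_bits(R, bits)

rng = np.random.default_rng(2)
A = rng.standard_normal((512, 64)) @ rng.standard_normal((64, 512))  # exactly rank 64
L, R = lowrank_lowprec(A, rank=64, bits=8, rng=rng)
print(np.linalg.norm(L @ R - A) / np.linalg.norm(A))  # error is due to quantization only
```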
arXiv Detail & Related papers (2023-10-17T06:56:57Z) - Multiresolution kernel matrix algebra [0.0]
We show that the compression of kernel matrices by means of samplets produces optimally sparse matrices in a certain S-format.
The inverse of a kernel matrix (if it exists) is compressible in the S-format as well.
The matrix algebra is justified mathematically by pseudo differential calculus.
arXiv Detail & Related papers (2022-11-21T17:50:22Z) - Block-encoding dense and full-rank kernels using hierarchical matrices: applications in quantum numerical linear algebra [6.338178373376447]
We propose a block-encoding scheme of the hierarchical matrix structure on a quantum computer.
Our method can improve the runtime of solving quantum linear systems of dimension $N$ to $O(\kappa\,\operatorname{polylog}(\frac{N}{\varepsilon}))$.
arXiv Detail & Related papers (2022-01-27T05:24:02Z) - Robust 1-bit Compressive Sensing with Partial Gaussian Circulant Matrices and Generative Priors [54.936314353063494]
We provide recovery guarantees for a correlation-based optimization algorithm for robust 1-bit compressive sensing.
We make use of a practical iterative algorithm, and perform numerical experiments on image datasets to corroborate our results.
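For orientation, the snippet below shows the classical correlation estimator for 1-bit measurements $y = \mathrm{sign}(Ax)$. It uses a dense Gaussian $A$ and no generative prior, so it is only a baseline for the partial-circulant, generative-prior setting the paper studies.

```python
# Minimal 1-bit compressive sensing demo: observe y = sign(A x) and
# recover the direction of x by correlation, x_hat ~ A^T y. This is the
# textbook baseline, not the paper's iterative algorithm.
import numpy as np

rng = np.random.default_rng(3)
n, m, s = 200, 1000, 5                    # dimension, measurements, sparsity

x = np.zeros(n)
x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
x /= np.linalg.norm(x)                    # 1-bit measurements lose the norm of x

A = rng.standard_normal((m, n))
y = np.sign(A @ x)                        # one bit per measurement

x_hat = A.T @ y
x_hat /= np.linalg.norm(x_hat)
print("cosine similarity to the true signal:", float(x @ x_hat))
```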
arXiv Detail & Related papers (2021-08-08T05:28:06Z) - Non-PSD Matrix Sketching with Applications to Regression and Optimization [56.730993511802865]
We present dimensionality reduction methods for non-PSD and "square-root" matrices.
We show how these techniques can be used for multiple downstream tasks.
arXiv Detail & Related papers (2021-06-16T04:07:48Z) - Quantum algorithms for spectral sums [50.045011844765185]
We propose new quantum algorithms for estimating spectral sums of positive semi-definite (PSD) matrices.
We show how the algorithms and techniques used in this work can be applied to three problems in spectral graph theory.
arXiv Detail & Related papers (2020-11-12T16:29:45Z) - Hutch++: Optimal Stochastic Trace Estimation [75.45968495410048]
We introduce a new randomized algorithm, Hutch++, which computes a $(1 \pm \epsilon)$ approximation to $\operatorname{tr}(A)$ for any positive semidefinite (PSD) $A$.
We show that it significantly outperforms Hutchinson's method in experiments.
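Hutch++ is concrete enough to sketch: spend part of the matrix-vector budget on a randomized low-rank deflation of $A$, take its trace exactly, and run plain Hutchinson estimation only on the deflated remainder. The budget split and constants below are simplified from the paper.

```python
# Simplified Hutch++ sketch: exact trace on a sketched top subspace plus
# a Hutchinson estimate on the projected-out remainder.
import numpy as np

def hutchpp(A, num_queries, rng):
    n = A.shape[0]
    k = num_queries // 3
    S = rng.choice([-1.0, 1.0], size=(n, k))    # sketch for the top subspace
    Q, _ = np.linalg.qr(A @ S)
    G = rng.choice([-1.0, 1.0], size=(n, k))    # Hutchinson probe vectors
    G = G - Q @ (Q.T @ G)                       # deflate: project out range(Q)
    t_top = np.trace(Q.T @ A @ Q)               # exact trace on the subspace
    t_rest = np.trace(G.T @ A @ G) / k          # Hutchinson on the remainder
    return t_top + t_rest

rng = np.random.default_rng(4)
B = rng.standard_normal((300, 300))
A = B @ B.T                                     # PSD test matrix
print(hutchpp(A, 90, rng), np.trace(A))         # estimate vs exact trace
```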
arXiv Detail & Related papers (2020-10-19T16:45:37Z) - Sketching Transformed Matrices with Applications to Natural Language Processing [76.6222695417524]
We propose a space-efficient sketching algorithm for computing the product of a given small matrix with the transformed matrix.
We show that our approach obtains small error and is efficient in both space and time.
arXiv Detail & Related papers (2020-02-23T03:07:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.