Optimal Quantization for Matrix Multiplication
- URL: http://arxiv.org/abs/2410.13780v2
- Date: Fri, 17 Jan 2025 14:26:37 GMT
- Title: Optimal Quantization for Matrix Multiplication
- Authors: Or Ordentlich, Yury Polyanskiy
- Abstract summary: We build a universal quantizer based on nested lattices with an explicit guarantee of approximation error.
A practical low-complexity version of our quantizer achieves performance quite close to optimal.
- Score: 35.007966885532724
- Abstract: Recent work in the machine learning community has proposed multiple methods for performing lossy compression (quantization) of large matrices. This quantization is important for accelerating matrix multiplication (a main component of large language models), which is often bottlenecked by the speed of loading these matrices from memory. Unlike classical vector quantization and rate-distortion theory, the goal of these new compression algorithms is to approximate not the matrices themselves, but their matrix product. Specifically, given a pair of real matrices $A,B$, an encoder (compressor) is applied to each of them independently, producing descriptions with $R$ bits per entry. These representations are subsequently used by the decoder to estimate the matrix product $A^\top B$. In this work, we provide a non-asymptotic lower bound on the mean squared error of this approximation (as a function of rate $R$) for the case of matrices $A,B$ with iid Gaussian entries. Algorithmically, we construct a universal quantizer based on nested lattices with an explicit guarantee of approximation error for any (non-random) pair of matrices $A$, $B$ in terms of only the Frobenius norms $\|\bar{A}\|_F, \|\bar{B}\|_F$ and $\|\bar{A}^\top \bar{B}\|_F$, where $\bar{A},\bar{B}$ are versions of $A,B$ with zero-centered columns, respectively. For iid Gaussian matrices our quantizer achieves the lower bound and is, thus, asymptotically optimal. A practical low-complexity version of our quantizer achieves performance quite close to optimal. In addition, we derive the rate-distortion function for matrix multiplication of iid Gaussian matrices, which exhibits an interesting phase transition at $R\approx 0.906$ bits/entry.
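To make the setup concrete, the sketch below simulates rate-$R$ quantization of $A$ and $B$ followed by a decoder-side estimate of $A^\top B$. It uses a dithered scalar (integer-lattice) quantizer as a simple stand-in; the paper's actual construction uses nested lattices, and the function and parameter names here are illustrative assumptions, not the authors' implementation.

```python
# Toy illustration of rate-R matrix-product quantization. This is NOT the
# paper's nested-lattice construction; it uses the scalar integer lattice
# with subtractive dither as a minimal stand-in.
import numpy as np

rng = np.random.default_rng(0)

def dithered_quantize(M, R):
    """Quantize each entry of M to R bits with a dithered uniform
    (integer-lattice) quantizer; returns the decoded real matrix."""
    levels = 2 ** R
    lo, hi = M.min(), M.max()
    step = (hi - lo) / levels                  # cell size of the scaled lattice
    dither = rng.uniform(-0.5, 0.5, M.shape)   # shared dither, known to decoder
    q = np.round((M - lo) / step - dither)     # nearest lattice point
    q = np.clip(q, 0, levels - 1)
    return (q + dither) * step + lo            # decoder reconstruction

n, k, R = 256, 64, 4
A = rng.standard_normal((n, k))
B = rng.standard_normal((n, k))

Ahat, Bhat = dithered_quantize(A, R), dithered_quantize(B, R)
err = np.linalg.norm(Ahat.T @ Bhat - A.T @ B) / np.linalg.norm(A.T @ B)
print(f"relative Frobenius error of the product estimate at R={R} bits/entry: {err:.4f}")
```

The two matrices are encoded independently, matching the paper's setting; only the product of the reconstructions is evaluated.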
Related papers
- Private Low-Rank Approximation for Covariance Matrices, Dyson Brownian Motion, and Eigenvalue-Gap Bounds for Gaussian Perturbations [29.212403229351253]
We analyze a complex variant of the Gaussian mechanism and obtain upper bounds on the Frobenius norm of the difference between the matrix output by this mechanism and the best rank-$k$ approximation to $M$.
We show that the eigenvalues of the matrix $M$ perturbed by Gaussian noise have large gaps with high probability.
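As a rough illustration of the perturb-then-truncate pattern behind such mechanisms, the sketch below adds symmetric Gaussian noise to a covariance matrix and returns its best rank-$k$ approximation. The noise scale `sigma` is a placeholder; the paper's complex-valued variant of the Gaussian mechanism and its privacy calibration are not reproduced here.

```python
# Hedged sketch: perturb a covariance matrix with Gaussian noise, then
# take the best rank-k approximation of the noisy matrix via SVD.
import numpy as np

def private_rank_k(M, k, sigma, rng):
    n = M.shape[0]
    noise = rng.standard_normal((n, n)) * sigma
    noise = (noise + noise.T) / np.sqrt(2)      # symmetrize noise for a covariance M
    U, s, Vt = np.linalg.svd(M + noise)
    return (U[:, :k] * s[:k]) @ Vt[:k]          # best rank-k approx of the noisy matrix

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
M = X.T @ X / 200                               # empirical covariance (50 x 50)
Mk = private_rank_k(M, k=5, sigma=0.05, rng=rng)
print(np.linalg.norm(Mk - M))                   # Frobenius distance to M
```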
arXiv Detail & Related papers (2025-02-11T15:46:03Z) - Quantum Time-Space Tradeoffs for Matrix Problems [0.5524804393257919]
We consider the time and space required for quantum computers to solve a range of problems involving matrices.
For almost all matrices $A$, we prove that quantum circuits with at most $T$ input queries and $S$ qubits of memory require $T=\Omega(n^2/S)$.
Because many of our lower bounds match deterministic algorithms with the same time and space complexity, we show that quantum computers cannot provide any advantage for these problems with any space bound.
arXiv Detail & Related papers (2024-01-10T18:38:43Z) - Matrix Compression via Randomized Low Rank and Low Precision Factorization [47.902465710511485]
Modern matrices can involve billions of elements, making their storage and processing quite demanding in terms of computational resources and memory usage.
We propose an algorithm that exploits this structure to obtain a low rank decomposition of any matrix $\mathbf{A}$ as $\mathbf{L}\mathbf{R}$.
We empirically demonstrate the efficacy of our algorithm in image compression, nearest neighbor classification of image and text embeddings, and compressing the layers of LlaMa-$7$b.
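A minimal sketch of the low-rank-plus-low-precision idea follows, assuming a randomized range finder and a uniform entry quantizer; the rank and bit choices are illustrative, not the paper's algorithm.

```python
# Hedged sketch: factor A ~ L @ R with a randomized range finder, then
# store both factors at low precision. Bit/rank allocation is simplified.
import numpy as np

def round_to_bits(M, bits):
    """Uniformly quantize the entries of M to 2**bits levels."""
    lo, hi = M.min(), M.max()
    scale = (hi - lo) / (2 ** bits - 1)
    return np.round((M - lo) / scale) * scale + lo

def lowrank_lowprec(A, rank, bits, rng):
    S = rng.standard_normal((A.shape[1], rank))
    Q, _ = np.linalg.qr(A @ S)                  # orthonormal basis for range(A S)
    L, R = Q, Q.T @ A                           # A ~ L @ R
    return round_to_bits(L, bits), round_to_bits(R, bits)

rng = np.random.default_rng(2)
A = rng.standard_normal((512, 64)) @ rng.standard_normal((64, 512))  # exactly rank 64
L, R = lowrank_lowprec(A, rank=64, bits=8, rng=rng)
print(np.linalg.norm(L @ R - A) / np.linalg.norm(A))  # error is due to quantization only
```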
arXiv Detail & Related papers (2023-10-17T06:56:57Z) - Multiresolution kernel matrix algebra [0.0]
We show that the compression of kernel matrices by means of samplets produces optimally sparse matrices in a certain S-format.
The inverse of a kernel matrix (if it exists) is compressible in the S-format as well.
The matrix algebra is justified mathematically by pseudo differential calculus.
arXiv Detail & Related papers (2022-11-21T17:50:22Z) - Block-encoding dense and full-rank kernels using hierarchical matrices: applications in quantum numerical linear algebra [6.338178373376447]
We propose a block-encoding scheme of the hierarchical matrix structure on a quantum computer.
Our method can improve the runtime of solving quantum linear systems of dimension $N$ to $O(\kappa\,\operatorname{polylog}(\frac{N}{\varepsilon}))$.
arXiv Detail & Related papers (2022-01-27T05:24:02Z) - Robust 1-bit Compressive Sensing with Partial Gaussian Circulant Matrices and Generative Priors [54.936314353063494]
We provide recovery guarantees for a correlation-based optimization algorithm for robust 1-bit compressive sensing.
We make use of a practical iterative algorithm, and perform numerical experiments on image datasets to corroborate our results.
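For orientation, the snippet below shows the classical correlation estimator for 1-bit measurements $y = \mathrm{sign}(Ax)$. It uses a dense Gaussian $A$ and no generative prior, so it is only a baseline for the partial-circulant, generative-prior setting the paper studies.

```python
# Minimal 1-bit compressive sensing demo: observe y = sign(A x) and
# recover the direction of x by correlation, x_hat ~ A^T y. This is the
# textbook baseline, not the paper's iterative algorithm.
import numpy as np

rng = np.random.default_rng(3)
n, m, s = 200, 1000, 5                    # dimension, measurements, sparsity

x = np.zeros(n)
x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
x /= np.linalg.norm(x)                    # 1-bit measurements lose the norm of x

A = rng.standard_normal((m, n))
y = np.sign(A @ x)                        # one bit per measurement

x_hat = A.T @ y
x_hat /= np.linalg.norm(x_hat)
print("cosine similarity to the true signal:", float(x @ x_hat))
```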
arXiv Detail & Related papers (2021-08-08T05:28:06Z) - Non-PSD Matrix Sketching with Applications to Regression and Optimization [56.730993511802865]
We present dimensionality reduction methods for non-PSD and "square-root" matrices.
We show how these techniques can be used for multiple downstream tasks.
arXiv Detail & Related papers (2021-06-16T04:07:48Z) - Quantum algorithms for spectral sums [50.045011844765185]
We propose new quantum algorithms for estimating spectral sums of positive semi-definite (PSD) matrices.
We show how the algorithms and techniques used in this work can be applied to three problems in spectral graph theory.
arXiv Detail & Related papers (2020-11-12T16:29:45Z) - Hutch++: Optimal Stochastic Trace Estimation [75.45968495410048]
We introduce a new randomized algorithm, Hutch++, which computes a $(1 \pm \epsilon)$ approximation to $\operatorname{tr}(A)$ for any positive semidefinite (PSD) $A$.
We show that it significantly outperforms Hutchinson's method in experiments.
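Hutch++ is concrete enough to sketch: spend part of the matrix-vector budget on a randomized low-rank deflation of $A$, take its trace exactly, and run plain Hutchinson estimation only on the deflated remainder. The budget split and constants below are simplified from the paper.

```python
# Simplified Hutch++ sketch: exact trace on a sketched top subspace plus
# a Hutchinson estimate on the projected-out remainder.
import numpy as np

def hutchpp(A, num_queries, rng):
    n = A.shape[0]
    k = num_queries // 3
    S = rng.choice([-1.0, 1.0], size=(n, k))    # sketch for the top subspace
    Q, _ = np.linalg.qr(A @ S)
    G = rng.choice([-1.0, 1.0], size=(n, k))    # Hutchinson probe vectors
    G = G - Q @ (Q.T @ G)                       # deflate: project out range(Q)
    t_top = np.trace(Q.T @ A @ Q)               # exact trace on the subspace
    t_rest = np.trace(G.T @ A @ G) / k          # Hutchinson on the remainder
    return t_top + t_rest

rng = np.random.default_rng(4)
B = rng.standard_normal((300, 300))
A = B @ B.T                                     # PSD test matrix
print(hutchpp(A, 90, rng), np.trace(A))         # estimate vs exact trace
```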
arXiv Detail & Related papers (2020-10-19T16:45:37Z) - Sketching Transformed Matrices with Applications to Natural Language Processing [76.6222695417524]
We propose a space-efficient sketching algorithm for computing the product of a given small matrix with the transformed matrix.
We show that our approach obtains small error and is efficient in both space and time.
arXiv Detail & Related papers (2020-02-23T03:07:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.