Matrix Compression via Randomized Low Rank and Low Precision Factorization
- URL: http://arxiv.org/abs/2310.11028v1
- Date: Tue, 17 Oct 2023 06:56:57 GMT
- Title: Matrix Compression via Randomized Low Rank and Low Precision Factorization
- Authors: Rajarshi Saha, Varun Srivastava, Mert Pilanci
- Abstract summary: Modern matrices can involve billions of elements, making their storage and processing quite demanding in terms of computational resources and memory usage.
We propose an algorithm that exploits the approximate low rank structure of such matrices to obtain a decomposition of any matrix $\mathbf{A}$ as $\mathbf{A} \approx \mathbf{L}\mathbf{R}$.
We empirically demonstrate the efficacy of our algorithm in image compression, nearest neighbor classification of image and text embeddings, and compressing the layers of LlaMa-$7$b.
- Score: 47.902465710511485
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Matrices are exceptionally useful in various fields of study as they provide
a convenient framework to organize and manipulate data in a structured manner.
However, modern matrices can involve billions of elements, making their storage
and processing quite demanding in terms of computational resources and memory
usage. Although prohibitively large, such matrices are often approximately low
rank. We propose an algorithm that exploits this structure to obtain a low rank
decomposition of any matrix $\mathbf{A}$ as $\mathbf{A} \approx
\mathbf{L}\mathbf{R}$, where $\mathbf{L}$ and $\mathbf{R}$ are the low rank
factors. The total number of elements in $\mathbf{L}$ and $\mathbf{R}$ can be
significantly less than that in $\mathbf{A}$. Furthermore, the entries of
$\mathbf{L}$ and $\mathbf{R}$ are quantized to low precision formats --
compressing $\mathbf{A}$ by giving us a low rank and low precision
factorization. Our algorithm first computes an approximate basis of the range
space of $\mathbf{A}$ by randomly sketching its columns, followed by a
quantization of the vectors constituting this basis. It then computes
approximate projections of the columns of $\mathbf{A}$ onto this quantized
basis. We derive upper bounds on the approximation error of our algorithm, and
analyze the impact of target rank and quantization bit-budget. The tradeoff
between compression ratio and approximation accuracy allows for flexibility in
choosing these parameters based on specific application requirements. We
empirically demonstrate the efficacy of our algorithm in image compression,
nearest neighbor classification of image and text embeddings, and compressing
the layers of LlaMa-$7$b. Our results illustrate that we can achieve
compression ratios as aggressive as one bit per matrix coordinate, all while
surpassing or maintaining the performance of traditional compression
techniques.
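As a concrete illustration of the pipeline described in the abstract, the following is a minimal NumPy sketch: sketch the columns of $\mathbf{A}$ with a random Gaussian matrix to obtain an approximate range basis, quantize that basis to get $\mathbf{L}$, then least-squares project the columns of $\mathbf{A}$ onto the quantized basis and quantize the coefficients to get $\mathbf{R}$. The uniform quantizer, the Gaussian sketch, and all function and parameter names are illustrative assumptions; the paper's actual quantization scheme, error bounds, and parameter choices are not reproduced here.

```python
import numpy as np


def uniform_quantize(x, bits):
    """Uniform scalar quantizer over the dynamic range of x.
    (Assumed quantizer for illustration; the paper's exact scheme may differ.)"""
    levels = 2 ** bits
    lo, hi = float(x.min()), float(x.max())
    if hi == lo:                                  # constant input: nothing to quantize
        return x.copy()
    step = (hi - lo) / (levels - 1)
    return lo + np.round((x - lo) / step) * step


def lplr_sketch(A, rank, bits_left=4, bits_right=4, seed=0):
    """Low rank, low precision factorization A ~= L @ R (illustrative sketch).

    1. Randomly sketch the columns of A to get an approximate range basis.
    2. Quantize the basis vectors               -> left factor L.
    3. Project the columns of A onto the quantized basis (least squares)
       and quantize the coefficients            -> right factor R.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    G = rng.standard_normal((d, rank))            # Gaussian sketching matrix
    Q, _ = np.linalg.qr(A @ G)                    # orthonormal basis of the sketched range
    L = uniform_quantize(Q, bits_left)            # low precision left factor
    R, *_ = np.linalg.lstsq(L, A, rcond=None)     # approximate projections of A's columns
    R = uniform_quantize(R, bits_right)           # low precision right factor
    return L, R


# Usage: compress an approximately low rank matrix and report the relative error
# and the effective storage cost in bits per matrix coordinate.
rng = np.random.default_rng(0)
A = rng.standard_normal((512, 64)) @ rng.standard_normal((64, 256))
A += 0.01 * rng.standard_normal(A.shape)

rank, bits = 64, 4
L, R = lplr_sketch(A, rank, bits_left=bits, bits_right=bits)
rel_err = np.linalg.norm(A - L @ R) / np.linalg.norm(A)
bits_per_coord = bits * rank * (A.shape[0] + A.shape[1]) / A.size
print(f"relative error {rel_err:.3f}, ~{bits_per_coord:.2f} bits per coordinate")
```

For the settings above (rank $64$, $4$-bit factors, a $512 \times 256$ matrix), the factors cost about $1.5$ bits per coordinate of $\mathbf{A}$ by simple arithmetic; shrinking the rank or the bit-budget moves this toward the one-bit-per-coordinate regime mentioned in the abstract, at the price of a larger approximation error.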
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
- Compressing Large Language Models using Low Rank and Low Precision Decomposition [46.30918750022739]
This work introduces $\mathrm{CALDERA}$ -- a new post-training LLM compression algorithm.
It harnesses the inherent low-rank structure of a weight matrix $\mathbf{W}$ by approximating it via a low-rank, low-precision decomposition.
Results show that compressing LlaMa-$2$ $7$B/$13$B/$70$B and LlaMa-$3$ $8$B models using $\mathrm{CALDERA}$ outperforms existing post-training compression techniques.
arXiv Detail & Related papers (2024-05-29T08:42:30Z)
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- Compressive Recovery of Sparse Precision Matrices [5.557600489035657]
We consider the problem of learning a graph modeling the statistical relations of the $d$ variables from a dataset with $n$ samples $X \in \mathbb{R}^{n \times d}$.
We show that it is possible to estimate it from a sketch of size $m = \Omega\left((d+2k)\log(d)\right)$ where $k$ is the maximal number of edges of the underlying graph.
We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser.
arXiv Detail & Related papers (2023-11-08T13:29:08Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- Fast Graph Sampling for Short Video Summarization using Gershgorin Disc Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z)
- Fast Matrix Square Roots with Applications to Gaussian Processes and Bayesian Optimization [24.085358075449406]
Matrix square roots and their inverses arise frequently in machine learning.
We introduce a highly efficient quadratic-time algorithm for computing $\mathbf{K}^{1/2} \mathbf{b}$, $\mathbf{K}^{-1/2} \mathbf{b}$, and their derivatives through matrix-vector multiplications (MVMs).
Our method combines Krylov subspace methods with a rational approximation and typically achieves $4$ decimal places of accuracy with fewer than $100$ MVMs (a minimal Krylov-only sketch follows this list).
arXiv Detail & Related papers (2020-06-19T17:56:24Z)
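As a hedged illustration of the Krylov-subspace part of the last related paper above, here is a minimal Lanczos sketch for approximating $\mathbf{K}^{1/2}\mathbf{b}$. The rational approximation that the authors combine with Krylov methods is deliberately omitted, and the function name, iteration count, and test matrix are assumptions made for this sketch, not the authors' implementation.

```python
import numpy as np


def lanczos_sqrt_mv(K, b, num_iters=50):
    """Approximate K^{1/2} b for a symmetric PSD matrix K via the Lanczos
    approximation f(K) b ~= ||b|| * V_m f(T_m) e_1.  Krylov part only; the
    rational approximation used in the cited paper is omitted here."""
    n = b.shape[0]
    m = min(num_iters, n)
    V = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = K @ V[:, j]
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)   # full reorthogonalization
        beta[j] = np.linalg.norm(w)
        if j + 1 == m or beta[j] < 1e-12:          # Krylov space exhausted or done
            m = j + 1
            break
        V[:, j + 1] = w / beta[j]
    # Tridiagonal T_m and its matrix square root via a small eigendecomposition.
    T = np.diag(alpha[:m]) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)
    evals, evecs = np.linalg.eigh(T)
    sqrt_T = evecs @ np.diag(np.sqrt(np.maximum(evals, 0.0))) @ evecs.T
    return np.linalg.norm(b) * (V[:, :m] @ sqrt_T[:, 0])


# Usage: compare against a dense reference on a small positive definite matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((200, 200))
K = B @ B.T + np.eye(200)
b = rng.standard_normal(200)
evals, evecs = np.linalg.eigh(K)
exact = evecs @ (np.sqrt(evals) * (evecs.T @ b))
approx = lanczos_sqrt_mv(K, b, num_iters=60)
print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```

Each iteration costs one matrix-vector multiplication with $\mathbf{K}$ plus reorthogonalization, which is the MVM access model the summary above refers to.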