Learning Low-Rank Representations for Model Compression
- URL: http://arxiv.org/abs/2211.11397v1
- Date: Mon, 21 Nov 2022 12:15:28 GMT
- Title: Learning Low-Rank Representations for Model Compression
- Authors: Zezhou Zhu, Yucong Zhou, Zhao Zhong
- Abstract summary: We propose a Low-Rank Representation Vector Quantization ($\text{LR}^2\text{VQ}$) method that outperforms previous VQ algorithms in various tasks and architectures.
In our method, the compression ratio could be directly controlled by $m$, and the final accuracy is solely determined by $\tilde{d}$.
With a proper $\tilde{d}$, we evaluate $\text{LR}^2\text{VQ}$ with ResNet-18/ResNet-50 on ImageNet classification datasets, achieving 2.8%/1.0% top-1 accuracy improvements over the current state-of-the-art VQ-based compression algorithms.
- Score: 6.721845345130468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vector Quantization (VQ) is an appealing model compression method to obtain a
tiny model with less accuracy loss. While methods to obtain better codebooks
and codes under fixed clustering dimensionality have been extensively studied,
optimizations of the vectors in favour of clustering performance are not
carefully considered, especially via the reduction of vector dimensionality.
This paper reports our recent progress on the combination of dimensionality
compression and vector quantization, proposing a Low-Rank Representation Vector
Quantization ($\text{LR}^2\text{VQ}$) method that outperforms previous VQ
algorithms in various tasks and architectures. $\text{LR}^2\text{VQ}$ joins
low-rank representation with subvector clustering to construct a new kind of
building block that is directly optimized through end-to-end training over the
task loss. Our proposed design pattern introduces three hyper-parameters, the
number of clusters $k$, the size of subvectors $m$ and the clustering
dimensionality $\tilde{d}$. In our method, the compression ratio could be
directly controlled by $m$, and the final accuracy is solely determined by
$\tilde{d}$. We recognize $\tilde{d}$ as a trade-off between low-rank
approximation error and clustering error and carry out both theoretical
analysis and experimental observations that empower the estimation of the
proper $\tilde{d}$ before fine-tuning. With a proper $\tilde{d}$, we evaluate
$\text{LR}^2\text{VQ}$ with ResNet-18/ResNet-50 on ImageNet classification
datasets, achieving 2.8\%/1.0\% top-1 accuracy improvements over the current
state-of-the-art VQ-based compression algorithms with 43$\times$/31$\times$
compression factor.
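The abstract describes the $\text{LR}^2\text{VQ}$ building block only at a high level. Below is a minimal, hypothetical sketch of one way to read it: flatten a weight matrix into subvectors of length $m$, map them to a $\tilde{d}$-dimensional low-rank representation, and cluster those representations into $k$ codewords. The function names (`lr2vq_sketch`, `kmeans`), the PCA-style projection, and all tensor shapes are illustrative assumptions; in the paper the low-rank representation is trained end-to-end over the task loss rather than fixed in closed form.

```python
import numpy as np


def kmeans(X, k, iters=25, seed=0):
    """Plain k-means on the rows of X; returns (codebook, assignments)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)   # (n, k) squared distances
        assign = dist.argmin(1)
        for j in range(k):
            if np.any(assign == j):
                C[j] = X[assign == j].mean(0)
    return C, assign


def lr2vq_sketch(W, m, d_tilde, k):
    """Illustrative LR^2VQ-style compression of a weight matrix W:
      1. split the flattened weights into subvectors of length m,
      2. map them to a d_tilde-dimensional low-rank representation (PCA here),
      3. cluster the d_tilde-dimensional vectors into k codewords,
      4. decode each subvector from its assigned codeword.
    Only codes, codebook, projection and mean would need to be stored."""
    flat = W.reshape(-1)
    assert flat.size % m == 0 and d_tilde <= m
    subs = flat.reshape(-1, m)                       # (n_sub, m) subvectors
    mean = subs.mean(0)
    _, _, Vt = np.linalg.svd(subs - mean, full_matrices=False)
    P = Vt[:d_tilde].T                               # (m, d_tilde) projection (PCA stand-in)
    Z = (subs - mean) @ P                            # (n_sub, d_tilde) low-rank representations
    codebook, codes = kmeans(Z, k)                   # subvector clustering in d_tilde dims
    W_hat = (codebook[codes] @ P.T + mean).reshape(W.shape)
    return W_hat, codes, codebook, P


if __name__ == "__main__":
    W = np.random.default_rng(1).standard_normal((64, 64)).astype(np.float32)
    W_hat, codes, codebook, P = lr2vq_sketch(W, m=8, d_tilde=4, k=256)
    err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
    print(f"relative reconstruction error: {err:.3f}")
```

In this toy setting the compression ratio grows with $m$ (each length-$m$ subvector is replaced by a single code index), while $\tilde{d}$ trades low-rank approximation error against clustering error, mirroring the roles the abstract assigns to these hyper-parameters.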
Related papers
- Pruning is Optimal for Learning Sparse Features in High-Dimensions [15.967123173054535]
We show that a class of statistical models can be optimally learned using pruned neural networks trained with gradient descent.
We show that pruning neural networks proportional to the sparsity level of $\boldsymbol{V}$ improves their sample complexity compared to unpruned networks.
arXiv Detail & Related papers (2024-06-12T21:43:12Z)
- ReALLM: A general framework for LLM compression and fine-tuning [11.738510106847414]
ReALLM is a novel approach for compression and memory-efficient adaptation of pre-trained language models.
A weight-only quantization algorithm yields the best results on language generation tasks (C4 and WikiText-2) for a budget of $3$ bits without any training.
arXiv Detail & Related papers (2024-05-21T18:50:51Z)
- Uncertainty quantification for iterative algorithms in linear models with application to early stopping [4.150180443030652]
This paper investigates the iterates $\hat{b}^1,\dots,\hat{b}^T$ obtained from iterative algorithms in high-dimensional linear regression problems.
The analysis and proposed estimators are applicable to Gradient Descent (GD), proximal GD, and their accelerated variants such as Fast Iterative Soft-Thresholding (FISTA).
arXiv Detail & Related papers (2024-04-27T10:20:41Z)
- Deep Equilibrium Object Detection [24.69829309391189]
We present a new query-based object detector (DEQDet) by designing a deep equilibrium decoder.
Our experiments demonstrate DEQDet converges faster, consumes less memory, and achieves better results than the baseline counterpart.
arXiv Detail & Related papers (2023-08-18T13:56:03Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification.
The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
arXiv Detail & Related papers (2022-06-17T04:07:45Z)
- Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to the success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function (a small numerical check of this identity is sketched after this list).
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
- Deep Learning Meets Projective Clustering [66.726500395069]
A common approach for compressing NLP networks is to encode the embedding layer as a matrix $A\in\mathbb{R}^{n\times d}$.
Inspired by projective clustering from computational geometry, we suggest replacing this subspace by a set of $k$ subspaces.
arXiv Detail & Related papers (2020-10-08T22:47:48Z)
- Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
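The "Permute, Quantize, and Fine-tune" entry above rests on the observation that the hidden units shared by two adjacent layers can be permuted without changing the network's function. A minimal numerical check of that identity, using assumed layer sizes and a ReLU nonlinearity that are not taken from the paper:

```python
import numpy as np

# Permuting the output channels of layer 1 together with the matching input
# channels of layer 2 leaves the two-layer network's outputs unchanged.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)           # input vector
W1 = rng.standard_normal((32, 16))    # layer 1: 16 -> 32
W2 = rng.standard_normal((10, 32))    # layer 2: 32 -> 10

relu = lambda z: np.maximum(z, 0.0)
y_ref = W2 @ relu(W1 @ x)

perm = rng.permutation(32)            # any permutation of the 32 hidden units
y_perm = W2[:, perm] @ relu(W1[perm] @ x)

print(np.allclose(y_ref, y_perm))     # True: same function, different weight layout
```

Because many such permutations realize the same function, a compressor is free to search over them for a weight layout whose parameter groups are easier to vector-quantize, which is the idea that paper connects to rate-distortion theory.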