Learning Low-Rank Representations for Model Compression
- URL: http://arxiv.org/abs/2211.11397v1
- Date: Mon, 21 Nov 2022 12:15:28 GMT
- Title: Learning Low-Rank Representations for Model Compression
- Authors: Zezhou Zhu, Yucong Zhou, Zhao Zhong
- Abstract summary: We propose a Low-Rank Representation Vector Quantization ($\text{LR}^2\text{VQ}$) method that outperforms previous VQ algorithms in various tasks and architectures.
In our method, the compression ratio could be directly controlled by $m$, and the final accuracy is solely determined by $\tilde{d}$.
With a proper $\tilde{d}$, we evaluate $\text{LR}^2\text{VQ}$ with ResNet-18/ResNet-50 on ImageNet classification datasets, achieving 2.8%/1.0% top-1 accuracy improvements over the current state-of-the-art VQ-based compression algorithms.
- Score: 6.721845345130468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vector Quantization (VQ) is an appealing model compression method to obtain a
tiny model with less accuracy loss. While methods to obtain better codebooks
and codes under fixed clustering dimensionality have been extensively studied,
optimizations of the vectors in favour of clustering performance are not
carefully considered, especially via the reduction of vector dimensionality.
This paper reports our recent progress on the combination of dimensionality
compression and vector quantization, proposing a Low-Rank Representation Vector
Quantization ($\text{LR}^2\text{VQ}$) method that outperforms previous VQ
algorithms in various tasks and architectures. $\text{LR}^2\text{VQ}$ joins
low-rank representation with subvector clustering to construct a new kind of
building block that is directly optimized through end-to-end training over the
task loss. Our proposed design pattern introduces three hyper-parameters, the
number of clusters $k$, the size of subvectors $m$ and the clustering
dimensionality $\tilde{d}$. In our method, the compression ratio could be
directly controlled by $m$, and the final accuracy is solely determined by
$\tilde{d}$. We recognize $\tilde{d}$ as a trade-off between low-rank
approximation error and clustering error and carry out both theoretical
analysis and experimental observations that empower the estimation of the
proper $\tilde{d}$ before fine-tuning. With a proper $\tilde{d}$, we evaluate
$\text{LR}^2\text{VQ}$ with ResNet-18/ResNet-50 on ImageNet classification
datasets, achieving 2.8\%/1.0\% top-1 accuracy improvements over the current
state-of-the-art VQ-based compression algorithms with 43$\times$/31$\times$
compression factor.
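The abstract describes the $\text{LR}^2\text{VQ}$ building block only at a high level. Below is a minimal, hypothetical sketch of one way to read it: flatten a weight matrix into subvectors of length $m$, map them to a $\tilde{d}$-dimensional low-rank representation, and cluster those representations into $k$ codewords. The function names (`lr2vq_sketch`, `kmeans`), the PCA-style projection, and all tensor shapes are illustrative assumptions; in the paper the low-rank representation is trained end-to-end over the task loss rather than fixed in closed form.

```python
import numpy as np


def kmeans(X, k, iters=25, seed=0):
    """Plain k-means on the rows of X; returns (codebook, assignments)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)   # (n, k) squared distances
        assign = dist.argmin(1)
        for j in range(k):
            if np.any(assign == j):
                C[j] = X[assign == j].mean(0)
    return C, assign


def lr2vq_sketch(W, m, d_tilde, k):
    """Illustrative LR^2VQ-style compression of a weight matrix W:
      1. split the flattened weights into subvectors of length m,
      2. map them to a d_tilde-dimensional low-rank representation (PCA here),
      3. cluster the d_tilde-dimensional vectors into k codewords,
      4. decode each subvector from its assigned codeword.
    Only codes, codebook, projection and mean would need to be stored."""
    flat = W.reshape(-1)
    assert flat.size % m == 0 and d_tilde <= m
    subs = flat.reshape(-1, m)                       # (n_sub, m) subvectors
    mean = subs.mean(0)
    _, _, Vt = np.linalg.svd(subs - mean, full_matrices=False)
    P = Vt[:d_tilde].T                               # (m, d_tilde) projection (PCA stand-in)
    Z = (subs - mean) @ P                            # (n_sub, d_tilde) low-rank representations
    codebook, codes = kmeans(Z, k)                   # subvector clustering in d_tilde dims
    W_hat = (codebook[codes] @ P.T + mean).reshape(W.shape)
    return W_hat, codes, codebook, P


if __name__ == "__main__":
    W = np.random.default_rng(1).standard_normal((64, 64)).astype(np.float32)
    W_hat, codes, codebook, P = lr2vq_sketch(W, m=8, d_tilde=4, k=256)
    err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
    print(f"relative reconstruction error: {err:.3f}")
```

In this toy setting the compression ratio grows with $m$ (each length-$m$ subvector is replaced by a single code index), while $\tilde{d}$ trades low-rank approximation error against clustering error, mirroring the roles the abstract assigns to these hyper-parameters.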
Related papers
- Pruning is Optimal for Learning Sparse Features in High-Dimensions [15.967123173054535]
We show that a class of statistical models can be optimally learned using pruned neural networks trained with gradient descent.
We show that pruning neural networks proportional to the sparsity level of $\boldsymbol{V}$ improves their sample complexity compared to unpruned networks.
arXiv Detail & Related papers (2024-06-12T21:43:12Z)
- ReALLM: A general framework for LLM compression and fine-tuning [11.738510106847414]
ReALLM is a novel approach for compression and memory-efficient adaptation of pre-trained language models.
A weight-only quantization algorithm yields the best results on language generation tasks (C4 and WikiText-2) for a budget of $3$ bits without any training.
arXiv Detail & Related papers (2024-05-21T18:50:51Z)
- Uncertainty quantification for iterative algorithms in linear models with application to early stopping [4.150180443030652]
This paper investigates the iterates $\hat{b}^1,\dots,\hat{b}^T$ obtained from iterative algorithms in high-dimensional linear regression problems.
The analysis and proposed estimators are applicable to Gradient Descent (GD), proximal GD, and their accelerated variants such as Fast Iterative Soft-Thresholding (FISTA).
arXiv Detail & Related papers (2024-04-27T10:20:41Z)
- Deep Equilibrium Object Detection [24.69829309391189]
We present a new query-based object detector (DEQDet) by designing a deep equilibrium decoder.
Our experiments demonstrate DEQDet converges faster, consumes less memory, and achieves better results than the baseline counterpart.
arXiv Detail & Related papers (2023-08-18T13:56:03Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification.
The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
arXiv Detail & Related papers (2022-06-17T04:07:45Z)
- Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to the success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function (a small numerical check of this identity is sketched after this list).
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
- Deep Learning Meets Projective Clustering [66.726500395069]
A common approach for compressing NLP networks is to encode the embedding layer as a matrix $A\in\mathbb{R}^{n\times d}$.
Inspired by projective clustering from computational geometry, we suggest replacing this subspace by a set of $k$ subspaces.
arXiv Detail & Related papers (2020-10-08T22:47:48Z)
- Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
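The "Permute, Quantize, and Fine-tune" entry above rests on the observation that the hidden units shared by two adjacent layers can be permuted without changing the network's function. A minimal numerical check of that identity, using assumed layer sizes and a ReLU nonlinearity that are not taken from the paper:

```python
import numpy as np

# Permuting the output channels of layer 1 together with the matching input
# channels of layer 2 leaves the two-layer network's outputs unchanged.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)           # input vector
W1 = rng.standard_normal((32, 16))    # layer 1: 16 -> 32
W2 = rng.standard_normal((10, 32))    # layer 2: 32 -> 10

relu = lambda z: np.maximum(z, 0.0)
y_ref = W2 @ relu(W1 @ x)

perm = rng.permutation(32)            # any permutation of the 32 hidden units
y_perm = W2[:, perm] @ relu(W1[perm] @ x)

print(np.allclose(y_ref, y_perm))     # True: same function, different weight layout
```

Because many such permutations realize the same function, a compressor is free to search over them for a weight layout whose parameter groups are easier to vector-quantize, which is the idea that paper connects to rate-distortion theory.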