How Informative is the Approximation Error from Tensor Decomposition for
Neural Network Compression?
- URL: http://arxiv.org/abs/2305.05318v2
- Date: Fri, 4 Aug 2023 06:11:24 GMT
- Title: How Informative is the Approximation Error from Tensor Decomposition for
Neural Network Compression?
- Authors: Jetze T. Schuurmans, Kim Batselier, Julian F. P. Kooij
- Abstract summary: Recent work assumes that the approximation error on the weights is a proxy for the performance of the model when compressing multiple layers and fine-tuning the compressed model.
We perform an experimental study to test if this assumption holds across different layers and types of decompositions, and what the effect of fine-tuning is.
We find the approximation error on the weights has a positive correlation with the performance error, before as well as after fine-tuning.
- Score: 7.358732518242147
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tensor decompositions have been successfully applied to compress neural
networks. The compression algorithms using tensor decompositions commonly
minimize the approximation error on the weights. Recent work assumes that the
approximation error on the weights is a proxy for the performance of the model
when compressing multiple layers and fine-tuning the compressed model. Surprisingly,
little research has systematically evaluated which approximation errors can be
used to make choices regarding the layer, tensor decomposition method, and
level of compression. To close this gap, we perform an experimental study to
test if this assumption holds across different layers and types of
decompositions, and what the effect of fine-tuning is. We include the
approximation error on the features resulting from a compressed layer in our
analysis to test if this provides a better proxy, as it explicitly takes the
data into account. We find the approximation error on the weights has a
positive correlation with the performance error, before as well as after
fine-tuning. Basing the approximation error on the features does not improve
the correlation significantly. While scaling the approximation error is commonly
used to account for the different sizes of layers, the average correlation
across layers is smaller than across all choices (i.e. layers, decompositions,
and level of compression) before fine-tuning. When calculating the correlation
across the different decompositions, the average rank correlation is larger
than across all choices. This means multiple decompositions can be considered
for compression and the approximation error can be used to choose between them.
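To make the quantities discussed above concrete, the following is a minimal sketch (not the authors' code) of how a weight-space and a feature-space approximation error could be computed for one compressed layer and then rank-correlated with a performance error. A truncated SVD stands in for the tensor decompositions compared in the paper, and all shapes, ranks, and performance numbers below are illustrative assumptions.
```python
# Sketch only: relate weight-space and feature-space approximation errors to
# hypothetical performance errors via Spearman rank correlation.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))   # weights of one (flattened) layer, assumed shape
X = rng.standard_normal((1000, 256))  # a batch of inputs to that layer, assumed shape

def low_rank_approx(W, rank):
    """Truncated-SVD approximation of W (stand-in for a tensor decomposition)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

ranks = [4, 8, 16, 32, 64]            # compression levels (illustrative)
weight_err, feature_err = [], []
for r in ranks:
    W_hat = low_rank_approx(W, r)
    # relative approximation error on the weights
    weight_err.append(np.linalg.norm(W - W_hat) / np.linalg.norm(W))
    # relative approximation error on the features produced by the compressed layer
    feature_err.append(np.linalg.norm(X @ W - X @ W_hat) / np.linalg.norm(X @ W))

# Hypothetical performance errors (e.g. top-1 accuracy drop) per compression level;
# in the study these come from evaluating the compressed model.
perf_error = [0.060, 0.025, 0.010, 0.004, 0.002]  # larger rank -> smaller drop

rho_w, _ = spearmanr(weight_err, perf_error)
rho_f, _ = spearmanr(feature_err, perf_error)
print("Spearman rho, weight error vs. performance :", rho_w)
print("Spearman rho, feature error vs. performance:", rho_f)
```
Rank correlation is the relevant measure because the study asks whether the approximation error orders the compression choices in the same way as the performance error; per the abstract, basing the error on the features does not significantly improve this correlation over the weight-based error.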
Related papers
- Compression of Structured Data with Autoencoders: Provable Benefit of
Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
We find that inter-correlations and intra-correlations exist when observing latent variables from a vectorized perspective.
Our model has better rate-distortion performance and an impressive $3.18\times$ compression speedup.
arXiv Detail & Related papers (2022-03-21T11:44:17Z)
- Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization [1.8579693774597708]
We propose a model reduction method to compress the pre-trained networks using low-rank tensor decomposition.
A new regularization method, called funnel function, is proposed to suppress the unimportant factors during the compression.
For ResNet18 with ImageNet2012, our reduced model can reach a more than two-times speedup in terms of GMACs with merely a 0.7% Top-1 accuracy drop.
arXiv Detail & Related papers (2021-12-07T13:41:51Z)
- Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition [62.41259783906452]
We present a novel global compression framework for deep neural networks.
It automatically analyzes each layer to identify the optimal per-layer compression ratio.
Our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks.
arXiv Detail & Related papers (2021-07-23T20:01:30Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Optimal Gradient Compression for Distributed and Federated Learning [9.711326718689492]
Communication between computing nodes in distributed learning is typically an unavoidable burden.
Recent advances in communication-efficient training algorithms have reduced this bottleneck by using compression techniques.
In this paper, we investigate the fundamental trade-off between the number of bits needed to encode compressed vectors and the compression error.
arXiv Detail & Related papers (2020-10-07T07:58:59Z)
- Data-Independent Structured Pruning of Neural Networks via Coresets [21.436706159840018]
We propose the first efficient structured pruning algorithm with a provable trade-off between its compression rate and the approximation error for any future test sample.
Unlike previous works, our coreset is data independent, meaning that it provably guarantees the accuracy of the function for any input $x \in \mathbb{R}^d$, including an adversarial one.
arXiv Detail & Related papers (2020-08-19T08:03:09Z)
- A Unified Weight Learning and Low-Rank Regression Model for Robust Complex Error Modeling [12.287346997617542]
One of the most important problems in regression-based error modeling is modeling the complex representation error caused by various corruptions and environment changes in images.
In this paper, we propose a unified weight learning and low-rank approximation regression model, which enables random noise and contiguous occlusions in images to be treated simultaneously.
arXiv Detail & Related papers (2020-05-10T09:50:14Z)
- On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
arXiv Detail & Related papers (2020-02-27T19:52:24Z)
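As a rough illustration of the compression operators mentioned in the two distributed-learning entries above (the TopK operator and biased compressors), the following is a minimal, hypothetical sketch of Top-K gradient sparsification; it is not taken from any of the listed papers, and the vector size and k below are arbitrary assumptions.
```python
# Sketch only: Top-K sparsification, a standard example of a *biased* compressor
# (E[topk(g)] != g, unlike unbiased random sparsification).
import numpy as np

def topk_compress(g: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of g and zero out the rest."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]  # indices of the k largest magnitudes
    out[idx] = g[idx]
    return out

rng = np.random.default_rng(0)
g = rng.standard_normal(1000)      # a gradient (or activation) vector, assumed size
g_hat = topk_compress(g, k=50)     # transmit only 5% of the coordinates
rel_err = np.linalg.norm(g - g_hat) / np.linalg.norm(g)
print(f"relative compression error: {rel_err:.3f}")
```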