How Informative is the Approximation Error from Tensor Decomposition for
Neural Network Compression?
- URL: http://arxiv.org/abs/2305.05318v2
- Date: Fri, 4 Aug 2023 06:11:24 GMT
- Title: How Informative is the Approximation Error from Tensor Decomposition for
Neural Network Compression?
- Authors: Jetze T. Schuurmans, Kim Batselier, Julian F. P. Kooij
- Abstract summary: Recent work assumes that the approximation error on the weights is a proxy for the performance of the model when compressing multiple layers and fine-tuning the compressed model.
We perform an experimental study to test if this assumption holds across different layers and types of decompositions, and what the effect of fine-tuning is.
We find the approximation error on the weights has a positive correlation with the performance error, before as well as after fine-tuning.
- Score: 7.358732518242147
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tensor decompositions have been successfully applied to compress neural
networks. The compression algorithms using tensor decompositions commonly
minimize the approximation error on the weights. Recent work assumes that the
approximation error on the weights is a proxy for the performance of the model
when compressing multiple layers and fine-tuning the compressed model. Surprisingly,
little research has systematically evaluated which approximation errors can be
used to make choices regarding the layer, tensor decomposition method, and
level of compression. To close this gap, we perform an experimental study to
test if this assumption holds across different layers and types of
decompositions, and what the effect of fine-tuning is. We include the
approximation error on the features resulting from a compressed layer in our
analysis to test if this provides a better proxy, as it explicitly takes the
data into account. We find the approximation error on the weights has a
positive correlation with the performance error, before as well as after
fine-tuning. Basing the approximation error on the features does not improve
the correlation significantly. While scaling the approximation error is commonly
used to account for the different sizes of layers, the average correlation
across layers is smaller than across all choices (i.e. layers, decompositions,
and level of compression) before fine-tuning. When calculating the correlation
across the different decompositions, the average rank correlation is larger
than across all choices. This means multiple decompositions can be considered
for compression and the approximation error can be used to choose between them.
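To make the quantities discussed above concrete, the following is a minimal sketch (not the authors' code) of how a weight-space and a feature-space approximation error could be computed for one compressed layer and then rank-correlated with a performance error. A truncated SVD stands in for the tensor decompositions compared in the paper, and all shapes, ranks, and performance numbers below are illustrative assumptions.
```python
# Sketch only: relate weight-space and feature-space approximation errors to
# hypothetical performance errors via Spearman rank correlation.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))   # weights of one (flattened) layer, assumed shape
X = rng.standard_normal((1000, 256))  # a batch of inputs to that layer, assumed shape

def low_rank_approx(W, rank):
    """Truncated-SVD approximation of W (stand-in for a tensor decomposition)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

ranks = [4, 8, 16, 32, 64]            # compression levels (illustrative)
weight_err, feature_err = [], []
for r in ranks:
    W_hat = low_rank_approx(W, r)
    # relative approximation error on the weights
    weight_err.append(np.linalg.norm(W - W_hat) / np.linalg.norm(W))
    # relative approximation error on the features produced by the compressed layer
    feature_err.append(np.linalg.norm(X @ W - X @ W_hat) / np.linalg.norm(X @ W))

# Hypothetical performance errors (e.g. top-1 accuracy drop) per compression level;
# in the study these come from evaluating the compressed model.
perf_error = [0.060, 0.025, 0.010, 0.004, 0.002]  # larger rank -> smaller drop

rho_w, _ = spearmanr(weight_err, perf_error)
rho_f, _ = spearmanr(feature_err, perf_error)
print("Spearman rho, weight error vs. performance :", rho_w)
print("Spearman rho, feature error vs. performance:", rho_f)
```
Rank correlation is the relevant measure because the study asks whether the approximation error orders the compression choices in the same way as the performance error; per the abstract, basing the error on the features does not significantly improve this correlation over the weight-based error.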
Related papers
- Compression of Structured Data with Autoencoders: Provable Benefit of
Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
We find that inter-correlations and intra-correlations exist when observing latent variables from a vectorized perspective.
Our model has better rate-distortion performance and an impressive $3.18\times$ compression speedup.
arXiv Detail & Related papers (2022-03-21T11:44:17Z)
- Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization [1.8579693774597708]
We propose a model reduction method to compress the pre-trained networks using low-rank tensor decomposition.
A new regularization method, called funnel function, is proposed to suppress the unimportant factors during the compression.
For ResNet18 with ImageNet2012, our reduced model can reach a more than two-times speedup in terms of GMACs with merely a 0.7% Top-1 accuracy drop.
arXiv Detail & Related papers (2021-12-07T13:41:51Z)
- Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition [62.41259783906452]
We present a novel global compression framework for deep neural networks.
It automatically analyzes each layer to identify the optimal per-layer compression ratio.
Our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks.
arXiv Detail & Related papers (2021-07-23T20:01:30Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Optimal Gradient Compression for Distributed and Federated Learning [9.711326718689492]
Communication between computing nodes in distributed learning is typically an unavoidable burden.
Recent advances in communication-efficient training algorithms have reduced this bottleneck by using compression techniques.
In this paper, we investigate the fundamental trade-off between the number of bits needed to encode compressed vectors and the compression error.
arXiv Detail & Related papers (2020-10-07T07:58:59Z)
- Data-Independent Structured Pruning of Neural Networks via Coresets [21.436706159840018]
We propose the first efficient structured pruning algorithm with a provable trade-off between its compression rate and the approximation error for any future test sample.
Unlike previous works, our coreset is data independent, meaning that it provably guarantees the accuracy of the function for any input $x \in \mathbb{R}^d$, including an adversarial one.
arXiv Detail & Related papers (2020-08-19T08:03:09Z)
- A Unified Weight Learning and Low-Rank Regression Model for Robust Complex Error Modeling [12.287346997617542]
One of the most important problems in regression-based error modeling is modeling the complex representation error caused by various corruptions and environment changes in images.
In this paper, we propose a unified weight learning and low-rank approximation regression model, which enables random noise and contiguous occlusions in images to be treated simultaneously.
arXiv Detail & Related papers (2020-05-10T09:50:14Z)
- On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
arXiv Detail & Related papers (2020-02-27T19:52:24Z)
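As a rough illustration of the compression operators mentioned in the two distributed-learning entries above (the TopK operator and biased compressors), the following is a minimal, hypothetical sketch of Top-K gradient sparsification; it is not taken from any of the listed papers, and the vector size and k below are arbitrary assumptions.
```python
# Sketch only: Top-K sparsification, a standard example of a *biased* compressor
# (E[topk(g)] != g, unlike unbiased random sparsification).
import numpy as np

def topk_compress(g: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of g and zero out the rest."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]  # indices of the k largest magnitudes
    out[idx] = g[idx]
    return out

rng = np.random.default_rng(0)
g = rng.standard_normal(1000)      # a gradient (or activation) vector, assumed size
g_hat = topk_compress(g, k=50)     # transmit only 5% of the coordinates
rel_err = np.linalg.norm(g - g_hat) / np.linalg.norm(g)
print(f"relative compression error: {rel_err:.3f}")
```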