Related papers: Distribution-Aware Tensor Decomposition for Compression of Convolutional Neural Networks

URL: http://arxiv.org/abs/2511.04494v2
Date: Thu, 13 Nov 2025 01:39:20 GMT
Title: Distribution-Aware Tensor Decomposition for Compression of Convolutional Neural Networks
Authors: Alper Kalle, Theo Rudkiewicz, Mohamed-Oumar Ouerfelli, Mohamed Tamaazousti
Abstract summary: We focus on compression through tensorization and low-rank representations. We use data-informed norms that measure the error in function space. Unlike conventional compression pipelines, our data-informed approach often achieves competitive accuracy without any fine-tuning.
Score: 4.322339935902436
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Neural networks are widely used for image-related tasks but typically demand considerable computing power. Once a network has been trained, however, its memory and compute footprint can be reduced by compression. In this work, we focus on compression through tensorization and low-rank representations. Whereas classical approaches search for a low-rank approximation by minimizing an isotropic norm such as the Frobenius norm in weight space, we use data-informed norms that measure the error in function space. Concretely, we minimize the change in the layer's output distribution, which can be expressed as $\lVert (W - \widetilde{W}) \Sigma^{1/2}\rVert_F$, where $\Sigma^{1/2}$ is the square root of the covariance matrix of the layer's input and $W$, $\widetilde{W}$ are the original and compressed weights. We propose new alternating least squares algorithms for the two most common tensor decompositions (Tucker-2 and CPD) that directly optimize the new norm. Unlike conventional compression pipelines, which almost always require post-compression fine-tuning, our data-informed approach often achieves competitive accuracy without any fine-tuning. We further show that the same covariance-based norm can be transferred from one dataset to another with only a minor accuracy drop, enabling compression even when the original training dataset is unavailable. Experiments on several CNN architectures (ResNet-18/50 and GoogLeNet) and datasets (ImageNet, FGVC-Aircraft, CIFAR-10, and CIFAR-100) confirm the advantages of the proposed method.
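
Illustration (not from the paper): the data-informed norm in the abstract can be tried out on a plain weight matrix. The sketch below is a simplified matrix analogue, not the Tucker-2 or CPD alternating least squares algorithms the authors propose. It assumes zero-mean inputs, so that the second-moment matrix plays the role of the covariance, and it uses the standard fact that, for an invertible $\Sigma^{1/2}$, the best rank-$r$ approximation under this norm is the truncated SVD of $W\Sigma^{1/2}$ mapped back through $\Sigma^{-1/2}$. All function names, shapes, and the synthetic data are hypothetical.

```python
import numpy as np

def sqrt_psd(sigma, eps=1e-10):
    """Symmetric square root and inverse square root of a PSD matrix."""
    vals, vecs = np.linalg.eigh(sigma)
    vals = np.clip(vals, eps, None)  # guard against tiny or negative eigenvalues
    root = (vecs * np.sqrt(vals)) @ vecs.T
    inv_root = (vecs / np.sqrt(vals)) @ vecs.T
    return root, inv_root

def data_norm(w, w_tilde, sigma_root):
    """Data-informed error: ||(W - W_tilde) Sigma^{1/2}||_F."""
    return np.linalg.norm((w - w_tilde) @ sigma_root, ord="fro")

def truncated_svd(m, rank):
    """Best rank-`rank` approximation of m in the Frobenius norm."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

def low_rank_data_aware(w, sigma_root, sigma_inv_root, rank):
    """Rank-`rank` minimizer of the data-informed norm (matrix case only)."""
    return truncated_svd(w @ sigma_root, rank) @ sigma_inv_root

# Synthetic, anisotropic inputs (assumption: zero mean by construction).
rng = np.random.default_rng(0)
d_in, d_out, n, rank = 16, 8, 2000, 3
mix = rng.normal(size=(d_in, d_in)) * np.linspace(0.05, 2.0, d_in)
x = rng.normal(size=(n, d_in)) @ mix.T
sigma = x.T @ x / n  # empirical second-moment matrix of the layer input

w = rng.normal(size=(d_out, d_in))  # stand-in for a layer's weights
root, inv_root = sqrt_psd(sigma)

w_frob = truncated_svd(w, rank)  # classical weight-space baseline
w_data = low_rank_data_aware(w, root, inv_root, rank)

err_frob = data_norm(w, w_frob, root)
err_data = data_norm(w, w_data, root)

# The squared norm equals the mean squared change of the layer output.
mse_out = np.mean(np.sum(((w - w_data) @ x.T) ** 2, axis=0))

print("data-informed error, Frobenius baseline:", err_frob)
print("data-informed error, data-aware solution:", err_data)
print("mean squared output change:", mse_out)
```

The last quantity shows why the norm measures error in function space: averaged over the samples, the squared change in the layer output equals the squared data-informed norm. For convolutional kernels, the whitening trick no longer yields a closed form for Tucker-2 or CPD, which is presumably why the paper resorts to alternating least squares.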

Related papers

"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets. Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z)
Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input. We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture. We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
Pruning Neural Networks via Coresets and Convex Geometry: Towards No Assumptions [10.635248457021499]
Pruning is one of the predominant approaches for compressing deep neural networks (DNNs). We propose a novel and robust framework for computing such coresets under mild assumptions on the model's weights and inputs. Our method outperforms existing coreset-based neural pruning approaches across a wide range of networks and datasets.
arXiv Detail & Related papers (2022-09-18T12:45:26Z)
A Theoretical Understanding of Neural Network Compression from Sparse Linear Approximation [37.525277809849776]
The goal of model compression is to reduce the size of a large neural network while retaining comparable performance. We use a sparsity-sensitive $\ell_q$-norm to characterize compressibility and provide a relationship between soft sparsity of the weights in the network and the degree of compression. We also develop adaptive algorithms for pruning each neuron in the network informed by our theory.
arXiv Detail & Related papers (2022-06-11T20:10:35Z)
Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression. We find inter-correlations and intra-correlations exist when observing latent variables in a vectorized perspective. Our model has better rate-distortion performance and an impressive $3.18\times$ compression speed-up.
arXiv Detail & Related papers (2022-03-21T11:44:17Z)
Low-rank Tensor Decomposition for Compression of Convolutional Neural Networks Using Funnel Regularization [1.8579693774597708]
We propose a model reduction method to compress pre-trained networks using low-rank tensor decomposition. A new regularization method, called funnel function, is proposed to suppress the unimportant factors during the compression. For ResNet18 with ImageNet2012, our reduced model can reach more than two times speed-up in terms of GMAC with merely 0.7% Top-1 accuracy drop.
arXiv Detail & Related papers (2021-12-07T13:41:51Z)
An Information Theory-inspired Strategy for Automatic Network Pruning [97.03772272417599]
Deep convolution neural networks are well known to be compressed on devices with resource constraints. Most existing network pruning methods require laborious human efforts and prohibitive computation resources. We propose an information theory-inspired strategy for automatic model compression.
arXiv Detail & Related papers (2021-08-19T07:03:22Z)
Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which combines channel pruning and tensor decomposition to compress CNN models. We achieve a 52.9% FLOPs reduction by removing 48.4% of the parameters on ResNet-50, with a Top-1 accuracy drop of only 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to the success of vector quantization is deciding which parameter groups should be compressed together. In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function. We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
Data-Independent Structured Pruning of Neural Networks via Coresets [21.436706159840018]
We propose the first efficient structured pruning algorithm with a provable trade-off between its compression rate and the approximation error for any future test sample. Unlike previous works, our coreset is data independent, meaning that it provably guarantees the accuracy of the function for any input $x \in \mathbb{R}^d$, including an adversarial one.
arXiv Detail & Related papers (2020-08-19T08:03:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.