Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding
- URL: http://arxiv.org/abs/2505.18758v1
- Date: Sat, 24 May 2025 15:52:49 GMT
- Title: Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding
- Authors: Alexander Conzelmann, Robert Bamler
- Abstract summary: The ever-growing size of neural networks poses serious challenges for resource-constrained devices. We propose a novel post-training compression framework that combines rate-aware quantization with entropy coding. Our method allows for very fast decoding and is compatible with arbitrary quantization grids.
- Score: 56.066799081747845
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ever-growing size of neural networks poses serious challenges for resource-constrained devices, such as embedded sensors. Compression algorithms that reduce their size can mitigate these problems, provided that model performance stays close to the original. We propose a novel post-training compression framework that combines rate-aware quantization with entropy coding by (1) extending the well-known layer-wise loss by a quadratic rate estimation, and (2) providing locally exact solutions to this modified objective following the Optimal Brain Surgeon (OBS) method. Our method allows for very fast decoding and is compatible with arbitrary quantization grids. We verify our results empirically by testing on various computer-vision networks, achieving a 20-40% decrease in bit rate at the same performance as the popular compression algorithm NNCodec. Our code is available at https://github.com/Conzel/cerwu.
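To make the abstract's objective concrete, the sketch below writes out a rate-constrained layer-wise loss of the kind described: the standard layer-wise reconstruction error augmented by an estimate of the coding rate. The notation, the trade-off weight $\lambda$, and the exact form of the rate term are illustrative assumptions, not taken from the paper.

```latex
% Hedged sketch of a rate-constrained layer-wise objective (notation assumed, not the paper's):
%   X        calibration inputs to the layer
%   W        original layer weights
%   \hat{W}  quantized weights, restricted to a quantization grid \mathcal{G}
%   R        quadratic estimate of the entropy-coded bit rate
%   \lambda  assumed rate-distortion trade-off weight
\min_{\hat{W} \in \mathcal{G}^{m \times n}} \;
  \bigl\lVert X W - X \hat{W} \bigr\rVert_F^2 \;+\; \lambda \, R(\hat{W})
```

Independently of the solver, the final stage of such a pipeline, mapping weights onto an arbitrary grid and entropy coding the resulting symbols, can be sketched as follows. This toy code only estimates the entropy-coded size from empirical symbol frequencies; it is not the paper's implementation (see the linked repository for that).

```python
import numpy as np

def quantize_to_grid(w: np.ndarray, grid: np.ndarray):
    """Map each weight to the nearest point of an arbitrary quantization grid."""
    idx = np.abs(w[..., None] - grid).argmin(axis=-1)
    return grid[idx], idx

def entropy_bits(symbols: np.ndarray) -> float:
    """Lower bound on the entropy-coded size in bits, from empirical symbol frequencies."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(counts * np.log2(p)).sum())

# Toy example: a random 'layer' and an 8-level (3-bit) uniform grid.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(256, 256)).astype(np.float32)
grid = np.linspace(-0.15, 0.15, 8, dtype=np.float32)
w_hat, idx = quantize_to_grid(w, grid)
print(f"approx. {entropy_bits(idx) / w.size:.2f} bits per weight")
```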
Related papers
- LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression [22.44693836632384]
Implicit Neural Representation (INR) methods address point cloud geometry compression by encoding overfitted network parameters into the bitstream. Due to limitations on encoding time and decoder size, current INR-based methods only consider lossy geometry compression. We propose Lossless Implicit Neural Representations for Point Cloud Geometry Compression (LINR-PCGC).
arXiv Detail & Related papers (2025-07-21T14:48:54Z)
- GANCompress: GAN-Enhanced Neural Image Compression with Binary Spherical Quantization [0.0]
GANCompress is a novel neural compression framework that combines Binary Spherical Quantization (BSQ) with Generative Adversarial Networks (GANs). We show that GANCompress achieves a substantial improvement in compression efficiency, reducing file sizes by up to 100x with minimal visual distortion.
arXiv Detail & Related papers (2025-05-19T00:18:27Z)
- A Theoretical Understanding of Neural Network Compression from Sparse Linear Approximation [37.525277809849776]
The goal of model compression is to reduce the size of a large neural network while retaining comparable performance.
We use the sparsity-sensitive $\ell_q$-norm to characterize compressibility and provide a relationship between the soft sparsity of the network weights and the achievable degree of compression (a small numerical illustration follows the list below).
We also develop adaptive algorithms, informed by our theory, for pruning each neuron in the network.
arXiv Detail & Related papers (2022-06-11T20:10:35Z)
- Compression-aware Projection with Greedy Dimension Reduction for Convolutional Neural Network Activations [3.6188659868203388]
We propose a compression-aware projection system to improve the trade-off between classification accuracy and compression ratio.
Our test results show that the proposed methods effectively reduce memory access by 2.91x to 5.97x with negligible accuracy drop on MobileNetV2/ResNet18/VGG16.
arXiv Detail & Related papers (2021-10-17T14:02:02Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while remaining at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition [62.41259783906452]
We present a novel global compression framework for deep neural networks.
It automatically analyzes each layer to identify the optimal per-layer compression ratio.
Our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks.
arXiv Detail & Related papers (2021-07-23T20:01:30Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme that jointly applies channel pruning and tensor decomposition to compress CNN models.
On ResNet-50, we achieve a 52.9% FLOPs reduction by removing 48.4% of the parameters, with only a 0.56% Top-1 accuracy drop on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
- Lightweight Compression of Intermediate Neural Network Features for Collaborative Intelligence [32.03465747357384]
In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device.
This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN.
arXiv Detail & Related papers (2021-05-15T00:10:12Z)
- ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique (ALF).
ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
arXiv Detail & Related papers (2020-07-27T09:01:22Z)
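As referenced in the sparse linear approximation entry above, the following is a small, self-contained illustration of a sparsity-sensitive $\ell_q$ quasi-norm. The $\ell_q/\ell_2$ ratio used as a compressibility proxy here is our own illustrative choice; the exact measure and theory in that paper are not reproduced.

```python
import numpy as np

def lq_norm(w: np.ndarray, q: float) -> float:
    """l_q (quasi-)norm (sum_i |w_i|^q)^(1/q); sparsity-sensitive for q < 1."""
    return float((np.abs(w) ** q).sum() ** (1.0 / q))

# Two unit-l_2 vectors: energy spread over all entries vs. concentrated in a few.
rng = np.random.default_rng(0)
spread = rng.normal(size=1000)
spread /= np.linalg.norm(spread)
concentrated = np.zeros(1000)
concentrated[:5] = 1.0 / np.sqrt(5)

# A smaller l_q / l_2 ratio indicates softer sparsity, i.e. a more compressible weight vector.
for name, w in [("spread", spread), ("concentrated", concentrated)]:
    print(name, round(lq_norm(w, q=0.5) / np.linalg.norm(w), 2))
```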
This list is automatically generated from the titles and abstracts of the papers on this site.