Related papers: An Efficient Compression of Deep Neural Network Checkpoints Based on Prediction and Context Modeling

An Efficient Compression of Deep Neural Network Checkpoints Based on Prediction and Context Modeling

URL: http://arxiv.org/abs/2506.12000v1
Date: Fri, 13 Jun 2025 17:54:42 GMT
Title: An Efficient Compression of Deep Neural Network Checkpoints Based on Prediction and Context Modeling
Authors: Yuriy Kim, Evgeny Belyaev,
Abstract summary: We propose a prediction-based compression approach, where values from the previously saved checkpoint are used for context modeling in arithmetic coding.<n> Experimental results show that our approach achieves substantial bit size reduction, while enabling near-lossless training recovery from restored checkpoints.
Score: 1.7495213911983414
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper is dedicated to an efficient compression of weights and optimizer states (called checkpoints) obtained at different stages during a neural network training process. First, we propose a prediction-based compression approach, where values from the previously saved checkpoint are used for context modeling in arithmetic coding. Second, in order to enhance the compression performance, we also propose to apply pruning and quantization of the checkpoint values. Experimental results show that our approach achieves substantial bit size reduction, while enabling near-lossless training recovery from restored checkpoints, preserving the model's performance and making it suitable for storage-limited environments.

Related papers

On Information Geometry and Iterative Optimization in Model Compression: Operator Factorization [5.952537659103525]
We argue that many successful model compression approaches can be understood as implicitly approximating information divergences for this projection.<n>We prove convergence of iterative singular value thresholding for training neural networks subject to a soft rank constraint.
arXiv Detail & Related papers (2025-07-12T23:39:14Z)
Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding [56.066799081747845]
The ever-growing size of neural networks poses serious challenges on resource-constrained devices.<n>We propose a novel post-training compression framework that combines rate-aware quantization with entropy coding.<n>Our method allows for very fast decoding and is compatible with arbitrary quantization grids.
arXiv Detail & Related papers (2025-05-24T15:52:49Z)
Choose Your Model Size: Any Compression by a Single Gradient Descent [9.074689052563878]
We present Any Compression via Iterative Pruning (ACIP)<n>ACIP is an algorithmic approach to determine a compression-performance trade-off from a single gradient descent run.<n>We show that ACIP seamlessly complements common quantization-based compression techniques.
arXiv Detail & Related papers (2025-02-03T18:40:58Z)
A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks [81.2624272756733]
In dense retrieval, deep encoders provide embeddings for both inputs and targets. We train a small parametric corrector network that adjusts stale cached target embeddings. Our approach matches state-of-the-art results even when no target embedding updates are made during training.
arXiv Detail & Related papers (2024-09-03T13:29:13Z)
Shapley Pruning for Neural Network Compression [63.60286036508473]
This work presents the Shapley value approximations, and performs the comparative analysis in terms of cost-benefit utility for the neural network compression. The proposed normative ranking and its approximations show practical results, obtaining state-of-the-art network compression.
arXiv Detail & Related papers (2024-07-19T11:42:54Z)
Inshrinkerator: Compressing Deep Learning Training Checkpoints via Dynamic Quantization [5.648270790530862]
State-of-the-art approaches involve lossy model compression mechanisms, which induce a tradeoff between the resulting model quality (accuracy) and compression ratio. We make a key enabling observation that the sensitivity of model weights to compression varies during training, and different weights benefit from different quantization levels. We propose a non-uniform quantization scheme that leverages this variation, an efficient search mechanism that dynamically finds the best quantization configurations, and a quantization-aware delta compression mechanism that rearranges weights to minimize checkpoint differences.
arXiv Detail & Related papers (2023-06-20T18:00:31Z)
Towards Optimal Compression: Joint Pruning and Quantization [1.191194620421783]
This paper introduces FITCompress, a novel method integrating layer-wise mixed-precision quantization and unstructured pruning. Experiments on computer vision and natural language processing benchmarks demonstrate that our proposed approach achieves a superior compression-performance trade-off.
arXiv Detail & Related papers (2023-02-15T12:02:30Z)
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning [29.284147465251685]
We introduce a new compression framework which covers both weight pruning and quantization in a unified setting. We show that it can improve significantly upon the compression-accuracy trade-offs of existing post-training methods.
arXiv Detail & Related papers (2022-08-24T14:33:35Z)
Estimating the Resize Parameter in End-to-end Learned Image Compression [50.20567320015102]
We describe a search-free resizing framework that can further improve the rate-distortion tradeoff of recent learned image compression models. Our results show that our new resizing parameter estimation framework can provide Bjontegaard-Delta rate (BD-rate) improvement of about 10% against leading perceptual quality engines.
arXiv Detail & Related papers (2022-04-26T01:35:02Z)
Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network of which partial layers are iteratively exploited for refining its previous estimations. We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model. Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.
arXiv Detail & Related papers (2021-11-11T23:31:34Z)
Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization. We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
Compact CNN Structure Learning by Knowledge Distillation [34.36242082055978]
We propose a framework that leverages knowledge distillation along with customizable block-wise optimization to learn a lightweight CNN structure. Our method results in a state of the art network compression while being capable of achieving better inference accuracy. In particular, for the already compact network MobileNet_v2, our method offers up to 2x and 5.2x better model compression.
arXiv Detail & Related papers (2021-04-19T10:34:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.