Related papers: Smooth Model Compression without Fine-Tuning

Smooth Model Compression without Fine-Tuning

URL: http://arxiv.org/abs/2505.24469v1
Date: Fri, 30 May 2025 11:13:48 GMT
Title: Smooth Model Compression without Fine-Tuning
Authors: Christina Runkel, Natacha Kuete Meli, Jovita Lukasik, Ander Biguri, Carola-Bibiane Schönlieb, Michael Moeller,
Abstract summary: We explore the impact of smooth regularization on neural network training and model compression.<n>We find that standard pruning methods often perform better when applied to smooth models.<n>Our approach enables state-of-the-art compression without any fine-tuning.
Score: 14.381101636079872
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Compressing and pruning large machine learning models has become a critical step towards their deployment in real-world applications. Standard pruning and compression techniques are typically designed without taking the structure of the network's weights into account, limiting their effectiveness. We explore the impact of smooth regularization on neural network training and model compression. By applying nuclear norm, first- and second-order derivative penalties of the weights during training, we encourage structured smoothness while preserving predictive performance on par with non-smooth models. We find that standard pruning methods often perform better when applied to these smooth models. Building on this observation, we apply a Singular-Value-Decomposition-based compression method that exploits the underlying smooth structure and approximates the model's weight tensors by smaller low-rank tensors. Our approach enables state-of-the-art compression without any fine-tuning - reaching up to $91\%$ accuracy on a smooth ResNet-18 on CIFAR-10 with $70\%$ fewer parameters.

Related papers

TuneComp: Joint Fine-tuning and Compression for Large Foundation Models [50.33925662486034]
sequential fine-tuning and compression sacrifices performance, while creating a larger than necessary model as an intermediate step.<n>We propose to jointly fine-tune and compress the model by gradually distilling it to a pruned low-rank structure.<n> Experiments demonstrate that joint fine-tuning and compression significantly outperforms other sequential compression methods.
arXiv Detail & Related papers (2025-05-27T23:49:35Z)
Choose Your Model Size: Any Compression by a Single Gradient Descent [9.074689052563878]
We present Any Compression via Iterative Pruning (ACIP)<n>ACIP is an algorithmic approach to determine a compression-performance trade-off from a single gradient descent run.<n>We show that ACIP seamlessly complements common quantization-based compression techniques.
arXiv Detail & Related papers (2025-02-03T18:40:58Z)
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning [20.62274005080048]
PruneNet is a novel model compression method that reformulates model pruning as a policy learning process.<n>It can compress the LLaMA-2-7B model in just 15 minutes, achieving over 80% retention of its zero-shot performance.<n>On complex multitask language understanding tasks, PruneNet demonstrates its robustness by preserving up to 80% performance of the original model.
arXiv Detail & Related papers (2025-01-25T18:26:39Z)
Singular Value Scaling: Efficient Generative Model Compression via Pruned Weights Refinement [9.454314879815337]
generative models often exhibit dominant singular vectors, hindering fine-tuning efficiency and leading to suboptimal performance.<n>We introduce Singular Value Scaling (SVS), a versatile technique for refining pruned weights, applicable to both model types.<n>SVS improves compression performance across model types without additional training costs.
arXiv Detail & Related papers (2024-12-23T08:40:08Z)
Convolutional Neural Network Compression Based on Low-Rank Decomposition [3.3295360710329738]
This paper proposes a model compression method that integrates Variational Bayesian Matrix Factorization. VBMF is employed to estimate the rank of the weight tensor at each layer. Experimental results show that for both high and low compression ratios, our compression model exhibits advanced performance.
arXiv Detail & Related papers (2024-08-29T06:40:34Z)
Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence. We find that gradients require milder compression rates than activations. Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
CrAM: A Compression-Aware Minimizer [103.29159003723815]
We propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way. CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning. CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware.
arXiv Detail & Related papers (2022-07-28T16:13:28Z)
LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models. We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity. Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead. We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
Training with Quantization Noise for Extreme Model Compression [57.51832088938618]
We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator. In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods.
arXiv Detail & Related papers (2020-04-15T20:10:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.