Related papers: Singular Value Scaling: Efficient Generative Model Compression via Pruned Weights Refinement

Singular Value Scaling: Efficient Generative Model Compression via Pruned Weights Refinement

URL: http://arxiv.org/abs/2412.17387v2
Date: Tue, 24 Dec 2024 07:52:45 GMT
Title: Singular Value Scaling: Efficient Generative Model Compression via Pruned Weights Refinement
Authors: Hyeonjin Kim, Jaejun Yoo,
Abstract summary: generative models often exhibit dominant singular vectors, hindering fine-tuning efficiency and leading to suboptimal performance.<n>We introduce Singular Value Scaling (SVS), a versatile technique for refining pruned weights, applicable to both model types.<n>SVS improves compression performance across model types without additional training costs.
Score: 9.454314879815337
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While pruning methods effectively maintain model performance without extra training costs, they often focus solely on preserving crucial connections, overlooking the impact of pruned weights on subsequent fine-tuning or distillation, leading to inefficiencies. Moreover, most compression techniques for generative models have been developed primarily for GANs, tailored to specific architectures like StyleGAN, and research into compressing Diffusion models has just begun. Even more, these methods are often applicable only to GANs or Diffusion models, highlighting the need for approaches that work across both model types. In this paper, we introduce Singular Value Scaling (SVS), a versatile technique for refining pruned weights, applicable to both model types. Our analysis reveals that pruned weights often exhibit dominant singular vectors, hindering fine-tuning efficiency and leading to suboptimal performance compared to random initialization. Our method enhances weight initialization by minimizing the disparities between singular values of pruned weights, thereby improving the fine-tuning process. This approach not only guides the compressed model toward superior solutions but also significantly speeds up fine-tuning. Extensive experiments on StyleGAN2, StyleGAN3 and DDPM demonstrate that SVS improves compression performance across model types without additional training costs. Our code is available at: https://github.com/LAIT-CVLab/Singular-Value-Scaling.

Related papers

Choose Your Model Size: Any Compression by a Single Gradient Descent [9.074689052563878]
We present Any Compression via Iterative Pruning (ACIP) ACIP is an algorithmic approach to determine a compression-performance trade-off from a single gradient descent run. We show that ACIP seamlessly complements common quantization-based compression techniques.
arXiv Detail & Related papers (2025-02-03T18:40:58Z)
MultiPruner: Balanced Structure Removal in Foundation Models [1.8434042562191815]
Recently, state-of-the-art approaches for pruning large pre-trained models (LPMs) have demonstrated that the training-free removal of non-critical residual blocks in Transformers is viable for reducing model size. We extend BlockPruner and propose MultiPruner, a pruning approach that surpasses recent training-free pruning methods by adopting a multidimensional, iterative, fine-grained pruning strategy.
arXiv Detail & Related papers (2025-01-17T04:24:31Z)
DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization [22.546989373687655]
We propose a novel pruning method that derives an efficient diffusion model via a more intelligent and differentiable pruner. Our approach achieves 4.4 x speedup for SD-1.5 without any loss of accuracy, significantly outperforming the previous state-of-the-art methods.
arXiv Detail & Related papers (2024-10-22T12:18:24Z)
Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model [20.054342930450055]
This paper introduces a novel method of Progressive Low Rank Decomposition (PLRD) tailored for the compression of large language models. PLRD allows for significant reductions in computational overhead and energy consumption. Our findings suggest that PLRD could set a new standard for the efficient scaling of LLMs.
arXiv Detail & Related papers (2024-06-28T15:27:57Z)
Model Stock: All we need is just a few fine-tuned models [34.449901046895185]
This paper introduces an efficient fine-tuning method for large pre-trained models, offering strong in-distribution (ID) and out-of-distribution (OOD) performance. We employ significantly fewer models to achieve final weights yet yield superior accuracy. We demonstrate the efficacy of Model Stock with fine-tuned models based upon pre-trained CLIP architectures.
arXiv Detail & Related papers (2024-03-28T15:57:20Z)
Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence. We find that gradients require milder compression rates than activations. Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings [6.463202903076821]
We compare the two main approaches for adaptive inference, Early-Exit and Multi-Model, when training data is limited. Early-Exit provides a better speed-accuracy trade-off due to the overhead of the Multi-Model approach. We propose SWEET, an Early-Exit fine-tuning method that assigns each classifier its own set of unique model weights.
arXiv Detail & Related papers (2023-06-04T09:16:39Z)
Gradient-based Intra-attention Pruning on Pre-trained Language Models [21.444503777215637]
We propose a structured pruning method GRAIN (Gradient-based Intra-attention pruning) GRAIN inspects and prunes intra-attention structures, which greatly expands the structure search space and enables more flexible models. Experiments on GLUE, SQuAD, and CoNLL 2003 show that GRAIN notably outperforms other methods, especially in the high sparsity regime.
arXiv Detail & Related papers (2022-12-15T06:52:31Z)
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive. We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights. Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight- parameterisation for neural networks that leads to inherently sparse models. Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely. Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
Block Pruning For Faster Transformers [89.70392810063247]
We introduce a block pruning approach targeting both small and fast models. We find that this approach learns to prune out full components of the underlying model, such as attention heads.
arXiv Detail & Related papers (2021-09-10T12:46:32Z)
AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights [53.8489656709356]
Normalization techniques are a boon for modern deep learning. It is often overlooked, however, that the additional introduction of momentum results in a rapid reduction in effective step sizes for scale-invariant weights. In this paper, we verify that the widely-adopted combination of the two ingredients lead to the premature decay of effective step sizes and sub-optimal model performances.
arXiv Detail & Related papers (2020-06-15T08:35:15Z)
Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead. We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
Training with Quantization Noise for Extreme Model Compression [57.51832088938618]
We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator. In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods.
arXiv Detail & Related papers (2020-04-15T20:10:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.