In defense of parameter sharing for model-compression
- URL: http://arxiv.org/abs/2310.11611v1
- Date: Tue, 17 Oct 2023 22:08:01 GMT
- Title: In defense of parameter sharing for model-compression
- Authors: Aditya Desai, Anshumali Shrivastava
- Abstract summary: Randomized parameter-sharing (RPS) methods have gained traction for model compression at the start of training.
RPS consistently matches or outperforms smaller models and all moderately informed pruning strategies.
This paper argues in favor of a paradigm shift towards RPS-based models.
- Score: 38.80110838121722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When considering a model architecture, there are several ways to reduce its
memory footprint. Historically, popular approaches included selecting smaller
architectures and creating sparse networks through pruning. More recently,
randomized parameter-sharing (RPS) methods have gained traction for model
compression at the start of training. In this paper, we comprehensively assess the
trade-off between memory and accuracy across RPS, pruning techniques, and
building smaller models. Our findings demonstrate that RPS, which is both data-
and model-agnostic, consistently matches or outperforms smaller models and all
moderately informed pruning strategies, such as MAG, SNIP, SYNFLOW, and GRASP,
across the entire compression range. This advantage becomes particularly
pronounced in higher compression scenarios. Notably, even when compared to
highly informed pruning techniques like Lottery Ticket Rewinding (LTR), RPS
exhibits superior performance in high compression settings. This points to an
inherent capacity advantage that RPS enjoys over sparse models. Theoretically,
we establish RPS as a superior technique in terms of memory-efficient
representation when compared to pruning for linear models. This paper argues in
favor of a paradigm shift towards RPS-based models. During our rigorous
evaluation of RPS, we identified issues in the state-of-the-art RPS technique
ROAST, specifically regarding stability (ROAST's sensitivity to initialization
hyperparameters, often leading to divergence) and Pareto-continuity (ROAST's
inability to recover the accuracy of the original model at zero compression).
We provably address both of these issues. We refer to the modified RPS, which
incorporates our improvements, as STABLE-RPS.
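To make the idea concrete, below is a minimal, illustrative sketch of a randomized parameter-sharing layer in PyTorch, in the spirit of HashedNet/ROAST-style compression: each entry of a layer's full (virtual) weight matrix is gathered from a much smaller trainable array through a fixed random index map and a random sign. The module name RPSLinear, the sign trick, and the precomputed index buffer are illustrative assumptions; this is not the paper's exact ROAST or STABLE-RPS construction.

```python
# Illustrative RPS layer: the full weight matrix is never stored; it is
# materialized on the fly from a small shared parameter array.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RPSLinear(nn.Module):
    def __init__(self, in_features, out_features, compressed_size, seed=0):
        super().__init__()
        # The only trainable memory: a small 1-D shared parameter array.
        self.shared = nn.Parameter(torch.randn(compressed_size) * 0.01)
        n_virtual = out_features * in_features
        g = torch.Generator().manual_seed(seed)
        # Fixed random mapping from virtual weights to shared slots, plus a
        # random +/-1 sign. (A real implementation would hash indices on the
        # fly instead of storing this buffer.)
        idx = torch.randint(0, compressed_size, (n_virtual,), generator=g)
        sign = (torch.randint(0, 2, (n_virtual,), generator=g) * 2 - 1).float()
        self.register_buffer("idx", idx)
        self.register_buffer("sign", sign)
        self.out_features, self.in_features = out_features, in_features

    def forward(self, x):
        # Gather the virtual weights from the shared array and reshape.
        w = (self.shared[self.idx] * self.sign).view(
            self.out_features, self.in_features
        )
        return F.linear(x, w)


# Usage: behaves like nn.Linear(256, 128) but trains only 4096 weights
# instead of 256 * 128 = 32768, an 8x reduction.
layer = RPSLinear(256, 128, compressed_size=4096)
y = layer(torch.randn(8, 256))
print(y.shape)  # torch.Size([8, 128])
```

Because the random mapping is independent of both the data and the surrounding architecture, the same recipe applies unchanged to any layer, which is the data- and model-agnostic property the abstract refers to.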
Related papers
- Choose Your Model Size: Any Compression by a Single Gradient Descent [9.074689052563878]
We present Any Compression via Iterative Pruning (ACIP).
ACIP is an algorithmic approach to determine a compression-performance trade-off from a single gradient descent run.
We show that ACIP seamlessly complements common quantization-based compression techniques.
arXiv Detail & Related papers (2025-02-03T18:40:58Z)
- You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning [20.62274005080048]
PruneNet is a novel model compression method that reformulates model pruning as a policy learning process.
It can compress the LLaMA-2-7B model in just 15 minutes, achieving over 80% retention of its zero-shot performance.
On complex multitask language understanding tasks, PruneNet demonstrates its robustness by preserving up to 80% performance of the original model.
arXiv Detail & Related papers (2025-01-25T18:26:39Z)
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
- Learning Parameter Sharing with Tensor Decompositions and Sparsity [5.73573685846194]
This paper introduces Fine-grained Parameter Sharing (FiPS), a novel algorithm to efficiently compress large vision transformer models.
FiPS employs a shared base and sparse factors to represent shared neurons across multi-layer perceptron modules.
Experiments demonstrate that FiPS compresses DeiT-B and Swin-L transformers to 25-40% of their original parameter count while maintaining accuracy within 1 percentage point of the original models.
arXiv Detail & Related papers (2024-11-14T21:29:58Z)
- LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Parameter Efficient Fine Tuning (PEFT) method.
We propose a higher-order CANDECOMP/PARAFAC (CP) decomposition, enabling a more compact and flexible representation.
Our method can achieve a reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z)
- Unified Low-rank Compression Framework for Click-through Rate Prediction [15.813889566241539]
We propose a unified low-rank decomposition framework for compressing CTR prediction models.
Our framework can achieve better performance than the original model.
Our framework can be applied to embedding tables and layers in various CTR prediction models (a generic low-rank factorization sketch follows this list).
arXiv Detail & Related papers (2024-05-28T13:06:32Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Revisiting RCAN: Improved Training for Image Super-Resolution [94.8765153437517]
We revisit the popular RCAN model and examine the effect of different training options in SR.
We show that RCAN can outperform or match nearly all the CNN-based SR architectures published after RCAN on standard benchmarks.
arXiv Detail & Related papers (2022-01-27T02:20:11Z)
- A Generic Network Compression Framework for Sequential Recommender Systems [71.81962915192022]
Sequential recommender systems (SRS) have become the key technology in capturing users' dynamic interests and generating high-quality recommendations.
We propose a compressed sequential recommendation framework, termed CpRec, where two generic model shrinking techniques are employed.
Through extensive ablation studies, we demonstrate that the proposed CpRec can achieve compression rates of 4 to 8 times on real-world SRS datasets.
arXiv Detail & Related papers (2020-04-21T08:40:55Z)
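Several of the entries above (the unified low-rank CTR framework referenced earlier, LoRTA, and FiPS) rest on factorized weight representations. The snippet below is a generic low-rank factorization sketch in NumPy, compressing an embedding table with a truncated SVD; the shapes, rank, and variable names are illustrative assumptions, and it is not the specific algorithm of any paper listed here.

```python
# Generic low-rank compression of an embedding table E (vocab x dim):
# replace E with factors U (vocab x rank) and V (rank x dim).
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, rank = 10_000, 128, 16

# Stand-in for a trained embedding table to be compressed.
E = rng.standard_normal((vocab, dim))

# Truncated SVD gives the best rank-r approximation in Frobenius norm.
U_full, s, Vt = np.linalg.svd(E, full_matrices=False)
U = U_full[:, :rank] * s[:rank]   # (vocab, rank)
V = Vt[:rank, :]                  # (rank, dim)

orig_params = vocab * dim
compressed_params = vocab * rank + rank * dim
rel_error = np.linalg.norm(E - U @ V) / np.linalg.norm(E)
print(f"params: {orig_params} -> {compressed_params} "
      f"({compressed_params / orig_params:.1%}), relative error {rel_error:.3f}")

# A lookup for token id t is now U[t] @ V instead of E[t].
approx_row = U[42] @ V            # shape (dim,)
```

In a fine-tuning setting such as LoRA or LoRTA, the same idea is applied to the weight update rather than the weight itself, so only the small factors need to be trained.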