Reconstruct the Pruned Model without Any Retraining
- URL: http://arxiv.org/abs/2407.13331v1
- Date: Thu, 18 Jul 2024 09:30:44 GMT
- Title: Reconstruct the Pruned Model without Any Retraining
- Authors: Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu Wang
- Abstract summary: We introduce the Linear Interpolation-based Adaptive Reconstruction (LIAR) framework, which is both efficient and effective.
LIAR does not require back-propagation or retraining and is compatible with various pruning criteria and modules.
Our evaluations on benchmarks such as GLUE, SQuAD, WikiText, and common sense reasoning show that LIAR enables a BERT model to maintain 98% accuracy even after removing 50% of its parameters.
- Score: 23.235907813011174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structured pruning is a promising hardware-friendly compression technique for large language models (LLMs), which is expected to be retraining-free to avoid the enormous retraining cost. This retraining-free paradigm involves (1) pruning criteria to define the architecture and (2) distortion reconstruction to restore performance. However, existing methods often emphasize pruning criteria while using reconstruction techniques that are specific to certain modules or criteria, resulting in limited generalizability. To address this, we introduce the Linear Interpolation-based Adaptive Reconstruction (LIAR) framework, which is both efficient and effective. LIAR does not require back-propagation or retraining and is compatible with various pruning criteria and modules. By applying linear interpolation to the preserved weights, LIAR minimizes reconstruction error and effectively reconstructs the pruned output. Our evaluations on benchmarks such as GLUE, SQuAD, WikiText, and common sense reasoning show that LIAR enables a BERT model to maintain 98% accuracy even after removing 50% of its parameters and achieves top performance for LLaMA in just a few minutes.
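To make the retraining-free reconstruction step concrete, the sketch below fits the preserved weights of a single linear layer to the dense layer's output on a small calibration set via a closed-form least-squares solve. This is a generic, minimal illustration of distortion reconstruction, not the exact LIAR interpolation formula; the function name, the ridge term, and the toy shapes are assumptions made for the example.

```python
import numpy as np

def reconstruct_pruned_weights(W, X, keep_idx):
    """Closed-form reconstruction of a linear layer after input-channel pruning.

    W        : (d_out, d_in) original weight matrix
    X        : (n, d_in) calibration activations fed into the layer
    keep_idx : indices of the input channels preserved by the pruning criterion

    Returns a (d_out, len(keep_idx)) matrix W_new minimizing
    ||X[:, keep_idx] @ W_new.T - X @ W.T||_F^2, i.e. the pruned layer's
    output is fitted to the dense output without any back-propagation.
    """
    Y = X @ W.T                                            # dense output to match
    X_kept = X[:, keep_idx]                                # activations of preserved channels
    G = X_kept.T @ X_kept + 1e-4 * np.eye(len(keep_idx))   # small ridge term for stability
    W_new = np.linalg.solve(G, X_kept.T @ Y).T
    return W_new

# Toy usage: keep half of 16 input channels and check the output distortion.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(256, 16))
keep = np.arange(0, 16, 2)
W_rec = reconstruct_pruned_weights(W, X, keep)
rel_err = np.linalg.norm(X[:, keep] @ W_rec.T - X @ W.T) / np.linalg.norm(X @ W.T)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Because the solve uses only cached calibration activations, a reconstruction of this kind can be applied layer by layer without gradients or retraining, which is the property the abstract emphasizes.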
Related papers
- Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration [109.38288333994407]
Contrastive Prompt Learning (CPL) is a novel framework that fundamentally enhances prompt-task alignment.
Our framework establishes new state-of-the-art performance while maintaining parameter efficiency, offering a principled solution for unified image restoration.
arXiv Detail & Related papers (2025-04-14T08:24:57Z) - Sample-aware Adaptive Structured Pruning for Large Language Models [14.605017410864583]
This study introduces AdaPruner, a sample-aware adaptive structured pruning framework for large language models (LLMs).
Specifically, AdaPruner effectively removes redundant parameters from LLMs by constructing a structured pruning solution space.
At a 20% pruning ratio, the model pruned with AdaPruner maintains 97% of the performance of the unpruned model.
arXiv Detail & Related papers (2025-03-08T12:00:21Z) - Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting [107.4034346788744]
Existing vehicle trajectory prediction models struggle with generalizability, prediction uncertainties, and handling complex interactions.
We propose Perceiver with Register queries (PerReg+), a novel trajectory prediction framework that introduces: (1) Dual-Level Representation Learning via Self-Distillation (SD) and Masked Reconstruction (MR), capturing global context and fine-grained details; (2) Enhanced Multimodality using register-based queries and pretraining, eliminating the need for clustering and suppression; and (3) Adaptive Prompt Tuning during fine-tuning, freezing the main architecture and optimizing a small number of prompts for efficient adaptation.
arXiv Detail & Related papers (2025-01-08T20:11:09Z) - A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models [24.185245582500876]
We introduce FISTAPruner, the first post-training pruner based on convex optimization models and algorithms.
FISTAPruner incorporates an intra-layer cumulative error correction mechanism and supports parallel pruning.
We evaluate FISTAPruner on models such as OPT, LLaMA, LLaMA-2, and LLaMA-3 with 125M to 70B parameters under unstructured and 2:4 semi-structured sparsity.
arXiv Detail & Related papers (2024-08-07T12:33:46Z) - Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining [16.026565606764954]
We simplify the pruning process for Transformer-based large language models (LLMs).
We propose two inference-aware pruning criteria derived from the optimization perspective of output approximation; a generic sketch of such a criterion appears after this list.
We also introduce a two-step reconstruction technique to mitigate pruning errors without model retraining.
arXiv Detail & Related papers (2024-07-26T23:53:59Z) - Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization [18.24882084542254]
We present an array of reconstruction techniques that can reduce the reconstruction error by more than 90%.
We find that self-generating calibration data can mitigate the trade-off between reconstruction and generalization.
arXiv Detail & Related papers (2024-06-21T05:13:34Z) - REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z) - Structurally Prune Anything: Any Architecture, Any Framework, Any Time [84.6210631783801]
We introduce Structurally Prune Anything (SPA), a versatile structured pruning framework for neural networks.
SPA supports pruning at any time, either before training, after training with fine-tuning, or after training without fine-tuning.
In extensive experiments, SPA achieves pruning performance competitive with the state of the art across various architectures.
arXiv Detail & Related papers (2024-03-03T13:49:49Z) - Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that requires no prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs [22.557682089926004]
We show that updating a small subset of parameters can suffice to recover or even enhance performance after pruning.
We introduce two novel LoRA variants that, unlike standard LoRA, allow merging adapters back without compromising sparsity.
arXiv Detail & Related papers (2023-12-23T11:45:22Z) - How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark [60.72725673114168]
We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets.
We propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark.
arXiv Detail & Related papers (2023-12-21T03:11:30Z) - Fluctuation-based Adaptive Structured Pruning for Large Language Models [44.217363567065]
FLAP (FLuctuation-based Adaptive Structured Pruning) is a retraining-free structured pruning framework for Large Language Models.
It is hardware-friendly, effectively reducing storage and improving inference speed.
arXiv Detail & Related papers (2023-12-19T09:23:48Z) - Rethinking the optimization process for self-supervised model-driven MRI reconstruction [16.5013498806588]
K2Calibrate is a K-space adaptation strategy for self-supervised model-driven MR reconstruction optimization.
It can reduce the network's reconstruction deterioration caused by statistically dependent noise.
It achieves better results than five state-of-the-art methods.
arXiv Detail & Related papers (2022-03-18T03:41:36Z) - Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods rarely translate into real inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z) - MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models [78.45898846056303]
Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models.
We develop a novel MultiLevel structured Pruning framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning.
arXiv Detail & Related papers (2021-05-30T22:00:44Z)
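As referenced in the Greedy Output Approximation entry above, the following sketch shows a generic output-approximation pruning criterion for one linear layer: each input channel is scored by the norm of the output change caused by zeroing it, and the lowest-scoring channels are pruned. The function names and the fixed sparsity ratio are illustrative assumptions, not the specific criteria proposed in that paper.

```python
import numpy as np

def output_approx_importance(W, X):
    """Score each input channel by the output distortion its removal causes.

    Zeroing channel j changes the layer output by the rank-1 term
    X[:, j:j+1] @ W[:, j:j+1].T, whose Frobenius norm factorizes as
    ||X[:, j]||_2 * ||W[:, j]||_2.
    """
    act_norm = np.linalg.norm(X, axis=0)      # ||x_j||_2 per input channel
    weight_norm = np.linalg.norm(W, axis=0)   # ||W[:, j]||_2 per input channel
    return act_norm * weight_norm

def select_channels(W, X, sparsity=0.2):
    """Return the indices of input channels to keep at the given sparsity."""
    scores = output_approx_importance(W, X)
    n_drop = int(sparsity * W.shape[1])
    drop = np.argsort(scores)[:n_drop]        # least important channels first
    return np.setdiff1d(np.arange(W.shape[1]), drop)
```

The kept indices can then be handed to a reconstruction step such as the least-squares sketch shown after the main abstract, mirroring the criterion-then-reconstruction pipeline that the retraining-free pruning papers above share.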
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.