Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
- URL: http://arxiv.org/abs/2310.08915v3
- Date: Mon, 26 Feb 2024 02:51:30 GMT
- Title: Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
- Authors: Yuxin Zhang, Lirui Zhao, Mingbao Lin, Yunyun Sun, Yiwu Yao, Xingjia
Han, Jared Tanner, Shiwei Liu, Rongrong Ji
- Abstract summary: We introduce Dynamic Sparse No Training (DSnoT), a training-free fine-tuning approach for sparse large language models (LLMs).
Inspired by Dynamic Sparse Training, DSnoT minimizes the reconstruction error between the dense and sparse LLMs.
Our paper offers fresh insights into how to fine-tune sparse LLMs in an efficient, training-free manner and opens new avenues for scaling the great potential of sparsity to LLMs.
- Score: 67.38165028487242
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ever-increasing scale of large language models (LLMs), though opening
a potential path toward artificial general intelligence, poses a daunting
obstacle to their on-device deployment. Network pruning, one of the most
well-established pre-LLM approaches to reducing model complexity, appears to
lag behind in the era of LLMs, mostly because of the costly fine-tuning (or
re-training) it requires given the massive volumes of model parameters and
training data. To close this industry-academia gap, we introduce
Dynamic Sparse No Training (DSnoT), a training-free fine-tuning approach that
slightly updates sparse LLMs without expensive backpropagation or any
weight updates. Inspired by Dynamic Sparse Training, DSnoT minimizes the
reconstruction error between the dense and sparse LLMs by performing
iterative weight pruning-and-growing on top of sparse LLMs. To this end,
DSnoT takes into account the anticipated reduction in reconstruction error
for both pruning and growing, as well as the variance with respect to
different input data when growing each weight. This procedure can be executed
efficiently in linear time, since it obviates the need for backpropagation
when fine-tuning LLMs. Extensive experiments on LLaMA-V1/V2,
Vicuna, and OPT across various benchmarks demonstrate the effectiveness of
DSnoT in enhancing the performance of sparse LLMs, especially at high sparsity
levels. For instance, DSnoT is able to outperform the state-of-the-art Wanda by
26.79 perplexity at 70% sparsity with LLaMA-7B. Our paper offers fresh insights
into how to fine-tune sparse LLMs in an efficient, training-free manner and
opens new avenues for scaling the great potential of sparsity to LLMs. Code is
available at https://github.com/zyxxmu/DSnoT.
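The core loop is easy to picture. Below is a minimal sketch of a DSnoT-style pruning-and-growing update for a single linear layer, assuming PyTorch and a small calibration batch; the exact growing and pruning criteria here are simplified stand-ins, not the authors' implementation (see the repository above for that).

```python
# Simplified sketch of a DSnoT-style pruning-and-growing step on one linear layer.
# The criteria below are illustrative assumptions, not the authors' exact code.
import torch

def dsnot_step(w_dense: torch.Tensor, mask: torch.Tensor, calib_x: torch.Tensor,
               eps: float = 1e-8) -> torch.Tensor:
    """One mask update. w_dense: (out, in) dense weights, mask: (out, in) binary
    sparsity mask, calib_x: (n_samples, in) calibration inputs. Returns new mask."""
    w_sparse = w_dense * mask
    # Per-output-row reconstruction error over the calibration set: (out, n_samples).
    err = (w_dense - w_sparse) @ calib_x.T
    mean_err = err.mean(dim=1)                 # expected error per output row, (out,)
    x_mean = calib_x.mean(dim=0)               # (in,) expected value of each input feature
    x_var = calib_x.var(dim=0) + eps           # (in,) variance across input data

    new_mask = mask.clone()
    for i in range(w_dense.shape[0]):
        contrib = w_dense[i] * x_mean          # expected contribution of each weight
        # Grow: revive the pruned weight whose expected contribution best cancels
        # the current error, preferring weights with stable (low-variance) inputs.
        grow_score = contrib * torch.sign(mean_err[i]) / x_var
        grow_score = grow_score.masked_fill(mask[i].bool(), float("-inf"))
        grow_idx = torch.argmax(grow_score)
        # Prune: drop the active weight with the smallest expected contribution,
        # so the overall sparsity level stays constant.
        prune_score = contrib.abs().masked_fill(~mask[i].bool(), float("inf"))
        prune_idx = torch.argmin(prune_score)
        new_mask[i, grow_idx] = 1.0
        new_mask[i, prune_idx] = 0.0
    return new_mask
```

In practice such an update would be applied layer by layer for a few iterations over the same calibration batch; since only the mask changes and no gradients are computed, the whole procedure stays backpropagation-free.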
Related papers
- Pruning Foundation Models for High Accuracy without Retraining [48.256389781305415]
Deploying foundation models or large language models (LLMs) is challenging due to their massive parameter counts and computation.
Post-training pruning methods are proposed to prune LLMs in one-shot without retraining.
Our experiments demonstrate the superior performance of the proposed methods in comparison to SOTA baselines.
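For context, "one-shot without retraining" means the weights are sparsified in a single pass and then used as-is. A minimal generic baseline of that kind (plain magnitude pruning, not the method proposed in this paper) looks like the following sketch.

```python
# Generic one-shot magnitude-pruning baseline: each weight matrix is sparsified in a
# single pass, with no gradient steps afterwards. Illustrative only.
import torch

@torch.no_grad()
def magnitude_prune_(linear: torch.nn.Linear, sparsity: float = 0.5) -> None:
    w = linear.weight
    k = int(w.numel() * sparsity)              # number of weights to remove
    if k == 0:
        return
    threshold = w.abs().flatten().kthvalue(k).values
    w.mul_((w.abs() > threshold).to(w.dtype))  # zero out the smallest-magnitude weights
```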
arXiv Detail & Related papers (2024-10-21T01:23:34Z)
- Minor SFT loss for LLM fine-tune to increase performance and reduce model deviation [9.506166330956082]
With insight from DPO and MinorDPO, we propose a training metric for SFT that measures the discrepancy between the optimized model and the original model, together with a loss function, MinorSFT, that can increase training effectiveness.
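The summary does not give the exact form of the metric or of MinorSFT, so the sketch below only illustrates the general recipe it points at: measure how far the fine-tuned model has drifted from the original and penalize that drift during SFT. The token-level KL choice and the `beta` weight are assumptions for illustration, not the paper's definitions.

```python
# Illustrative sketch only: one generic way to measure model deviation during SFT
# (token-level KL to the frozen original model) and fold it into the loss. The
# actual MinorSFT metric and loss may differ.
import torch
import torch.nn.functional as F

def sft_loss_with_deviation(policy_logits, ref_logits, labels, beta=0.1):
    """policy_logits, ref_logits: (batch, seq, vocab); labels: (batch, seq) with
    -100 marking positions to ignore. beta weights the deviation penalty."""
    ce = F.cross_entropy(policy_logits.flatten(0, 1), labels.flatten(), ignore_index=-100)
    # Deviation metric: mean KL(policy || original) over supervised tokens.
    kl = F.kl_div(F.log_softmax(ref_logits, dim=-1),
                  F.log_softmax(policy_logits, dim=-1),
                  log_target=True, reduction="none").sum(-1)
    token_mask = (labels != -100).float()
    deviation = (kl * token_mask).sum() / token_mask.sum().clamp(min=1)
    return ce + beta * deviation, deviation.detach()
```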
arXiv Detail & Related papers (2024-08-20T08:32:44Z)
- Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models [79.46938238953916]
Fine-tuning large language models (LLMs) for diverse applications is crucial to meeting complex demands.
Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs.
In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs.
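A minimal sketch of the plain low-rank delta-compression baseline referred to above (not Delta-CoMe itself, which uses a mixed-precision scheme): the fine-tuning delta is factorized with a truncated SVD and only the factors are stored. The rank choice is illustrative.

```python
# Low-rank delta-compression baseline: keep the base weights plus a rank-r
# approximation of the fine-tuning delta. Illustrative, not the Delta-CoMe method.
import torch

def compress_delta_low_rank(w_base: torch.Tensor, w_finetuned: torch.Tensor, rank: int):
    """Return (U, V) with U @ V ~= w_finetuned - w_base, keeping `rank` singular pairs."""
    delta = (w_finetuned - w_base).float()
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    u_r = u[:, :rank] * s[:rank]               # fold singular values into U
    v_r = vh[:rank, :]
    return u_r, v_r                             # storage cost: rank * (out + in) values

def reconstruct(w_base: torch.Tensor, u_r: torch.Tensor, v_r: torch.Tensor) -> torch.Tensor:
    return w_base + u_r @ v_r                   # approximate fine-tuned weights
```

The observation quoted above, that this kind of aggressive low-rank (or low-bit) truncation can noticeably hurt task-specific models, is what motivates the mixed-precision scheme in the paper.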
arXiv Detail & Related papers (2024-06-13T07:57:27Z)
- SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models [53.638791265113625]
SPP is a sparsity-preserved, parameter-efficient fine-tuning method for large language models.
Code will be made available at https://github.com/Lucky-Lance/SPP.
arXiv Detail & Related papers (2024-05-25T04:55:27Z)
- Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models [21.864109456867784]
For many downstream tasks, it is necessary to fine-tune large language models (LLMs) using private data.
We propose an automated federated pipeline, named FedPipe, to fine-tune LLMs with minimal training cost.
Extensive experiments demonstrate that FedPipe expedites the model training and achieves higher accuracy than state-of-the-art benchmarks.
arXiv Detail & Related papers (2024-04-09T16:50:30Z)
- LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement [79.31084387589968]
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks.
We propose LLM2LLM, a data augmentation strategy that uses a teacher LLM to enhance a small seed dataset.
We achieve improvements up to 24.2% on the GSM8K dataset, 32.6% on CaseHOLD, 32.0% on SNIPS, 52.6% on TREC and 39.8% on SST-2 over regular fine-tuning in the low-data regime.
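At a high level, such teacher-driven iterative augmentation can be sketched as below; `train_student`, `find_hard_examples`, and `teacher_generate` are hypothetical callbacks, and focusing augmentation on examples the student gets wrong is an assumption for illustration, not a claim about LLM2LLM's exact procedure.

```python
# Hypothetical sketch of a teacher-driven iterative data-enhancement loop.
from typing import Callable, List

def iterative_augmentation(
    seed_data: List[str],
    train_student: Callable[[List[str]], object],            # fine-tunes and returns a student
    find_hard_examples: Callable[[object, List[str]], List[str]],
    teacher_generate: Callable[[str], str],                   # teacher produces a similar new example
    rounds: int = 3,
) -> List[str]:
    data = list(seed_data)
    for _ in range(rounds):
        student = train_student(data)                         # train on the current pool
        hard = find_hard_examples(student, seed_data)         # seed items the student still misses
        data.extend(teacher_generate(ex) for ex in hard)      # grow the pool with teacher outputs
    return data
```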
arXiv Detail & Related papers (2024-03-22T08:57:07Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
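To make "coupled structures" concrete, the sketch below prunes hidden units of an MLP block, where each unit ties a row of the up-projection to a column of the down-projection, so both must be removed together; ranking by weight norm is a stand-in criterion, not the importance score LLM-Pruner actually uses.

```python
# Illustrative structured pruning of coupled MLP structures (weight-norm criterion
# is an assumption for the sketch).
import torch

@torch.no_grad()
def prune_mlp_hidden_units(up: torch.nn.Linear, down: torch.nn.Linear, keep_ratio: float = 0.8):
    hidden = up.out_features
    keep = max(1, int(hidden * keep_ratio))
    # Importance of each hidden unit = norm of its coupled weights in both projections.
    score = up.weight.norm(dim=1) + down.weight.norm(dim=0)
    idx = torch.topk(score, keep).indices.sort().values
    new_up = torch.nn.Linear(up.in_features, keep, bias=up.bias is not None)
    new_down = torch.nn.Linear(keep, down.out_features, bias=down.bias is not None)
    new_up.weight.copy_(up.weight[idx])                       # keep selected rows
    if up.bias is not None:
        new_up.bias.copy_(up.bias[idx])
    new_down.weight.copy_(down.weight[:, idx])                # keep the coupled columns
    if down.bias is not None:
        new_down.bias.copy_(down.bias)
    return new_up, new_down
```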
arXiv Detail & Related papers (2023-05-19T12:10:53Z)