LLM-Pruner: On the Structural Pruning of Large Language Models
- URL: http://arxiv.org/abs/2305.11627v3
- Date: Thu, 28 Sep 2023 03:59:27 GMT
- Title: LLM-Pruner: On the Structural Pruning of Large Language Models
- Authors: Xinyin Ma, Gongfan Fang, Xinchao Wang
- Abstract summary: Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs under two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
- Score: 65.02607075556742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have shown remarkable capabilities in language
understanding and generation. However, such impressive capability typically
comes with a substantial model size, which presents significant challenges in
the deployment, inference, and training stages. With the LLM being a
general-purpose task solver, we explore its compression in a task-agnostic
manner, which aims to preserve the multi-task solving and language generation
ability of the original LLM. One challenge to achieving this is the enormous
size of the training corpus of the LLM, which makes both data transfer and model
post-training over-burdensome. Thus, we tackle the compression of LLMs under
two constraints: being task-agnostic and minimizing the reliance
on the original training dataset. Our method, named LLM-Pruner, adopts
structural pruning that selectively removes non-critical coupled structures
based on gradient information, maximally preserving the majority of the LLM's
functionality. The performance of pruned models can then be efficiently
recovered with a lightweight tuning technique, LoRA, in merely 3 hours,
requiring only 50K samples. We validate LLM-Pruner on three LLMs, including LLaMA, Vicuna,
and ChatGLM, and demonstrate that the compressed models still exhibit
satisfactory capabilities in zero-shot classification and generation. The code
is available at: https://github.com/horseee/LLM-Pruner
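For intuition, here is a minimal sketch of the gradient-based importance step described in the abstract. It is not the authors' implementation: the toy module, the calibration batch, and the 25% pruning ratio are all assumptions, and LLM-Pruner additionally handles attention heads and other coupled structures. Each hidden channel of an MLP forms a coupled group spanning one row of the up-projection and one column of the down-projection, and its first-order Taylor score is the sum of |weight x gradient| over that group:

```python
import torch
import torch.nn as nn

# Toy stand-in for one coupled group in an LLM block: the i-th intermediate
# channel of an MLP is "coupled" because removing it means dropping row i of
# up_proj.weight and column i of down_proj.weight together.
class ToyMLP(nn.Module):
    def __init__(self, d_model=64, d_hidden=256):
        super().__init__()
        self.up_proj = nn.Linear(d_model, d_hidden, bias=False)
        self.down_proj = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x):
        return self.down_proj(torch.relu(self.up_proj(x)))

def taylor_group_importance(mlp: ToyMLP) -> torch.Tensor:
    """First-order Taylor score per hidden channel: sum over the coupled
    weights of |w * dL/dw| (assumes .grad has already been populated)."""
    with torch.no_grad():
        up_score = (mlp.up_proj.weight * mlp.up_proj.weight.grad).abs().sum(dim=1)      # (d_hidden,)
        down_score = (mlp.down_proj.weight * mlp.down_proj.weight.grad).abs().sum(dim=0)  # (d_hidden,)
    return up_score + down_score

torch.manual_seed(0)
mlp = ToyMLP()

# A tiny calibration batch stands in for the small public dataset; the loss
# here is only a placeholder for the language-modeling loss on that data.
x = torch.randn(32, 64)
loss = mlp(x).pow(2).mean()
loss.backward()

# Keep the 75% most important channels (a 25% pruning ratio, an assumption).
scores = taylor_group_importance(mlp)
keep = scores.argsort(descending=True)[: int(0.75 * scores.numel())].sort().values

pruned = ToyMLP(d_model=64, d_hidden=keep.numel())
pruned.up_proj.weight.data = mlp.up_proj.weight.data[keep]         # drop rows
pruned.down_proj.weight.data = mlp.down_proj.weight.data[:, keep]  # drop columns
print(f"hidden channels: {mlp.up_proj.out_features} -> {pruned.up_proj.out_features}")
```

After the lowest-scoring groups are removed, the recovery stage mentioned in the abstract fine-tunes the pruned model with LoRA adapters on a small public corpus (the paper reports roughly 50K samples and about 3 hours); that stage is not shown here.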
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- SoupLM: Model Integration in Large Language and Multi-Modal Models [51.12227693121004]
Training large language models (LLMs) requires significant computing resources.
Existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks.
arXiv Detail & Related papers (2024-07-11T05:38:15Z)
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [53.31402059062365]
BiLLM is a groundbreaking 1-bit post-training quantization scheme tailored for pretrained large language models.
It achieves, for the first time, high-accuracy inference (e.g., 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLM families; a toy weight-binarization sketch is included after this list.
arXiv Detail & Related papers (2024-02-06T09:26:34Z)
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z)
- Small Language Models Improve Giants by Rewriting Their Outputs [18.025736098795296]
We tackle the problem of leveraging training data to improve the performance of large language models (LLMs) without fine-tuning.
We create a pool of candidates from the LLM through few-shot prompting and employ a compact model, the LM-corrector (LMCor), trained specifically to merge these candidates into an enhanced output.
Experiments on four natural language generation tasks demonstrate that even a small LMCor model (250M) substantially improves the few-shot performance of LLMs (62B), matching and even outperforming standard fine-tuning.
arXiv Detail & Related papers (2023-05-22T22:07:50Z)
- Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline [22.08897444328099]
Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks.
In this paper, we propose an efficient LLM inference pipeline that exploits the LLM's own perception of response length to schedule sequences for batched inference.
arXiv Detail & Related papers (2023-05-22T15:36:06Z)
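As referenced in the BiLLM entry above, the following is a minimal, naive weight-binarization baseline (per-row sign plus a scaling factor) that illustrates only the general idea of 1-bit post-training quantization. It is not BiLLM's actual scheme, which additionally treats salient weights with a residual approximation; the matrix size below is an arbitrary assumption.

```python
import torch

def binarize_rows(w: torch.Tensor):
    """Naive 1-bit quantization: each row becomes sign(w) times a per-row
    scale alpha minimizing ||w - alpha * sign(w)||^2, i.e. alpha = mean(|w|)."""
    alpha = w.abs().mean(dim=1, keepdim=True)   # (rows, 1) per-row scale
    signs = torch.sign(w)
    signs[signs == 0] = 1.0                     # avoid zero codes
    return signs, alpha

def dequantize(signs: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    return signs * alpha

torch.manual_seed(0)
w = torch.randn(128, 512)                       # stand-in weight matrix
signs, alpha = binarize_rows(w)
w_hat = dequantize(signs, alpha)
err = (w - w_hat).pow(2).mean() / w.pow(2).mean()
print(f"relative reconstruction error: {err:.3f}")
```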
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.