CPET: Effective Parameter-Efficient Tuning for Compressed Large Language Models
- URL: http://arxiv.org/abs/2307.07705v2
- Date: Wed, 15 Nov 2023 17:02:17 GMT
- Title: CPET: Effective Parameter-Efficient Tuning for Compressed Large Language Models
- Authors: Weilin Zhao, Yuxiang Huang, Xu Han, Zhiyuan Liu, Zhengyan Zhang,
Maosong Sun
- Abstract summary: We propose CPET, an effective PET framework built on compressed large language models (LLMs).
We evaluate the impact of mainstream compression techniques on PET performance.
We then introduce knowledge inheritance and recovery strategies to restore the knowledge loss caused by these compression techniques.
- Score: 86.63730733413796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-efficient tuning (PET) has been widely explored in recent years
because it tunes much fewer parameters (PET modules) than full-parameter
fine-tuning (FT) while still stimulating sufficient knowledge from large
language models (LLMs) for downstream tasks. Moreover, when PET is employed to
serve multiple tasks, different task-specific PET modules can be built on a
frozen LLM, avoiding redundant LLM deployments. Although PET significantly
reduces the cost of tuning and deploying LLMs, its inference still suffers from
the computational bottleneck of LLMs. To address the above issue, we propose an
effective PET framework based on compressed LLMs, named "CPET". In CPET, we
evaluate the impact of mainstream LLM compression techniques on PET performance
and then introduce knowledge inheritance and recovery strategies to restore the
knowledge loss caused by these compression techniques. Our experimental results
demonstrate that, owing to the restoring strategies of CPET, collaborating
task-specific PET modules with a compressed LLM can achieve comparable
performance to collaborating PET modules with the original version of the
compressed LLM and outperform directly applying vanilla PET methods to the
compressed LLM.
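To make the recipe in the abstract concrete, here is a minimal, hypothetical sketch: a LoRA-style PET branch attached to a frozen, compressed linear layer, plus a small trainable recovery branch standing in for CPET's knowledge recovery strategy. This is not the paper's released code; all names below, and the modeling of recovery as a second low-rank branch, are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CompressedPETLinear(nn.Module):
    """Hypothetical sketch: a frozen (compressed) linear layer with a
    trainable LoRA-style PET branch and a small recovery branch.
    CPET's actual inheritance/recovery strategies are richer than this."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the compressed backbone stays frozen (PET)
        d_in, d_out = base.in_features, base.out_features
        # Task-specific PET module. Knowledge inheritance would initialize
        # these from a PET module trained on the *uncompressed* model
        # rather than from scratch.
        self.pet_a = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.pet_b = nn.Parameter(torch.zeros(d_out, rank))
        # Recovery branch: trained to compensate for knowledge lost during
        # compression (modeled here, by assumption, as another low-rank update).
        self.rec_a = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.rec_b = nn.Parameter(torch.zeros(d_out, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.base(x)                          # compressed backbone
        h = h + x @ self.pet_a.T @ self.pet_b.T   # task-specific PET update
        h = h + x @ self.rec_a.T @ self.rec_b.T   # compression-recovery update
        return h
```

Because the backbone is untouched, many task-specific PET/recovery pairs can share a single compressed deployment, which is the multi-task serving benefit the abstract describes.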
Related papers
- TeleLoRA: Teleporting Model-Specific Alignment Across LLMs [13.551164842422484]
TeleLoRA is a framework that synergizes model-specific alignment data across multiple Large Language Models.
It learns a unified generator of LoRA adapter weights by leveraging local activation information across multiple LLMs.
Experiments on LLM Trojan mitigation benchmarks demonstrate that TeleLoRA effectively reduces attack success rates.
arXiv Detail & Related papers (2025-03-26T04:46:31Z) - Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning [57.36978335727009]
Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs).
In this paper, we propose a framework that adaptively retrieves and composes multiple LoRAs based on input prompts.
arXiv Detail & Related papers (2024-06-24T05:24:41Z) - Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead [41.31302904190149]
Fine-tuning large language models with low-rank adaptations (LoRAs) has become common practice.
We propose a method for joint compression of LoRAs into a shared basis paired with LoRA-specific scaling matrices.
Experiments with up to 500 LoRAs demonstrate that compressed LoRAs preserve performance while offering major throughput gains.
arXiv Detail & Related papers (2024-06-17T15:21:35Z) - LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report [3.304521604464247]
Low-Rank Adaptation (LoRA) has emerged as one of the most widely adopted methods for Parameter-Efficient Fine-Tuning (PEFT) of Large Language Models (LLMs).
We aim to assess the viability of training and serving LLMs fine-tuned with LoRA in real-world applications.
arXiv Detail & Related papers (2024-04-29T04:01:45Z) - LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative
Tasks [72.88244322513039]
LoRA employs lightweight modules to customize large language models (LLMs) for each downstream task or domain.
We propose LoRA-Flow, which uses dynamic, input-dependent weights to adjust the influence of different LoRAs (a minimal sketch of this fusion idea appears after this list).
Experiments across six generative tasks demonstrate that our method consistently outperforms baselines with task-level fusion weights.
arXiv Detail & Related papers (2024-02-18T04:41:25Z) - LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed
Tasks in the Wild [76.67343971195267]
Low-Rank Adaptation (LoRA) provides an efficient solution for fine-tuning large language models (LLMs).
LoraRetriever is a retrieve-then-compose framework that adaptively retrieves and composes multiple LoRAs according to the input prompts.
Experimental results indicate that LoraRetriever consistently outperforms the baselines.
arXiv Detail & Related papers (2024-02-15T15:02:46Z) - S-LoRA: Serving Thousands of Concurrent LoRA Adapters [59.490751234925206]
Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks.
We present S-LoRA, a system designed for the scalable serving of many LoRA adapters.
arXiv Detail & Related papers (2023-11-06T17:26:17Z) - NOLA: Compressing LoRA using Linear Combination of Random Basis [22.76088132446952]
We introduce NOLA, which overcomes the rank-one lower bound present in LoRA.
NOLA matches the performance of LoRA while training far fewer parameters than rank-one LoRA, the best compression vanilla LoRA can achieve (a minimal sketch of the idea appears after this list).
arXiv Detail & Related papers (2023-10-04T03:30:24Z) - LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.