LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models
via MoE-Style Plugin
- URL: http://arxiv.org/abs/2312.09979v4
- Date: Fri, 8 Mar 2024 13:13:54 GMT
- Title: LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models
via MoE-Style Plugin
- Authors: Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen,
Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui
Zheng, Tao Gui, Qi Zhang, Xuanjing Huang
- Abstract summary: We propose LoRAMoE, a novelty framework that introduces several low-rank adapters (LoRA) and integrates them by using a router network.
It freezes the backbone model and forces a portion of LoRAs to focus on leveraging world knowledge to solve downstream tasks.
Experimental results show that, as the instruction data increases, LoRAMoE can significantly improve the ability to process downstream tasks.
- Score: 85.16356890023582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised fine-tuning (SFT) is a crucial step for large language models
(LLMs), enabling them to align with human instructions and enhance their
capabilities in downstream tasks. Increasing instruction data substantially is
a direct solution to align the model with a broader range of downstream tasks
or notably improve its performance on a specific task. However, we find that
large-scale increases in instruction data can damage the world knowledge
previously stored in LLMs. To address this challenge, we propose LoRAMoE, a
novelty framework that introduces several low-rank adapters (LoRA) and
integrates them by using a router network, like a plugin version of Mixture of
Experts (MoE). It freezes the backbone model and forces a portion of LoRAs to
focus on leveraging world knowledge to solve downstream tasks, to alleviate
world knowledge-edge forgetting. Experimental results show that, as the
instruction data increases, LoRAMoE can significantly improve the ability to
process downstream tasks, while maintaining the world knowledge stored in the
LLM.
Related papers
- MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing knowledge capabilities by integrating a structured and explicit read-and-write memory module.
Our experiments indicate that MemLLM enhances performance and interpretability, in language modeling general and in particular.
We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
arXiv Detail & Related papers (2024-04-17T18:13:16Z) - Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language
Models [7.966452497550907]
We propose the Mixture-of-LoRAs (MoA) architecture for multi-task learning with large language models (LLMs)
Multiple domain-specific LoRA modules can be aligned with the expert design principles observed in Mixture-of-Experts (MoE)
Each LoRA model can be iteratively adapted to a new domain, allowing for quick domain-specific adaptation.
arXiv Detail & Related papers (2024-03-06T03:33:48Z) - Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models [90.14693869269519]
MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes.
This paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques.
arXiv Detail & Related papers (2024-02-22T18:56:07Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE [85.76186554492543]
Large Language Models (LLMs) can extend their zero-shot capabilities to multimodal learning through instruction tuning.
negative conflicts and interference may have a worse impact on performance.
We propose a novel framework, called Octavius, for comprehensive studies and experimentation on multimodal learning with MLLMs.
arXiv Detail & Related papers (2023-11-05T15:48:29Z) - The Web Can Be Your Oyster for Improving Large Language Models [98.72358969495835]
Large language models (LLMs) encode a large amount of world knowledge.
We consider augmenting LLMs with the large-scale web using search engine.
We present a web-augmented LLM UNIWEB, which is trained over 16 knowledge-intensive tasks in a unified text-to-text format.
arXiv Detail & Related papers (2023-05-18T14:20:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.