LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models
via MoE-Style Plugin
- URL: http://arxiv.org/abs/2312.09979v4
- Date: Fri, 8 Mar 2024 13:13:54 GMT
- Title: LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models
via MoE-Style Plugin
- Authors: Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen,
Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui
Zheng, Tao Gui, Qi Zhang, Xuanjing Huang
- Abstract summary: We propose LoRAMoE, a novel framework that introduces several low-rank adapters (LoRA) and integrates them by using a router network.
It freezes the backbone model and forces a portion of LoRAs to focus on leveraging world knowledge to solve downstream tasks.
Experimental results show that, as the instruction data increases, LoRAMoE can significantly improve the ability to process downstream tasks.
- Score: 85.16356890023582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised fine-tuning (SFT) is a crucial step for large language models
(LLMs), enabling them to align with human instructions and enhance their
capabilities in downstream tasks. Increasing instruction data substantially is
a direct solution to align the model with a broader range of downstream tasks
or notably improve its performance on a specific task. However, we find that
large-scale increases in instruction data can damage the world knowledge
previously stored in LLMs. To address this challenge, we propose LoRAMoE, a
novel framework that introduces several low-rank adapters (LoRA) and
integrates them by using a router network, like a plugin version of Mixture of
Experts (MoE). It freezes the backbone model and forces a portion of LoRAs to
focus on leveraging world knowledge to solve downstream tasks, to alleviate
world knowledge forgetting. Experimental results show that, as the
instruction data increases, LoRAMoE can significantly improve the ability to
process downstream tasks, while maintaining the world knowledge stored in the
LLM.
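The architecture described in the abstract (a frozen backbone weight, several LoRA adapters, and a router that mixes them) can be illustrated with a minimal pure-Python sketch. Everything below is an assumption made for illustration: the class name `LoRAMoELayerSketch`, the softmax router, the `alpha / rank` scaling, and the default sizes are not taken from the abstract, and the constraint that pushes a portion of the LoRAs toward world knowledge is omitted entirely.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random Gaussian matrix (illustrative initialisation only)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LoRAMoELayerSketch:
    """Hypothetical layer: frozen linear map plus router-weighted LoRA experts."""

    def __init__(self, d_in, d_out, num_experts=4, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        # Backbone weight: treated as frozen, i.e. never updated in training.
        self.frozen_w = rand_matrix(d_out, d_in, rng)
        # Router: one score per expert, computed from the input.
        self.router_w = rand_matrix(num_experts, d_in, rng)
        # Each expert is a low-rank pair (A: rank x d_in, B: d_out x rank).
        self.lora_a = [rand_matrix(rank, d_in, rng) for _ in range(num_experts)]
        # B starts at zero, as in standard LoRA, so the layer initially
        # reproduces the frozen backbone exactly.
        self.lora_b = [[[0.0] * rank for _ in range(d_out)]
                       for _ in range(num_experts)]
        self.scale = alpha / rank  # assumed LoRA-style scaling

    def gate(self, x):
        """Router weights over experts; they are positive and sum to 1."""
        return softmax(matvec(self.router_w, x))

    def forward(self, x):
        """Frozen output plus the gated sum of low-rank expert updates."""
        out = matvec(self.frozen_w, x)
        for g, a, b in zip(self.gate(x), self.lora_a, self.lora_b):
            delta = matvec(b, matvec(a, x))  # B (A x), rank-limited update
            out = [o + self.scale * g * d for o, d in zip(out, delta)]
        return out
```

In this sketch only `router_w`, `lora_a`, and `lora_b` would receive gradient updates during fine-tuning, while `frozen_w` stays fixed, which is the sense in which the adapters act as a plugin on top of the backbone.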
Related papers
- MoExtend: Tuning New Experts for Modality and Task Extension [61.29100693866109]
MoExtend is an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models.
MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models.
arXiv Detail & Related papers (2024-08-07T02:28:37Z)
- MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network [4.396837128416218]
We propose the Multilaminar Leap Augmented Auxiliary Network (MLAAN).
MLAAN captures both local and global features through independent and cascaded auxiliary networks.
We further design LAM, an enhanced auxiliary network that uses the Exponential Moving Average (EMA) method to facilitate information exchange between local modules.
Our experiments on the CIFAR-10, STL-10, SVHN, and ImageNet datasets show that MLAAN can be seamlessly integrated into existing local learning frameworks.
arXiv Detail & Related papers (2024-06-24T13:30:55Z)
- MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing knowledge capabilities by integrating a structured and explicit read-and-write memory module.
Our experiments indicate that MemLLM enhances performance and interpretability, in language modeling in general and knowledge-intensive tasks in particular.
We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
arXiv Detail & Related papers (2024-04-17T18:13:16Z)
- Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models [90.14693869269519]
MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes.
This paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques.
arXiv Detail & Related papers (2024-02-22T18:56:07Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE [83.00018517368973]
Large Language Models (LLMs) can extend their zero-shot capabilities to multimodal learning through instruction tuning.
However, negative conflicts and interference between tasks may have a worse impact on performance.
We combine the well-known Mixture-of-Experts (MoE) and one of the representative PEFT techniques, i.e., LoRA, designing a novel LLM-based decoder, called LoRA-MoE, for multimodal learning.
arXiv Detail & Related papers (2023-11-05T15:48:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.