Related papers: LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin

URL: http://arxiv.org/abs/2312.09979v4
Date: Fri, 8 Mar 2024 13:13:54 GMT
Title: LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin
Authors: Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang
Abstract summary: We propose LoRAMoE, a novel framework that introduces several low-rank adapters (LoRA) and integrates them by using a router network. It freezes the backbone model and forces a portion of LoRAs to focus on leveraging world knowledge to solve downstream tasks. Experimental results show that, as the instruction data increases, LoRAMoE can significantly improve the ability to process downstream tasks.
Score: 85.16356890023582
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks. Substantially increasing the instruction data is a direct way to align the model with a broader range of downstream tasks or to notably improve its performance on a specific task. However, we find that large-scale increases in instruction data can damage the world knowledge previously stored in LLMs. To address this challenge, we propose LoRAMoE, a novel framework that introduces several low-rank adapters (LoRA) and integrates them by using a router network, like a plugin version of Mixture of Experts (MoE). It freezes the backbone model and forces a portion of LoRAs to focus on leveraging world knowledge to solve downstream tasks, to alleviate world knowledge forgetting. Experimental results show that, as the instruction data increases, LoRAMoE can significantly improve the ability to process downstream tasks, while maintaining the world knowledge stored in the LLM.
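
The mechanism described in the abstract (a frozen backbone weight, several LoRA adapters, and a router that mixes them) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `LoRAMoELinear`, the dense softmax router, the `alpha / rank` scaling, and all shapes and initial values are assumptions made for illustration, and the step that forces a portion of the LoRAs to focus on world knowledge is not modeled here.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class LoRAMoELinear:
    """Hypothetical linear layer: frozen weight plus router-mixed LoRA experts."""

    def __init__(self, d_in, d_out, n_experts=4, rank=2, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen backbone weight: never updated during fine-tuning.
        self.W = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)
        # One low-rank pair (A_i, B_i) per expert; B starts at zero so the
        # layer initially reproduces the backbone output exactly.
        self.A = rng.normal(size=(n_experts, rank, d_in)) * 0.01
        self.B = np.zeros((n_experts, d_out, rank))
        # Router: a single linear map followed by a softmax (an assumption).
        self.W_router = rng.normal(size=(n_experts, d_in)) * 0.01
        self.scale = alpha / rank

    def gate(self, x):
        """Per-input mixing weights over experts, shape (batch, n_experts)."""
        return softmax(x @ self.W_router.T)

    def forward(self, x):
        base = x @ self.W.T                               # (batch, d_out)
        low = np.einsum("erd,bd->ber", self.A, x)         # (batch, experts, rank)
        delta = np.einsum("eor,ber->beo", self.B, low)    # (batch, experts, d_out)
        mixed = np.einsum("be,beo->bo", self.gate(x), delta)
        return base + self.scale * mixed


layer = LoRAMoELinear(d_in=8, d_out=5, n_experts=3, rank=2)
x = np.ones((4, 8))
y = layer.forward(x)  # shape (4, 5); equals x @ layer.W.T while B is zero
```

In this sketch only `A`, `B`, and `W_router` would be trainable, which mirrors the abstract's point that the backbone stays frozen while the adapters and router absorb the instruction data.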

Related papers

MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation [24.200547898713126]
Multimodal Large Language Models (MLLMs) excel in understanding complex language and visual data. Their real-world deployment is hindered by substantial computational and storage demands. We propose a Mixture-of-Layers Vision-Language-Action model (MoLe) architecture for dynamic LLM layer activation.
arXiv Detail & Related papers (2025-03-26T10:05:38Z)
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering [27.812611421754482]
We propose an MLLMs-based dual momentum Mixture-of-Experts (CL-MoE) framework for continual visual question answering (VQA). We integrate MLLMs with continual learning to utilize the rich commonsense knowledge in LLMs. Our method achieves state-of-the-art performance on 10 VQA tasks, proving the effectiveness of our approach.
arXiv Detail & Related papers (2025-03-01T09:25:23Z)
MoExtend: Tuning New Experts for Modality and Task Extension [61.29100693866109]
MoExtend is an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models. MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models.
arXiv Detail & Related papers (2024-08-07T02:28:37Z)
MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network [4.396837128416218]
We propose the Multilaminar Leap Augmented Auxiliary Network (MLAAN). MLAAN captures both local and global features through independent and cascaded auxiliary networks. We further design LAM, an enhanced auxiliary network that uses the Exponential Moving Average (EMA) method to facilitate information exchange between local modules. Our experiments on the CIFAR-10, STL-10, SVHN, and ImageNet datasets show that MLAAN can be seamlessly integrated into existing local learning frameworks.
arXiv Detail & Related papers (2024-06-24T13:30:55Z)
MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing knowledge capabilities by integrating a structured and explicit read-and-write memory module. Our experiments indicate that MemLLM enhances performance and interpretability in language modeling. We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
arXiv Detail & Related papers (2024-04-17T18:13:16Z)
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models [90.14693869269519]
MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes. This paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques.
arXiv Detail & Related papers (2024-02-22T18:56:07Z)
Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE [83.00018517368973]
Large Language Models (LLMs) can extend their zero-shot capabilities to multimodal learning through instruction tuning. Negative conflicts and interference may have a worse impact on performance. We combine the well-known Mixture-of-Experts (MoE) and one of the representative PEFT techniques, i.e., LoRA, designing a novel LLM-based decoder, called LoRA-MoE, for multimodal learning.
arXiv Detail & Related papers (2023-11-05T15:48:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
