Related papers: MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

URL: http://arxiv.org/abs/2412.07405v1
Date: Tue, 10 Dec 2024 10:55:57 GMT
Title: MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning
Authors: Yufei Ma, Zihan Liang, Huangyu Dai, Ben Chen, Dehong Gao, Zhuoran Ran, Wang Zihan, Linbo Jin, Wen Jiang, Guannan Zhang, Xiaoyan Cai, Libin Yang,
Abstract summary: MoDULA is a paradigm for improved fine-tuning and parameter efficiency in multi-task learning.<n>MoDULA-Res is a new method within the MoDULA paradigm, which maintains the model's general capability by connecting universal and task-specific experts.
Score: 17.960185808572582
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The growing demand for larger-scale models in the development of \textbf{L}arge \textbf{L}anguage \textbf{M}odels (LLMs) poses challenges for efficient training within limited computational resources. Traditional fine-tuning methods often exhibit instability in multi-task learning and rely heavily on extensive training resources. Here, we propose MoDULA (\textbf{M}ixture \textbf{o}f \textbf{D}omain-Specific and \textbf{U}niversal \textbf{L}oR\textbf{A}), a novel \textbf{P}arameter \textbf{E}fficient \textbf{F}ine-\textbf{T}uning (PEFT) \textbf{M}ixture-\textbf{o}f-\textbf{E}xpert (MoE) paradigm for improved fine-tuning and parameter efficiency in multi-task learning. The paradigm effectively improves the multi-task capability of the model by training universal experts, domain-specific experts, and routers separately. MoDULA-Res is a new method within the MoDULA paradigm, which maintains the model's general capability by connecting universal and task-specific experts through residual connections. The experimental results demonstrate that the overall performance of the MoDULA-Flan and MoDULA-Res methods surpasses that of existing fine-tuning methods on various LLMs. Notably, MoDULA-Res achieves more significant performance improvements in multiple tasks while reducing training costs by over 80\% without losing general capability. Moreover, MoDULA displays flexible pluggability, allowing for the efficient addition of new tasks without retraining existing experts from scratch. This progressive training paradigm circumvents data balancing issues, enhancing training efficiency and model stability. Overall, MoDULA provides a scalable, cost-effective solution for fine-tuning LLMs with enhanced parameter efficiency and generalization capability.

Related papers

Bagging-Based Model Merging for Robust General Text Embeddings [73.51674133699196]
General-purpose text embedding models underpin a wide range of NLP and information retrieval applications.<n>We present a systematic study of multi-task training for text embeddings from two perspectives: data scheduling and model merging.<n>We propose Bagging-based rObust mOdel Merging (BOOM), which trains multiple embedding models on sampled subsets and merges them into a single model.
arXiv Detail & Related papers (2026-02-05T15:45:08Z)
MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models [52.876185634349575]
We propose to incorporate Mixture of Intra- and Inter-Modality Experts (MoIIE) to Large Vision-Language Models (LVLMs)<n>For each token, expert routing is guided by its modality, directing tokens to their respective intra-modality experts as well as a shared pool of inter-modality experts.<n>Our MoIIE models with 5.5B and 11.3B activated parameters match or even surpass the performance of existing advanced open-source MoE-LLMs based multi-modal models.
arXiv Detail & Related papers (2025-08-13T13:00:05Z)
PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning [54.73049408950049]
We propose a Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning.<n>Our approach improves unified multimodal retrieval from both structural and learning perspectives.
arXiv Detail & Related papers (2025-07-10T16:47:25Z)
MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver [9.61561012521585]
This work introduces a novel multi-task learning method driven by knowledge distillation (MTL-KD)<n>The proposed MTL-KD method transfers policy knowledge from multiple distinct RL-based single-task models to a single heavy decoder model, label-free training and effectively improving the model's generalization ability across diverse tasks.<n> Experimental results on 6 seen and 10 unseen VRP variants with up to 1000 nodes indicate that our proposed method consistently achieves superior performance on both uniform and real-world benchmarks.
arXiv Detail & Related papers (2025-06-03T14:35:36Z)
RLAE: Reinforcement Learning-Assisted Ensemble for LLMs [21.77261258691006]
Large language models (LLMs) can effectively combine diverse strengths of different models, offering a promising approach to enhance performance across various tasks.<n>We propose Reinforcement Learning-Assisted Ensemble for LLMs, a novel framework that reformulates ensemble through the lens of a Markov Decision Process (MDP)<n>Our approach introduces a RL agent that dynamically adjusts ensemble weights by considering both input context and intermediate generation states.
arXiv Detail & Related papers (2025-05-31T07:38:41Z)
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer [56.898822179122476]
We propose M3DT, a novel mixture-of-experts (MoE) framework that tackles task scalability by further unlocking the model's parameter scalability.<n> Experimental results show that, by increasing the number of experts, M3DT not only consistently enhances its performance as model expansion on the fixed task numbers, but also exhibits remarkable task scalability, successfully extending to 160 tasks with superior performance.
arXiv Detail & Related papers (2025-05-30T09:08:52Z)
Large Language Models as Attribution Regularizers for Efficient Model Training [0.0]
Large Language Models (LLMs) have demonstrated remarkable performance across diverse domains. We introduce a novel yet straightforward method for incorporating LLM-generated global task feature attributions into the training process of smaller networks. Our approach yields superior performance in few-shot learning scenarios.
arXiv Detail & Related papers (2025-02-27T16:55:18Z)
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging [111.8456671452411]
Multi-task learning (MTL) leverages a shared model to accomplish multiple tasks and facilitate knowledge transfer. We propose a Weight-Ensembling Mixture of Experts (WEMoE) method for multi-task model merging. We show that WEMoE and E-WEMoE outperform state-of-the-art (SOTA) model merging methods in terms of MTL performance, generalization, and robustness.
arXiv Detail & Related papers (2024-10-29T07:16:31Z)
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and. Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting. LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
MoExtend: Tuning New Experts for Modality and Task Extension [61.29100693866109]
MoExtend is an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models. MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models.
arXiv Detail & Related papers (2024-08-07T02:28:37Z)
Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training [15.462969044840868]
We introduce LW-FedMML, a layer-wise federated multimodal learning approach which decomposes the training process into multiple stages. We conduct extensive experiments across various FL and multimodal learning settings to validate the effectiveness of our proposed method. Specifically, LW-FedMML reduces memory usage by up to $2.7times$, computational operations (FLOPs) by $2.4times$, and total communication cost by $2.3times$.
arXiv Detail & Related papers (2024-07-22T07:06:17Z)
MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic [6.46176287368784]
We propose textbfModel textbfExclusive textbfTask textbfArithmetic for merging textbfGPT-scale models. Our proposed MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs.
arXiv Detail & Related papers (2024-06-17T10:12:45Z)
Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning [50.73666458313015]
Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications. MoE has been emerged as a promising solution with its sparse architecture for effective task decoupling. Intuition-MoR1E achieves superior efficiency and 2.15% overall accuracy improvement across 14 public datasets.
arXiv Detail & Related papers (2024-04-13T12:14:58Z)
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models [49.32669226551026]
We propose a simple yet effective training strategy MoE-Tuning for LVLMs. MoE-LLaVA, a MoE-based sparse LVLM architecture, uniquely activates only the top-k experts through routers. Experiments show the significant performance of MoE-LLaVA in a variety of visual understanding and object hallucination benchmarks.
arXiv Detail & Related papers (2024-01-29T08:13:40Z)
When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique. Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
Scalable and Efficient MoE Training for Multitask Multilingual Models [55.987536562357086]
We develop a system capable of scaling MoE models efficiently to trillions of parameters. We also present new training methods to improve MoE sample efficiency and leverage expert pruning strategy to improve time efficiency. A model trained with 10 billion parameters on 50 languages can achieve state-of-the-art performance in Machine Translation (MT) and multilingual natural language generation tasks.
arXiv Detail & Related papers (2021-09-22T00:57:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.