TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts
- URL: http://arxiv.org/abs/2504.21190v1
- Date: Tue, 29 Apr 2025 21:46:43 GMT
- Title: TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts
- Authors: Pradip Kunwar, Minh N. Vu, Maanak Gupta, Mahmoud Abdelsalam, Manish Bhattarai
- Abstract summary: TT-LoRA MoE decomposes training into two distinct optimized stages. First, we independently train lightweight, tensorized low-rank adapters (TT-LoRA experts). Subsequently, these expert adapters remain frozen, eliminating inter-task interference and catastrophic forgetting in multi-task settings. A sparse MoE router, trained separately, dynamically leverages base model representations to select exactly one specialized adapter per input at inference time. Comprehensive experiments confirm our architecture retains the memory efficiency of low-rank adapters, seamlessly scales to large expert pools, and achieves robust task-level optimization.
- Score: 4.5558042369389105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose Tensor-Trained Low-Rank Adaptation Mixture of Experts (TT-LoRA MoE), a novel computational framework integrating Parameter-Efficient Fine-Tuning (PEFT) with sparse MoE routing to address scalability challenges in large model deployments. Unlike traditional MoE approaches, which face substantial computational overhead as expert counts grow, TT-LoRA MoE decomposes training into two distinct, optimized stages. First, we independently train lightweight, tensorized low-rank adapters (TT-LoRA experts), each specialized for a specific task. Subsequently, these expert adapters remain frozen, eliminating inter-task interference and catastrophic forgetting in multi-task settings. A sparse MoE router, trained separately, dynamically leverages base model representations to select exactly one specialized adapter per input at inference time, automating expert selection without explicit task specification. Comprehensive experiments confirm our architecture retains the memory efficiency of low-rank adapters, seamlessly scales to large expert pools, and achieves robust task-level optimization. This structured decoupling significantly enhances computational efficiency and flexibility: our method uses only 2% of the trainable parameters of LoRA, 0.3% of Adapters, and 0.03% of AdapterFusion, while outperforming AdapterFusion by 4 points in multi-task evaluation, enabling practical and scalable multi-task inference deployments.
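To make the two-stage design concrete, below is a minimal, hypothetical PyTorch sketch: each expert stores its weight update as two tensor-train cores, and a separately trained linear router selects exactly one frozen expert per input. Class names, mode factorizations, and TT ranks are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the two-stage TT-LoRA MoE idea (PyTorch).
# Shapes, mode factorizations, and TT ranks are illustrative assumptions.
import torch
import torch.nn as nn


class TTLoRAExpert(nn.Module):
    """Stage 1: a LoRA-style update whose weight delta is stored as two TT cores."""

    def __init__(self, d_in=768, d_out=768, modes_in=(32, 24), modes_out=(32, 24), tt_rank=4):
        super().__init__()
        assert modes_in[0] * modes_in[1] == d_in and modes_out[0] * modes_out[1] == d_out
        self.modes_in, self.modes_out = modes_in, modes_out
        # Two small cores replace a dense (d_out x d_in) update matrix.
        self.core1 = nn.Parameter(torch.randn(modes_out[0] * modes_in[0], tt_rank) * 0.02)
        self.core2 = nn.Parameter(torch.zeros(tt_rank, modes_out[1] * modes_in[1]))

    def delta_weight(self):
        m1, m2 = self.modes_out
        n1, n2 = self.modes_in
        full = self.core1 @ self.core2                       # (m1*n1, m2*n2)
        full = full.reshape(m1, n1, m2, n2).permute(0, 2, 1, 3)
        return full.reshape(m1 * m2, n1 * n2)                # (d_out, d_in)

    def forward(self, x):                                    # x: (..., d_in)
        return x @ self.delta_weight().T


class TTLoRAMoE(nn.Module):
    """Stage 2: independently trained experts stay frozen; only the router learns."""

    def __init__(self, experts, d_model=768):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        for p in self.experts.parameters():
            p.requires_grad_(False)                          # no inter-task interference
        self.router = nn.Linear(d_model, len(experts))       # trained on base-model features

    def forward(self, hidden):                               # hidden: (batch, d_model)
        logits = self.router(hidden)
        expert_idx = logits.argmax(dim=-1)                   # exactly one expert per input
        out = torch.stack([self.experts[int(i)](h) for h, i in zip(hidden, expert_idx)])
        return out, logits                                   # logits feed the router's loss
```

Because the experts are frozen in stage 2, only the router's `d_model x num_experts` weights are updated as tasks are added, which is where the scalability and no-forgetting claims come from.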
Related papers
- Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning [5.074620301447097]
We propose a hierarchical scheme for expert allocation and rank configuration, HILO, for large language models (LLMs). HILO dynamically adjusts the number and rank of adapter experts across layers, matching the varying representational complexity of model layers at adapter granularity. Experiments on multiple benchmark tasks demonstrate that HILO outperforms existing methods in accuracy while introducing fewer trainable parameters.
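As a rough illustration of what a hierarchical, per-layer configuration might look like, the snippet below assigns each transformer layer its own expert count and adapter rank; the values are placeholders, not HILO's actual allocation.

```python
# Hypothetical per-layer allocation in the spirit of HILO; values are placeholders.
hilo_style_config = {
    0:  {"num_experts": 2, "adapter_rank": 4},
    6:  {"num_experts": 4, "adapter_rank": 8},
    12: {"num_experts": 4, "adapter_rank": 16},
    23: {"num_experts": 8, "adapter_rank": 16},   # e.g. the last layer of a 24-layer model
}
```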
arXiv Detail & Related papers (2025-02-06T08:58:03Z) - Pilot: Building the Federated Multimodal Instruction Tuning Framework [79.56362403673354]
Our framework integrates two stages of "adapter on adapter" into the connector of the vision encoder and the LLM. In stage 1, we extract task-specific features and client-specific features from visual information. In stage 2, we build the cross-task Mixture-of-Adapters (CT-MoA) module to perform cross-task interaction.
arXiv Detail & Related papers (2025-01-23T07:49:24Z) - Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning [59.001091197106085]
Multi-Task Learning (MTL) for Vision Transformer aims at enhancing the model capability by tackling multiple tasks simultaneously. Most recent works have predominantly focused on designing Mixture-of-Experts (MoE) structures and integrating Low-Rank Adaptation (LoRA) to efficiently perform multi-task learning. We propose a novel approach dubbed Efficient Multi-Task Learning (EMTAL) by transforming a pre-trained Vision Transformer into an efficient multi-task learner.
arXiv Detail & Related papers (2025-01-12T17:41:23Z) - 3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability [6.451743797015637]
We introduce a novel method, RoAd, which employs a straightforward 2D rotation to adapt large language models (LLMs)
RoAd is remarkably parameter-efficient, delivering optimal performance on GLUE, eight commonsense reasoning tasks, and four arithmetic reasoning tasks with 0.1% trainable parameters.
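A plausible reading of the 2D-rotation idea is sketched below: hidden features are split into pairs and each pair is rotated by a learned angle, so each adapted projection only adds d_model/2 trainable parameters. The parameterization and placement are assumptions for illustration, not RoAd's exact formulation.

```python
# Hedged sketch of a 2D-rotation adapter (PyTorch); the angle parameterization
# and where the adapter is inserted are assumptions for illustration.
import torch
import torch.nn as nn


class Rotary2DAdapter(nn.Module):
    """Applies an independent learned 2x2 rotation to each pair of hidden features."""

    def __init__(self, d_model=768):
        super().__init__()
        assert d_model % 2 == 0
        self.theta = nn.Parameter(torch.zeros(d_model // 2))  # one angle per feature pair

    def forward(self, x):                                     # x: (..., d_model)
        x1, x2 = x[..., 0::2], x[..., 1::2]                   # even / odd feature pairs
        cos, sin = self.theta.cos(), self.theta.sin()
        y1 = cos * x1 - sin * x2
        y2 = sin * x1 + cos * x2
        return torch.stack((y1, y2), dim=-1).flatten(-2)      # re-interleave the pairs
```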
arXiv Detail & Related papers (2024-08-28T08:45:29Z) - MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts [6.245113492272563]
Mixture of Dyadic Experts (MoDE) is a novel design for efficient multi-task adaptation.
Our design allows for more fine-grained mixing, thereby increasing the model's ability to jointly handle multiple tasks.
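One way to picture "dyadic" experts with fine-grained mixing is as rank-1 LoRA updates combined by a per-input router, as in the hedged sketch below; whether any factors are shared across experts and how the router is trained are assumptions here.

```python
# Hedged sketch of a mixture of rank-1 (dyadic) LoRA experts (PyTorch).
import torch
import torch.nn as nn


class DyadicMoELoRA(nn.Module):
    def __init__(self, d_in=768, d_out=768, num_experts=8):
        super().__init__()
        self.down = nn.Parameter(torch.randn(num_experts, d_in) * 0.02)  # rank-1 "A" vectors
        self.up = nn.Parameter(torch.zeros(num_experts, d_out))          # rank-1 "B" vectors
        self.router = nn.Linear(d_in, num_experts)

    def forward(self, x):                              # x: (batch, d_in)
        weights = self.router(x).softmax(dim=-1)       # soft, fine-grained per-input mixing
        z = x @ self.down.T                            # each column: rank-1 down-projection
        return (weights * z) @ self.up                 # weighted sum of rank-1 updates
```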
arXiv Detail & Related papers (2024-08-02T18:05:10Z) - Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences [49.14535254003683]
We introduce PaLoRA, a novel parameter-efficient method that addresses multi-task trade-offs in machine learning. Our experiments show that PaLoRA outperforms state-of-the-art MTL and PFL baselines across various datasets.
arXiv Detail & Related papers (2024-07-10T21:25:51Z) - Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
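A heavily simplified view of MoE-based model fusion is a preference-weighted combination of the single-task models' weight deltas, as in the sketch below; treating fusion as a plain convex combination is an illustrative assumption, not the paper's actual routing scheme.

```python
# Hedged sketch: fuse single-task checkpoints with a user preference vector (PyTorch).
import torch


def fuse_weights(base_state, expert_states, preference):
    """base_state: dict of base-model tensors; expert_states: list of task-specific
    state dicts; preference: tensor of per-task weights that sums to 1."""
    fused = {}
    for name, w0 in base_state.items():
        deltas = torch.stack([exp[name] - w0 for exp in expert_states])  # (num_tasks, ...)
        coeffs = preference.view(-1, *([1] * (deltas.dim() - 1)))
        fused[name] = w0 + (coeffs * deltas).sum(dim=0)                  # preference trade-off
    return fused
```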
arXiv Detail & Related papers (2024-06-14T07:16:18Z) - Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning [50.73666458313015]
Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications.
MoE has emerged as a promising solution with its sparse architecture for effective task decoupling.
Intuition-MoR1E achieves superior efficiency and a 2.15% overall accuracy improvement across 14 public datasets.
arXiv Detail & Related papers (2024-04-13T12:14:58Z) - Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning [12.648711621637663]
This paper introduces a novel Parameter-Efficient Fine-Tuning (PEFT) framework for multi-modal, multi-task transfer learning with pre-trained language models.
We propose Context-PEFT, which learns different groups of adaptor parameters based on the token's domain.
Our method is evaluated on the COCO captioning task, where it outperforms full fine-tuning under similar data constraints.
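The per-domain adapter idea can be pictured as selecting a different set of LoRA-style factors for each token depending on whether it is, say, a text or an image token; the sketch below makes that concrete, with the LoRA form and the domain ids being assumptions for illustration.

```python
# Hedged sketch of domain-conditioned adapters in the spirit of Context-PEFT (PyTorch).
import torch
import torch.nn as nn


class DomainRoutedLoRA(nn.Module):
    def __init__(self, d_model=768, rank=8, num_domains=2):   # e.g. 0 = text, 1 = image tokens
        super().__init__()
        self.down = nn.Parameter(torch.randn(num_domains, d_model, rank) * 0.02)
        self.up = nn.Parameter(torch.zeros(num_domains, rank, d_model))

    def forward(self, x, domain_ids):     # x: (batch, seq, d_model); domain_ids: (batch, seq)
        down = self.down[domain_ids]      # per-token gather: (batch, seq, d_model, rank)
        up = self.up[domain_ids]          # (batch, seq, rank, d_model)
        z = torch.einsum("bsd,bsdr->bsr", x, down)
        return torch.einsum("bsr,bsrd->bsd", z, up)
```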
arXiv Detail & Related papers (2023-12-14T13:00:24Z) - VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding [6.816428690763012]
A standard approach to leverage large-scale pre-trained models is to fine-tune all model parameters for downstream tasks.
We propose VMT-Adapter, which shares knowledge from multiple tasks to enhance cross-task interaction.
We also propose VMT-Adapter-Lite, which further reduces the trainable parameters by learning shared parameters between down- and up-projections.
arXiv Detail & Related papers (2023-12-14T08:25:04Z) - Multi-Head Adapter Routing for Cross-Task Generalization [56.75667096355806]
Polytropon learns an inventory of adapters and a routing function that selects a subset of adapters for each task during both pre-training and few-shot adaptation.
We find that routing is most beneficial during multi-task pre-training rather than during few-shot adaptation.
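A hedged sketch of the inventory-plus-routing idea: a pool of low-rank "skill" adapters and a learned task-to-skill distribution that combines them into one task-specific update. Normalizing the routing with a softmax and averaging the LoRA factors is an illustrative simplification of Polytropon's routing.

```python
# Hedged sketch of routing over an adapter inventory (PyTorch); the softmax routing
# and factor averaging are simplifications for illustration.
import torch
import torch.nn as nn


class AdapterInventory(nn.Module):
    def __init__(self, d_model=768, rank=4, num_skills=8, num_tasks=16):
        super().__init__()
        self.A = nn.Parameter(torch.randn(num_skills, d_model, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(num_skills, rank, d_model))
        # One routing logit per (task, skill), learned jointly with the adapters.
        self.routing_logits = nn.Parameter(torch.zeros(num_tasks, num_skills))

    def forward(self, x, task_id):                            # x: (batch, seq, d_model)
        probs = self.routing_logits[task_id].softmax(-1)      # (num_skills,)
        A = torch.einsum("s,sdr->dr", probs, self.A)          # task-specific combined factors
        B = torch.einsum("s,srd->rd", probs, self.B)
        return x @ A @ B                                      # low-rank update for this task
```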
arXiv Detail & Related papers (2022-11-07T19:35:55Z) - M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design [95.41238363769892]
Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often lets those tasks learn better jointly.
Current MTL regimes have to activate nearly the entire model even to execute just a single task.
We present a model-accelerator co-design framework to enable efficient on-device MTL.
arXiv Detail & Related papers (2022-10-26T15:40:24Z)