Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning
- URL: http://arxiv.org/abs/2510.00570v1
- Date: Wed, 01 Oct 2025 06:49:19 GMT
- Title: Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning
- Authors: Minghao Yang, Ren Togo, Guang Li, Takahiro Ogawa, Miki Haseyama
- Abstract summary: Mixture-of-Experts (MoE) has emerged as a powerful framework for multi-task learning (MTL). Existing MoE-MTL methods often rely on single-task pretrained backbones and suffer from redundant adaptation and inefficient knowledge sharing. We propose adaptive shared experts (ASE) within a low-rank adaptation (LoRA) based MoE, where shared experts are assigned router-computed gating weights jointly normalized with sparse experts.
- Score: 49.90176890917986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixture-of-Experts (MoE) has emerged as a powerful framework for multi-task learning (MTL). However, existing MoE-MTL methods often rely on single-task pretrained backbones and suffer from redundant adaptation and inefficient knowledge sharing during the transition from single-task to multi-task learning (STL to MTL). To address these limitations, we propose adaptive shared experts (ASE) within a low-rank adaptation (LoRA) based MoE, where shared experts are assigned router-computed gating weights jointly normalized with sparse experts. This design facilitates the STL-to-MTL transition and enhances expert specialization and cooperation. Furthermore, we incorporate fine-grained experts by increasing the number of LoRA experts while proportionally reducing their rank, enabling more effective knowledge sharing under a comparable parameter budget. Extensive experiments on the PASCAL-Context benchmark, under unified training settings, demonstrate that ASE consistently improves performance across diverse configurations and validates the effectiveness of fine-grained designs for MTL.
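The sketch below illustrates, under stated assumptions, how a LoRA-based MoE layer with adaptive shared experts might be wired: a single router scores shared and sparse experts together, and one softmax normalizes their gates jointly, matching the abstract's description. It is not the authors' implementation; the class names, expert counts, ranks, and top-k value are illustrative choices.

```python
# Minimal sketch (assumption, not the paper's released code) of a LoRA-based
# MoE layer with adaptive shared experts: shared-expert gates are produced by
# the router and normalized jointly with the sparse experts' gates.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One low-rank expert: x -> up(down(x)), a rank-r residual update."""

    def __init__(self, dim: int, rank: int):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # start as a zero delta, as in standard LoRA

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class AdaptiveSharedExpertLoRAMoE(nn.Module):
    def __init__(self, dim: int, n_sparse: int = 8, n_shared: int = 2,
                 rank: int = 4, top_k: int = 2):
        super().__init__()
        self.sparse = nn.ModuleList(LoRAExpert(dim, rank) for _ in range(n_sparse))
        self.shared = nn.ModuleList(LoRAExpert(dim, rank) for _ in range(n_shared))
        # A single router scores sparse and shared experts in one pass.
        self.router = nn.Linear(dim, n_sparse + n_shared, bias=False)
        self.n_sparse, self.top_k = n_sparse, top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        logits = self.router(x)
        sparse_logits, shared_logits = logits.split(
            [self.n_sparse, len(self.shared)], dim=-1)

        # Keep only the top-k sparse experts per token; mask the rest out.
        topk_val, topk_idx = sparse_logits.topk(self.top_k, dim=-1)
        masked = torch.full_like(sparse_logits, float("-inf"))
        masked.scatter_(-1, topk_idx, topk_val)

        # Joint normalization: shared-expert gates are router-computed and
        # softmaxed together with the surviving sparse gates.
        gates = F.softmax(torch.cat([masked, shared_logits], dim=-1), dim=-1)

        delta = torch.zeros_like(x)
        for i, expert in enumerate(list(self.sparse) + list(self.shared)):
            w = gates[:, i:i + 1]
            if (w > 0).any():  # skip experts no token routed to
                delta = delta + w * expert(x)
        return x + delta  # residual low-rank update
```

Each expert in this parameterization costs roughly 2 * dim * rank parameters, so the fine-grained variant described in the abstract (more experts with proportionally smaller rank, e.g. 16 experts of rank 2 instead of 8 of rank 4) keeps the adapter budget comparable while giving the router finer units to combine.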
Related papers
- SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning [83.66308307152808]
We propose StAbilized Mixture-of-Experts (SAME) for Multimodal Continual Instruction Tuning (MCIT). SAME stabilizes expert selection by decomposing routing dynamics into subspaces and updating only task-relevant directions. It also introduces adaptive expert activation to freeze selected experts during training, reducing redundant and cross-task interference.
arXiv Detail & Related papers (2026-02-02T11:47:06Z) - Multi-Task Dense Prediction Fine-Tuning with Mixture of Fine-Grained Experts [22.936728143586443]
Multi-task learning (MTL) for dense prediction has shown promising results but still faces challenges in balancing shared representations with task-specific specialization. We introduce a novel Fine-Grained Mixture of Experts architecture that explores MoE-based MTL models through a combination of three key innovations and fine-tuning.
arXiv Detail & Related papers (2025-07-25T08:59:30Z) - Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition [16.14787920254091]
We present a novel Multimodal Mixture of Low-Rank Experts (MMoLRE) method for multimodal sentiment analysis (MSA) and multimodal emotion recognition (MER). MMoLRE utilizes shared and task-specific experts to distinctly model common and unique task characteristics, thereby avoiding parameter conflicts. Experiments on the CMU-MOSI and CMU-MOSEI benchmarks demonstrate that MMoLRE achieves state-of-the-art performance on the MSA task and competitive results on the MER task.
arXiv Detail & Related papers (2025-05-20T09:46:56Z) - Collaborative Multi-LoRA Experts with Achievement-based Multi-Tasks Loss for Unified Multimodal Information Extraction [28.800518091590117]
Multimodal Information Extraction (MIE) has gained attention for extracting structured information from multimedia sources. Traditional methods tackle MIE tasks separately, missing opportunities to share knowledge across tasks. We propose collaborative multi-LoRA experts with achievement-based multi-task loss for MIE tasks.
arXiv Detail & Related papers (2025-05-08T03:16:32Z) - MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning [74.43869839954168]
We propose MTL-LoRA, which retains the advantages of low-rank adaptation while significantly enhancing MTL capabilities. MTL-LoRA augments LoRA by incorporating additional task-adaptive parameters that differentiate task-specific information and capture shared knowledge. This approach enables pre-trained models to jointly adapt to different target domains with a limited number of trainable parameters.
arXiv Detail & Related papers (2024-10-12T08:32:26Z) - TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition [61.91764883512776]
We introduce an innovative PEFT method, TeamLoRA, consisting of a collaboration and competition module for experts.
By doing so, TeamLoRA connects the experts as a "Team" with internal collaboration and competition, enabling a faster and more accurate PEFT paradigm for multi-task learning.
arXiv Detail & Related papers (2024-08-19T09:58:53Z) - Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts [74.40198929049959]
Large multi-modal models (LMMs) exhibit remarkable performance across numerous tasks.
However, generalist LMMs often suffer from performance degradation when tuned over a large collection of tasks.
We propose Omni-SMoLA, an architecture that uses the Soft MoE approach to mix many multimodal low-rank experts.
arXiv Detail & Related papers (2023-12-01T23:04:27Z) - Multi-Task Learning as a Bargaining Game [63.49888996291245]
In multi-task learning (MTL), a joint model is trained to simultaneously make predictions for several tasks.
Since the gradients of these different tasks may conflict, training a joint model for MTL often yields lower performance than its corresponding single-task counterparts.
We propose viewing the gradients combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update.
arXiv Detail & Related papers (2022-02-02T13:21:53Z)