Separation and Collaboration: Two-Level Routing Grouped Mixture-of-Experts for Multi-Domain Continual Learning
- URL: http://arxiv.org/abs/2508.07738v1
- Date: Mon, 11 Aug 2025 08:18:22 GMT
- Title: Separation and Collaboration: Two-Level Routing Grouped Mixture-of-Experts for Multi-Domain Continual Learning
- Authors: Jialu Zhou, Dianxi Shi, Shaowu Yang, Xinyu Wei, Mingyue Yang, Leqian Li, Mengzhu Wang, Chunping Qiu,
- Abstract summary: We propose a Two-Level Routing Grouped Mixture-of-Experts (TRGE) method to mitigate catastrophic forgetting. TRGE dynamically expands the pre-trained CLIP model, assigning a specific expert group to each task. We leverage Multimodal Large Language Models (MLLMs), which have powerful multimodal comprehension capabilities, to generate task descriptions and recognize the correct task identifier.
- Score: 7.361665112773847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Domain Continual Learning (MDCL) acquires knowledge from sequential tasks with shifting class sets and distributions. Although Parameter-Efficient Fine-Tuning (PEFT) methods can adapt to this dual heterogeneity, they still suffer from catastrophic forgetting and forward forgetting. To address these challenges, we propose a Two-Level Routing Grouped Mixture-of-Experts (TRGE) method. First, TRGE dynamically expands the pre-trained CLIP model, assigning a specific expert group to each task to mitigate catastrophic forgetting. As the number of experts continually grows in this process, TRGE keeps the expert count within each group static and introduces an intra-group router to alleviate the routing overfitting caused by increasing routing complexity. Meanwhile, we design an inter-group routing policy based on task identifiers and task-prototype distance, which dynamically selects relevant expert groups and combines their outputs to enhance inter-task collaboration. Second, to obtain the correct task identifiers, we leverage Multimodal Large Language Models (MLLMs), which have powerful multimodal comprehension capabilities, to generate semantic task descriptions and recognize the correct task identifier. Finally, to mitigate forward forgetting, we dynamically fuse the outputs for unseen samples from the frozen CLIP model and the TRGE adapter based on training progress, leveraging both pre-trained and learned knowledge. Through extensive experiments across various settings, our method outperforms other advanced methods with fewer trainable parameters.
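The paper's code is not available from this listing, so the following is only a minimal, illustrative sketch of the two mechanisms the abstract names: intra-group routing over a fixed number of experts, inter-group selection by task-prototype distance, and a training-progress-based fusion with the frozen CLIP output. Every module name, shape, and hyperparameter below (ExpertGroup, d_model, n_experts, top_k, the linear fusion schedule) is an assumption, not the authors' implementation.

```python
# Hedged sketch of two-level routing grouped MoE; all design details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertGroup(nn.Module):
    """One task-specific group: a static number of experts plus an intra-group router."""

    def __init__(self, d_model: int, n_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # intra-group (first-level) router
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, d_model)
        gates = F.softmax(self.router(x), dim=-1)                    # (batch, n_experts)
        top_val, top_idx = gates.topk(self.top_k, dim=-1)
        mask = torch.zeros_like(gates).scatter_(-1, top_idx, top_val)
        mask = mask / mask.sum(dim=-1, keepdim=True)                 # renormalize kept experts
        expert_outs = torch.stack([e(x) for e in self.experts], dim=-1)  # (batch, d_model, n_experts)
        return torch.einsum("bdn,bn->bd", expert_outs, mask)


class TwoLevelRoutingMoE(nn.Module):
    """Second level: select expert groups by task-prototype distance, then combine their outputs."""

    def __init__(self, d_model: int):
        super().__init__()
        self.d_model = d_model
        self.groups = nn.ModuleList()              # one group appended per task (dynamic expansion)
        self.prototypes: list[torch.Tensor] = []   # e.g. the mean CLIP feature of each task

    def add_task(self, prototype: torch.Tensor) -> None:
        self.groups.append(ExpertGroup(self.d_model))
        self.prototypes.append(prototype)

    def forward(self, x: torch.Tensor, n_groups: int = 2) -> torch.Tensor:
        protos = torch.stack(self.prototypes)                        # (n_tasks, d_model)
        dist = torch.cdist(x.mean(dim=0, keepdim=True), protos)[0]   # (n_tasks,)
        n_groups = min(n_groups, len(self.groups))
        sel = dist.topk(n_groups, largest=False).indices             # groups with closest prototypes
        weights = F.softmax(-dist[sel], dim=-1)                      # closer group -> larger weight
        return sum(w * self.groups[int(i)](x) for w, i in zip(weights, sel))


def fuse_with_frozen_clip(clip_out: torch.Tensor, adapter_out: torch.Tensor,
                          progress: float) -> torch.Tensor:
    """Blend frozen-CLIP and adapter outputs for unseen samples. The paper fuses them
    'based on training progress'; a plain linear ramp is used here purely as a placeholder."""
    alpha = max(0.0, min(1.0, progress))  # assumed schedule: trust the adapter more over time
    return (1.0 - alpha) * clip_out + alpha * adapter_out
```

In this sketch a new group would be added with `add_task(prototype)` when a task arrives, and at inference `fuse_with_frozen_clip` blends the frozen backbone's prediction with the adapter's; the actual routing policy, prototype construction, and fusion schedule may differ from the paper.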
Related papers
- Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts [56.02203242609604]
Large language models (LLMs) fine-tuned with lightweight adapters achieve strong performance across diverse tasks. Fusing independently trained models with different strengths has shown promise for multi-task learning through three main strategies. We empirically evaluate their trade-offs, addressing two key questions: what are the advantages of going beyond uniform ensembling or merging, and does the flexibility of routing justify its complexity?
arXiv Detail & Related papers (2026-03-03T21:44:11Z) - Token-Level LLM Collaboration via FusionRoute [60.72307345997823]
FusionRoute is a token-level multi-LLM collaboration framework. It selects the most suitable expert at each decoding step and contributes a complementary logit that refines or corrects the selected expert's next-token distribution. It outperforms both sequence- and token-level collaboration, model merging, and direct fine-tuning.
arXiv Detail & Related papers (2026-01-08T16:53:16Z) - CKAA: Cross-subspace Knowledge Alignment and Aggregation for Robust Continual Learning [80.18781219542016]
Continual Learning (CL) empowers AI models to continuously learn from sequential task streams. Recent parameter-efficient fine-tuning (PEFT)-based CL methods have garnered increasing attention due to their superior performance. We propose Cross-subspace Knowledge Alignment and Aggregation (CKAA) to enhance robustness against misleading task-ids.
arXiv Detail & Related papers (2025-07-13T03:11:35Z) - Mixture-of-Experts Meets In-Context Reinforcement Learning [49.19791753312034]
In-context reinforcement learning (ICRL) has emerged as a promising paradigm for adapting RL agents to downstream tasks. We propose T2MIR, an innovative framework that introduces architectural advances of mixture-of-experts (MoE) into transformer-based decision models. We show that T2MIR significantly facilitates in-context learning capacity and outperforms various types of baselines.
arXiv Detail & Related papers (2025-06-05T06:29:14Z) - Pilot: Building the Federated Multimodal Instruction Tuning Framework [79.56362403673354]
Our framework integrates two stages of "adapter on adapter" into the connector of the vision encoder and the LLM. In stage 1, we extract task-specific features and client-specific features from visual information. In stage 2, we build the cross-task Mixture-of-Adapters (CT-MoA) module to perform cross-task interaction.
arXiv Detail & Related papers (2025-01-23T07:49:24Z) - DMTG: One-Shot Differentiable Multi-Task Grouping [32.72240053032646]
We aim to address Multi-Task Learning (MTL) with a large number of tasks by Multi-Task Grouping (MTG).
We propose to simultaneously identify the best task groups from 2^N candidates and train the model weights in one shot, with high-order task affinity fully exploited.
arXiv Detail & Related papers (2024-07-06T13:54:00Z) - Variational Offline Multi-agent Skill Discovery [47.924414207796005]
We propose two novel auto-encoder schemes to simultaneously capture subgroup- and temporal-level abstractions and form multi-agent skills. Our method can be applied to offline multi-task data, and the discovered subgroup skills can be transferred across relevant tasks without retraining. Empirical evaluations on StarCraft tasks indicate that our approach significantly outperforms existing hierarchical multi-agent reinforcement learning (MARL) methods.
arXiv Detail & Related papers (2024-05-26T00:24:46Z) - Multi-task learning via robust regularized clustering with non-convex group penalties [0.0]
Multi-task learning (MTL) aims to improve estimation performance by sharing common information among related tasks.
Existing MTL methods based on this assumption often ignore outlier tasks.
We propose a novel MTL method called Multi-Task Learning via Robust Regularized Clustering (MTLRRC).
arXiv Detail & Related papers (2024-04-04T07:09:43Z) - MmAP : Multi-modal Alignment Prompt for Cross-domain Multi-task Learning [29.88567810099265]
Multi-task learning is designed to train multiple correlated tasks simultaneously.
To tackle this challenge, we integrate the decoder-free vision-language model CLIP.
We propose Multi-modal Alignment Prompt (MmAP) for CLIP, which aligns text and visual modalities during the fine-tuning process.
arXiv Detail & Related papers (2023-12-14T03:33:02Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - Semi-supervised Multi-task Learning for Semantics and Depth [88.77716991603252]
Multi-Task Learning (MTL) aims to enhance the model generalization by sharing representations between related tasks for better performance.
We propose a Semi-supervised Multi-Task Learning method to leverage the available supervisory signals from different datasets.
We present a domain-aware discriminator structure with various alignment formulations to mitigate the domain discrepancy issue among datasets.
arXiv Detail & Related papers (2021-10-14T07:43:39Z) - Multi-task Over-the-Air Federated Learning: A Non-Orthogonal Transmission Approach [52.85647632037537]
We propose a multi-task over-the-air federated learning (MOAFL) framework, where multiple learning tasks share edge devices for data collection and learning models under the coordination of an edge server (ES).
Both the convergence analysis and numerical results demonstrate that the MOAFL framework can significantly reduce the uplink bandwidth consumption of multiple tasks without causing substantial learning performance degradation.
arXiv Detail & Related papers (2021-06-27T13:09:32Z)