MoTE: Mixture of Task-specific Experts for Pre-Trained Model-Based Class-incremental Learning
- URL: http://arxiv.org/abs/2506.11038v1
- Date: Wed, 21 May 2025 03:06:10 GMT
- Title: MoTE: Mixture of Task-specific Experts for Pre-Trained Model-Based Class-incremental Learning
- Authors: Linjie Li, Zhenyu Wu, Yang Ji
- Abstract summary: Class-incremental learning (CIL) requires deep learning models to continuously acquire new knowledge from streaming data. Prompt-based approaches suffer from prompt overwriting, while adapter-based methods face challenges such as dimensional misalignment between tasks. We propose a mixture of task-specific experts (MoTE) framework that effectively mitigates the miscalibration caused by inconsistent output dimensions.
- Score: 39.892628170627496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Class-incremental learning (CIL) requires deep learning models to continuously acquire new knowledge from streaming data while preserving previously learned information. Recently, CIL based on pre-trained models (PTMs) has achieved remarkable success. However, prompt-based approaches suffer from prompt overwriting, while adapter-based methods face challenges such as dimensional misalignment between tasks. While the idea of expert fusion in Mixture of Experts (MoE) can help address dimensional inconsistency, both expert and routing parameters are prone to being overwritten in dynamic environments, making MoE challenging to apply directly in CIL. To tackle these issues, we propose a mixture of task-specific experts (MoTE) framework that effectively mitigates the miscalibration caused by inconsistent output dimensions across tasks. Inspired by the weighted feature fusion and sparse activation mechanisms in MoE, we introduce task-aware expert filtering and reliable expert joint inference during the inference phase, mimicking the behavior of routing layers without inducing catastrophic forgetting. Extensive experiments demonstrate the superiority of our method without requiring an exemplar set. Furthermore, the number of adapters in MoTE scales linearly with the number of tasks. Building on this, we further explore the trade-off between adapter expansion and model performance and propose the Adapter-Limited MoTE. The code is available at https://github.com/Franklilinjie/MoTE.
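As a rough illustration of the mechanism described in the abstract, the sketch below keeps one adapter-style expert per task and, at inference time, filters experts with a confidence proxy before fusing their dimension-aligned logits with softmax weights. Everything here (the `TaskExpert` class, the max-logit confidence score, the `keep_ratio` threshold) is an assumption made for illustration, not the authors' reference implementation.

```python
# Hypothetical sketch of inference-time expert filtering and weighted joint
# inference over task-specific adapter experts; not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskExpert(nn.Module):
    """One adapter-style expert with its task-local classifier head."""

    def __init__(self, feat_dim: int, num_classes: int, bottleneck: int = 64):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.Linear(feat_dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, feat_dim)
        )
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(feats + self.adapter(feats))  # residual adapter, then task logits


@torch.no_grad()
def mote_style_predict(feats, experts, class_offsets, total_classes, keep_ratio=0.5):
    """Filter experts by a confidence proxy, then fuse their aligned logits."""
    batch = feats.shape[0]
    aligned, confidences = [], []
    for expert, offset in zip(experts, class_offsets):
        logits = expert(feats)                               # (B, classes of this task)
        full = feats.new_zeros(batch, total_classes)         # align output dimensions
        full[:, offset: offset + logits.shape[1]] = logits
        aligned.append(full)
        confidences.append(logits.max(dim=1).values)         # max-logit confidence proxy
    conf = torch.stack(confidences, dim=1)                   # (B, num_experts)
    k = max(1, int(keep_ratio * conf.shape[1]))
    kth = conf.topk(k, dim=1).values[:, -1:]                 # k-th highest confidence per sample
    masked = conf.masked_fill(conf < kth, float("-inf"))     # task-aware expert filtering
    weights = F.softmax(masked, dim=1)                       # reliability weights over experts
    fused = (weights.unsqueeze(-1) * torch.stack(aligned, dim=1)).sum(dim=1)
    return fused.argmax(dim=1)


# Toy usage: three tasks of 10 classes each on 768-d frozen features.
experts = [TaskExpert(768, 10) for _ in range(3)]
preds = mote_style_predict(torch.randn(4, 768), experts, [0, 10, 20], total_classes=30)
```

In this sketch forgetting is avoided simply because each expert and its head are frozen once their task is learned; only the inference-time filtering and fusion change as new experts arrive, which is the routing-like behavior the abstract describes.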
Related papers
- LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models [21.888139819188105]
LLaVA-CMoE is a continual learning framework for large vision-language models. A Probe-Guided Knowledge Extension mechanism determines when and where new experts should be added, and a Probabilistic Task Locator assigns each task a dedicated, lightweight router.
arXiv Detail & Related papers (2025-03-27T07:36:11Z)
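The "dedicated, lightweight router per task" idea in the LLaVA-CMoE summary above can be pictured as follows; the `PerTaskRouters` module, the linear router shape, and the way a task id is supplied are hypothetical simplifications, not the paper's actual design.

```python
# Hypothetical sketch: one small router per task over a shared expert pool.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PerTaskRouters(nn.Module):
    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        self.hidden_dim, self.num_experts = hidden_dim, num_experts
        self.routers = nn.ModuleDict()  # task id -> its own lightweight router

    def add_task(self, task_id: str) -> None:
        # A new task gets a fresh router; earlier routers stay frozen, so their
        # routing behavior cannot be overwritten by later training.
        self.routers[task_id] = nn.Linear(self.hidden_dim, self.num_experts)

    def forward(self, hidden: torch.Tensor, task_id: str) -> torch.Tensor:
        # Expert mixing weights produced by the router dedicated to this task.
        return F.softmax(self.routers[task_id](hidden), dim=-1)


routers = PerTaskRouters(hidden_dim=768, num_experts=8)
routers.add_task("task_0")
weights = routers(torch.randn(4, 768), task_id="task_0")  # (4, 8) expert weights
```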
- MOS: Model Surgery for Pre-Trained Model-Based Class-Incremental Learning [62.78292142632335]
Class-Incremental Learning (CIL) requires models to continually acquire knowledge of new classes without forgetting old ones. Existing work seeks to utilize lightweight components to adjust the model. We propose MOdel Surgery (MOS) to rescue the model from forgetting previous knowledge.
arXiv Detail & Related papers (2024-12-12T16:57:20Z)
- Complexity Experts are Task-Discriminative Learners for Any Image Restoration [80.46313715427928]
We introduce "complexity experts", flexible expert blocks with varying computational complexity and receptive fields. This effectively drives task-specific allocation, assigning tasks to experts with the appropriate complexity. The proposed MoCE-IR model outperforms state-of-the-art methods, affirming its efficiency and practical applicability.
arXiv Detail & Related papers (2024-11-27T15:58:07Z)
- MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts [38.15244333975921]
MaskMoE is capable of maintaining representation diversity while achieving more comprehensive training.
Our method outperforms previous dominant Mixture-of-Experts models in terms of both perplexity (PPL) and downstream task performance.
arXiv Detail & Related papers (2024-07-13T09:22:33Z)
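One way to picture the routing mask named in the MaskMoE title above is to restrict each vocabulary item to a fixed subset of experts by masking the router logits. The sketch below does exactly that; the mask construction (a random fixed assignment per token id) is a placeholder assumption, not MaskMoE's actual masking rule.

```python
# Generic token-level routing-mask sketch: mask router logits before top-1 selection.
import torch
import torch.nn.functional as F

vocab_size, num_experts, hidden = 100, 8, 16
torch.manual_seed(0)

# Fixed boolean mask: True where a token id is allowed to use an expert.
routing_mask = torch.rand(vocab_size, num_experts) < 0.25
routing_mask[routing_mask.sum(dim=1) == 0, 0] = True   # every token keeps at least one expert

router = torch.nn.Linear(hidden, num_experts)
token_ids = torch.randint(0, vocab_size, (4,))
hidden_states = torch.randn(4, hidden)

logits = router(hidden_states)
logits = logits.masked_fill(~routing_mask[token_ids], float("-inf"))  # apply the routing mask
probs = F.softmax(logits, dim=-1)
expert_choice = probs.argmax(dim=-1)   # top-1 expert per token, restricted by the mask
```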
- Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters [36.09178055533487]
We leverage Mixture of Low-rank Adapters (MoLA) to mitigate conflicts in heterogeneous data training.
We introduce two variants of MoLA, namely MoLA-Grad and MoLA-SJ, to handle the target-aware and target-agnostic scenarios, respectively.
The latter uses a novel Task-wise Decorrelation (TwD) to intervene in the router so that it learns oriented weight combinations of adapters for homogeneous tasks.
arXiv Detail & Related papers (2024-06-14T03:04:05Z)
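A mixture of low-rank adapters, as in the MoLA summary above, can be sketched as several LoRA-style updates on a frozen linear layer blended by a router. The module below is an illustrative assumption about shapes and routing only; it does not model the MoLA-Grad or MoLA-SJ variants.

```python
# Hypothetical mixture-of-low-rank-adapters layer on top of a frozen linear weight.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixtureOfLoRA(nn.Module):
    def __init__(self, dim: int, num_adapters: int = 4, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.base.weight.requires_grad_(False)   # frozen pre-trained weight
        self.base.bias.requires_grad_(False)
        self.down = nn.Parameter(torch.randn(num_adapters, dim, rank) * 0.02)
        self.up = nn.Parameter(torch.zeros(num_adapters, rank, dim))
        self.router = nn.Linear(dim, num_adapters)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = F.softmax(self.router(x), dim=-1)                # (B, E) per-sample adapter weights
        low = torch.einsum("bd,edr->ber", x, self.down)      # (B, E, r) low-rank projections
        delta = torch.einsum("ber,erd->bed", low, self.up)   # (B, E, d) adapter updates
        return self.base(x) + torch.einsum("be,bed->bd", w, delta)


layer = MixtureOfLoRA(dim=64)
out = layer(torch.randn(2, 64))   # (2, 64)
```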
- Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts [104.9871176044644]
Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training.
We propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE).
MoCE trains each expert only with semantically relevant images by using cluster-conditional gates.
arXiv Detail & Related papers (2024-02-08T03:46:32Z)
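The cluster-conditional gate mentioned in the MoCE summary above can be read as replacing a learned token router with the image's pre-computed cluster label, so each expert only ever receives images from one semantic cluster. The snippet below is a hedged sketch under that reading; the offline cluster labels and the linear experts are stand-ins.

```python
# Hypothetical cluster-conditional gating: the cluster id, not a learned router,
# decides which expert an image's features go to.
import torch
import torch.nn as nn

num_clusters, feat_dim = 4, 32
experts = nn.ModuleList(nn.Linear(feat_dim, feat_dim) for _ in range(num_clusters))

feats = torch.randn(8, feat_dim)                      # encoder features for 8 images
cluster_ids = torch.randint(0, num_clusters, (8,))    # stand-in for offline k-means labels

out = torch.zeros_like(feats)
for c, expert in enumerate(experts):
    gate = (cluster_ids == c).float().unsqueeze(-1)   # hard 0/1 gate for cluster c
    out = out + gate * expert(feats)                  # expert c contributes only to its cluster
# For clarity every expert runs densely here; a real implementation would
# dispatch each image only to its cluster's expert.
```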
- Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [64.94129594112557]
Merging Transformer-based models trained on different tasks yields a single unified model that can execute all the tasks concurrently.
Previous methods, exemplified by task arithmetic, have proven to be both effective and scalable.
We propose to merge most of the parameters while upscaling the Transformer layers into a weight-ensembling mixture of experts (MoE) module.
arXiv Detail & Related papers (2024-02-01T08:58:57Z)
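For a single linear layer, the weight-ensembling idea from the summary above can be sketched as keeping the pre-trained weight, storing each task's fine-tuned weight as a task vector, and letting a small router mix the task vectors before the resulting weight is applied. The class below is a simplified illustration under those assumptions (batch-averaged routing coefficients, no Transformer context), not the paper's full merging recipe.

```python
# Hypothetical weight-ensembling layer: pre-trained weight plus a routed mix of task vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightEnsemblingLinear(nn.Module):
    def __init__(self, w_pretrained: torch.Tensor, w_finetuned: list[torch.Tensor]):
        super().__init__()
        self.w0 = nn.Parameter(w_pretrained, requires_grad=False)
        # Task vectors: each task's deviation from the pre-trained weight.
        self.task_vectors = nn.Parameter(
            torch.stack([w - w_pretrained for w in w_finetuned]), requires_grad=False
        )
        self.router = nn.Linear(w_pretrained.shape[1], self.task_vectors.shape[0])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        coef = F.softmax(self.router(x).mean(dim=0), dim=-1)   # one coefficient per task (batch-averaged)
        w = self.w0 + torch.einsum("t,tod->od", coef, self.task_vectors)
        return x @ w.t()


d = 16
layer = WeightEnsemblingLinear(torch.randn(d, d), [torch.randn(d, d) for _ in range(3)])
y = layer(torch.randn(4, d))   # (4, d)
```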
- MoEC: Mixture of Expert Clusters [93.63738535295866]
Sparse Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead.
MoE converts dense layers into sparse experts and utilizes a gated routing network to activate experts conditionally.
However, as the number of experts grows, MoE with an outrageously large number of parameters suffers from overfitting and sparse data allocation.
arXiv Detail & Related papers (2022-07-19T06:09:55Z)
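The MoEC summary's generic description of MoE (dense layers converted into sparse experts gated by a routing network) corresponds to a standard top-k gated layer such as the sketch below. It is purely illustrative: it computes every expert densely for clarity and omits load balancing, capacity limits, and the expert-cluster grouping that MoEC itself proposes.

```python
# Minimal top-k gated mixture-of-experts layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2, hidden: int = 256):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:       # x: (tokens, dim)
        scores = self.gate(x)                                  # (tokens, num_experts)
        topv, topi = scores.topk(self.k, dim=-1)               # keep only the top-k experts per token
        gate_weights = torch.zeros_like(scores).scatter(-1, topi, F.softmax(topv, dim=-1))
        # Dense fallback for clarity: every expert runs on every token, then the
        # sparse gate zeroes out the experts that were not selected.
        expert_out = torch.stack([expert(x) for expert in self.experts], dim=1)  # (tokens, E, dim)
        return torch.einsum("te,ted->td", gate_weights, expert_out)


moe = TopKMoE(dim=64)
y = moe(torch.randn(10, 64))   # (10, 64)
```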
This list is automatically generated from the titles and abstracts of the papers in this site.