Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning
- URL: http://arxiv.org/abs/2506.21035v1
- Date: Thu, 26 Jun 2025 06:19:05 GMT
- Title: Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning
- Authors: Haodong Lu, Chongyang Zhao, Jason Xue, Lina Yao, Kristen Moore, Dong Gong
- Abstract summary: Continual learning with large pre-trained models is challenged by catastrophic forgetting and task interference. Existing LoRA-based Mixture-of-Experts (MoE) approaches mitigate forgetting by assigning and freezing task-specific adapters. We propose MoRA, a Mixture-of-Rank Adaptive learning approach with self-activated and sparse rank activation for CL.
- Score: 19.982853959240497
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning (CL) with large pre-trained models is challenged by catastrophic forgetting and task interference. Existing LoRA-based Mixture-of-Experts (MoE) approaches mitigate forgetting by assigning and freezing task-specific adapters, but their coarse adapter-level selection introduces three key challenges: 1) Interference: Activating full LoRA experts per input leads to subspace interference and prevents selective reuse of useful components across tasks. 2) Redundancy: Newly added experts often duplicate or contradict existing knowledge due to unnecessary activation of unrelated ranks and insufficient reuse of relevant ones. 3) Ambiguity: Overlapping features across tasks confuse the router, resulting in unstable expert assignments. As more experts accumulate, earlier-task routing degrades, accelerating forgetting. We propose MoRA, a Mixture-of-Rank Adaptive learning approach with self-activated and sparse rank activation for CL. Rather than mixing multiple low-rank matrices, MoRA decomposes each rank-r update into r rank-1 components, each treated as an independent expert, enabling a fine-grained mixture of rank-1 experts while mitigating interference and redundancy. To avoid ambiguous routing, each rank-1 expert infers its own relevance from its intermediate activations. Coupled with the proposed rank pruning and activation budgets, MoRA adaptively selects a sparse mixture of ranks per input. We validate MoRA on continual learning tasks with CLIP and large language models (LLMs), analyzing both in-domain learning and out-of-domain forgetting/generalization during fine-tuning. MoRA substantially enhances CL with pre-trained models (PTMs), improving generalization while mitigating forgetting.
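The mechanism described in the abstract lends itself to a compact illustration. Below is a minimal PyTorch-style sketch of a linear layer whose LoRA update is decomposed into r rank-1 experts, each of which scores its own relevance from the intermediate activation it produces, keeping only a small budget of ranks per input. The class name, the |x·a_i| relevance signal, and the top-k budget rule are illustrative assumptions for this sketch, not the authors' released implementation.

```python
# Minimal sketch of a sparse mixture-of-rank-1-experts LoRA layer (illustrative only).
# Assumptions not taken from the paper: the gating uses |x @ a_i| as the
# "self-activated" relevance signal, and sparsity is a simple top-k budget.
import torch
import torch.nn as nn


class MoRALinear(nn.Module):
    """A frozen linear layer plus a sparse mixture of rank-1 LoRA experts."""

    def __init__(self, base: nn.Linear, r: int = 8, budget: int = 2, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # keep the pre-trained weights frozen
            p.requires_grad_(False)
        d_in, d_out = base.in_features, base.out_features
        # r rank-1 experts: expert i contributes the update x -> (x . a_i) * b_i
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)   # down-projections a_i
        self.B = nn.Parameter(torch.zeros(r, d_out))          # up-projections b_i (zero-init)
        self.budget = budget                                   # max ranks activated per token
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = x @ self.A.t()                        # intermediate activation of each expert, shape (..., r)
        score = z.abs()                           # self-activated relevance: no separate router
        keep = score.topk(self.budget, dim=-1).indices
        mask = torch.zeros_like(score).scatter_(-1, keep, 1.0)   # prune all but the top-`budget` ranks
        delta = (z * mask) @ self.B               # sparse mixture of rank-1 updates
        return self.base(x) + self.scale * delta
```

In a continual-learning setting, the bookkeeping that decides which ranks are frozen or reused across tasks, and the rank-pruning schedule, would sit on top of such a layer; both are omitted in this sketch.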
Related papers
- SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning [83.66308307152808]
We propose StAbilized Mixture-of-Experts (SAME) for Multimodal Continual Instruction Tuning (MCIT). SAME stabilizes expert selection by decomposing routing dynamics into subspaces and updating only task-relevant directions. It also introduces adaptive expert activation to freeze selected experts during training, reducing redundancy and cross-task interference.
arXiv Detail & Related papers (2026-02-02T11:47:06Z)
- Decomposing and Composing: Towards Efficient Vision-Language Continual Learning via Rank-1 Expert Pool in a Single LoRA [50.97792275353563]
We introduce a novel framework that restructures a single Low-Rank Adaptation (LoRA) module as a decomposable Rank-1 Expert Pool. Our method learns to dynamically compose a sparse, task-specific update by selecting from this expert pool, guided by the semantics of the [Guided] token.
arXiv Detail & Related papers (2026-01-30T10:54:51Z)
- DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation [26.24723718425076]
Mixture-of-Experts (MoE) has become a prominent paradigm for scaling Large Language Models (LLMs). We propose a Dynamic Rank LoRA framework named DR-LoRA, which dynamically grows expert LoRA ranks during fine-tuning based on task-specific demands. Experiments on multiple benchmarks demonstrate that DR-LoRA consistently outperforms standard LoRA and static allocation strategies under the same parameter budget.
arXiv Detail & Related papers (2026-01-08T10:58:51Z)
- FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts [44.21416999726094]
Low-Rank Adaptation (LoRA) is a widely used parameter-efficient fine-tuning method for foundation models. MoE-based LoRA variants show promise in mitigating intra-task correlations in single-task instruction tuning. FlyLoRA is an implicit MoE-based LoRA variant that introduces rank-wise expert activation in the up-projection matrix.
arXiv Detail & Related papers (2025-10-09T16:17:13Z)
- FURINA: Free from Unmergeable Router via LINear Aggregation of mixed experts [17.056585698418587]
Mixture of Experts (MoE) has been successfully integrated into Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. A key limitation of existing MoE-LoRA methods is their reliance on a discrete router. We propose FURINA, a novel Free from Unmergeable Router framework based on the LINear Aggregation of experts.
arXiv Detail & Related papers (2025-09-18T12:22:32Z)
- Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection [85.0189917888094]
We propose a Dual-Stage Reweighted Mixture-of-Experts (DR-MoE) framework to handle the challenges posed by subtle and infrequent mistakes. The proposed method achieves strong performance, particularly in identifying rare and ambiguous mistake instances.
arXiv Detail & Related papers (2025-09-16T12:00:42Z)
- CKAA: Cross-subspace Knowledge Alignment and Aggregation for Robust Continual Learning [80.18781219542016]
Continual Learning (CL) empowers AI models to continuously learn from sequential task streams. Recent parameter-efficient fine-tuning (PEFT)-based CL methods have garnered increasing attention due to their superior performance. We propose Cross-subspace Knowledge Alignment and Aggregation (CKAA) to enhance robustness against misleading task-ids.
arXiv Detail & Related papers (2025-07-13T03:11:35Z)
- LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models [61.96237184081951]
Low-Rank Adaptation (LoRA) is widely used to efficiently acquire specialized knowledge in Multimodal Large Language Models (MLLMs). LoRA introduces substantial harmful redundancy during visual instruction tuning, which exacerbates the forgetting of general knowledge and degrades downstream task performance. We propose LoRASculpt to eliminate harmful redundant parameters, thereby harmonizing general and specialized knowledge.
arXiv Detail & Related papers (2025-03-21T04:31:09Z)
- Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning [53.053604713064544]
Low-Rank Adaptation (LoRA) is widely used for adapting large language models (LLMs) to specific domains due to its efficiency and modularity. Recent works adopt Mixture of Experts (MoE) by treating each LoRA module as an expert, thereby mitigating task interference through multiple specialized LoRA modules. While effective, these methods often isolate knowledge within individual tasks, failing to fully exploit the shared knowledge across related tasks. We propose Single-ranked Mixture of Experts LoRA (SMoRA), which embeds MoE into LoRA by treating each rank as an independent expert.
arXiv Detail & Related papers (2025-01-25T06:56:39Z)
- MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning [29.957620178740186]
In multi-task scenarios, challenges such as training imbalance and the seesaw effect frequently emerge.
We propose Mixture of Asymmetric Low-Rank Adaptation (MALoRA) as a flexible fine-tuning framework.
MALoRA reduces the number of trainable parameters by 30% to 48%, increases training speed by 1.2x, and matches the computational efficiency of single-task LoRA models.
arXiv Detail & Related papers (2024-10-30T07:53:52Z)
- Learning Attentional Mixture of LoRAs for Language Model Continual Learning [5.405488709294211]
Fine-tuning large language models (LLMs) with Low-Rank Adaptation (LoRA) is widely acknowledged as an effective approach for continual learning on new tasks.
We propose Attentional Mixture of LoRAs (AM-LoRA), a continual learning approach tailored for LLMs.
arXiv Detail & Related papers (2024-09-29T08:34:54Z)
- Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models [7.966452497550907]
We propose the Mixture-of-LoRAs (MoA) architecture for multi-task learning with large language models (LLMs).
Multiple domain-specific LoRA modules can be aligned with the expert design principles observed in Mixture-of-Experts (MoE).
Each LoRA model can be iteratively adapted to a new domain, allowing for quick domain-specific adaptation.
arXiv Detail & Related papers (2024-03-06T03:33:48Z)
- Multimodal Instruction Tuning with Conditional Mixture of LoRA [51.58020580970644]
This paper introduces MixLoRA, a novel approach that integrates multimodal instruction tuning with Low-Rank Adaptation (LoRA). It innovates upon LoRA by dynamically constructing low-rank adaptation matrices tailored to the unique demands of each input instance. Experimental results on various multimodal evaluation datasets indicate that MixLoRA outperforms conventional LoRA with the same or even higher ranks.
arXiv Detail & Related papers (2024-02-24T20:15:31Z)
- LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild [76.67343971195267]
Low-Rank Adaptation (LoRA) provides an efficient solution for fine-tuning large language models (LLMs).
LoraRetriever is a retrieve-then-compose framework that adaptively retrieves and composes multiple LoRAs according to the input prompts; a minimal sketch of this pattern appears after this list.
Experimental results indicate that LoraRetriever consistently outperforms the baselines.
arXiv Detail & Related papers (2024-02-15T15:02:46Z)
- Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback.
We learn to tune a generalization of online mirror descent (OMD) with self-concordant barrier regularizers.
arXiv Detail & Related papers (2023-07-05T13:52:10Z)
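As referenced in the LoraRetriever entry above, here is a compact sketch of the retrieve-then-compose pattern: pre-trained LoRA adapters are keyed by embeddings, the adapters most similar to the input prompt are retrieved, and their weight deltas are combined. The embedding source, cosine similarity, and uniform averaging are illustrative assumptions for this sketch, not LoraRetriever's exact procedure.

```python
# Illustrative retrieve-then-compose over a pool of LoRA adapters.
# The similarity metric and uniform averaging are assumptions made for this sketch.
import torch
import torch.nn.functional as F


def retrieve_and_compose(prompt_emb: torch.Tensor,
                         lora_keys: torch.Tensor,                 # (num_loras, d_emb) embedding per adapter
                         lora_deltas: list[dict[str, torch.Tensor]],
                         top_k: int = 3) -> dict[str, torch.Tensor]:
    """Pick the top-k LoRAs most similar to the prompt and average their weight deltas."""
    sims = F.cosine_similarity(prompt_emb.unsqueeze(0), lora_keys, dim=-1)   # (num_loras,)
    idx = sims.topk(top_k).indices.tolist()
    composed = {}
    for name in lora_deltas[0]:                                    # compose per weight tensor
        composed[name] = torch.stack([lora_deltas[i][name] for i in idx]).mean(0)
    return composed   # merge into the frozen base model, or apply as an extra additive term
```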
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.