Mixture of Experts Meets Prompt-Based Continual Learning
- URL: http://arxiv.org/abs/2405.14124v3
- Date: Sun, 17 Nov 2024 19:36:09 GMT
- Title: Mixture of Experts Meets Prompt-Based Continual Learning
- Authors: Minh Le, An Nguyen, Huy Nguyen, Trang Nguyen, Trang Pham, Linh Van Ngo, Nhat Ho
- Abstract summary: This paper conducts a theoretical analysis to unravel how prompts bestow their advantages in continual learning.
We provide a novel view on prefix tuning, reframing it as the addition of new task-specific experts, thereby inspiring the design of a novel gating mechanism.
The effectiveness of NoRGa is substantiated both theoretically and empirically across diverse benchmarks and pretraining paradigms.
- Score: 23.376460019465235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploiting the power of pre-trained models, prompt-based approaches stand out compared to other continual learning solutions in effectively preventing catastrophic forgetting, even with very few learnable parameters and without the need for a memory buffer. While existing prompt-based continual learning methods excel in leveraging prompts for state-of-the-art performance, they often lack a theoretical explanation for the effectiveness of prompting. This paper conducts a theoretical analysis to unravel how prompts bestow such advantages in continual learning, thus offering a new perspective on prompt design. We first show that the attention block of pre-trained models like Vision Transformers inherently encodes a special mixture of experts architecture, characterized by linear experts and quadratic gating score functions. This realization drives us to provide a novel view on prefix tuning, reframing it as the addition of new task-specific experts, thereby inspiring the design of a novel gating mechanism termed Non-linear Residual Gates (NoRGa). Through the incorporation of non-linear activation and residual connection, NoRGa enhances continual learning performance while preserving parameter efficiency. The effectiveness of NoRGa is substantiated both theoretically and empirically across diverse benchmarks and pretraining paradigms. Our code is publicly available at https://github.com/Minhchuyentoancbn/MoE_PromptCL
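To make the mixture-of-experts reading concrete, here is a minimal PyTorch sketch of prefix-tuned attention with a NoRGa-style gate: the pre-softmax scores against the prefix keys are passed through a non-linear activation with a residual connection. The single-head layout, tensor shapes, the tanh activation, and the learnable scale `alpha` are illustrative assumptions rather than the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoRGaPrefixAttention(nn.Module):
    """Single-head attention with learnable prefix key/value tokens.

    The prefixes play the role of new task-specific experts; their
    pre-softmax gating scores get a non-linear residual reshaping.
    """

    def __init__(self, dim: int, n_prefix: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Learnable prefix keys/values: the added "experts".
        self.prefix_k = nn.Parameter(torch.randn(n_prefix, dim) * 0.02)
        self.prefix_v = nn.Parameter(torch.randn(n_prefix, dim) * 0.02)
        # Strength of the non-linear residual term (an assumption).
        self.alpha = nn.Parameter(torch.ones(1))
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn_x = (q @ k.transpose(-2, -1)) * self.scale      # token-token scores
        attn_p = (q @ self.prefix_k.t()) * self.scale        # gating scores on prefix experts
        attn_p = attn_p + self.alpha * torch.tanh(attn_p)    # non-linear residual gate
        attn = F.softmax(torch.cat([attn_x, attn_p], dim=-1), dim=-1)
        values = torch.cat([v, self.prefix_v.expand(x.size(0), -1, -1)], dim=1)
        return attn @ values                                  # (batch, seq, dim)
```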
Related papers
- Key-Value Pair-Free Continual Learner via Task-Specific Prompt-Prototype [28.631643441543574]
Continual learning aims to enable models to acquire new knowledge while retaining previously learned information.
We propose a novel approach employing task-specific Prompt-Prototype (ProP).
In our method, task-specific prompts facilitate more effective feature learning for the current task, while corresponding prototypes capture the representative features of the input.
arXiv Detail & Related papers (2026-01-08T11:59:35Z)
- Retrieval-augmented Prompt Learning for Pre-trained Foundation Models [101.13972024610733]
We present RetroPrompt, which aims to achieve a balance between memorization and generalization.
Unlike traditional prompting methods, RetroPrompt incorporates a retrieval mechanism throughout the input, training, and inference stages.
We conduct comprehensive experiments on a variety of datasets across natural language processing and computer vision tasks to demonstrate the superior performance of our proposed approach.
arXiv Detail & Related papers (2025-12-23T08:15:34Z)
- Rethinking Hebbian Principle: Low-Dimensional Structural Projection for Unsupervised Learning [17.299267108673277]
Hebbian learning is a biological principle that intuitively describes how neurons adapt their connections through repeated stimuli.
We introduce the Structural Projection Hebbian Representation (SPHeRe), a novel unsupervised learning method.
Experimental results show that SPHeRe achieves SOTA performance among unsupervised synaptic plasticity approaches.
arXiv Detail & Related papers (2025-10-16T15:47:29Z)
- One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning [52.966712416640085]
We propose SMoPE, a novel framework that integrates the benefits of both task-specific and shared prompt strategies.
SMoPE consistently outperforms task-specific prompt methods and achieves performance competitive with state-of-the-art approaches.
arXiv Detail & Related papers (2025-09-29T08:54:58Z)
- Fast Thinking for Large Language Models [67.7238685892317]
We introduce Latent Codebooks for Fast Thinking, a framework that uses concise CoT sketches only during training to learn a codebook of discrete strategy priors.
At inference, the model conditions on a handful of continuous thinking switches distilled from the codebook in a single pass, enabling strategy-level guidance without producing explicit reasoning tokens.
arXiv Detail & Related papers (2025-09-28T04:19:48Z)
- MEPT: Mixture of Expert Prompt Tuning as a Manifold Mapper [75.6582687942241]
We propose Mixture of Expert Prompt Tuning (MEPT) as an effective and efficient manifold-mapping framework.
MEPT integrates multiple prompt experts to adaptively learn diverse and non-stationary data distributions.
Empirical evaluations demonstrate that MEPT outperforms several state-of-the-art parameter-efficient baselines on SuperGLUE.
arXiv Detail & Related papers (2025-08-31T21:19:25Z)
- EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time.
We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware Parameter Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z)
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
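One way to picture that query-synthesis module is a small bottleneck network kept outside the frozen backbone; the sketch below is a hypothetical illustration (module name, pooling choice, and sizes are assumptions, not the paper's code).

```python
import torch
import torch.nn as nn

class QuerySynthesizer(nn.Module):
    """Lightweight module producing a task-specific query, trained
    separately from (and independent of) the frozen backbone."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        # Bottleneck MLP: orders of magnitude fewer parameters than the ViT.
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim) patch embeddings; mean-pool, then synthesize.
        return self.net(x.mean(dim=1))  # (batch, dim) task-specific query
```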
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Visual Prompt Tuning in Null Space for Continual Learning [51.96411454304625]
Existing prompt-tuning methods have demonstrated impressive performance in continual learning (CL).
This paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features.
In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient projection.
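For intuition, here is a minimal sketch of null-space gradient projection; the feature matrix `prev_feats`, the SVD-based null-space estimate, and the threshold `eps` are illustrative assumptions, not the paper's approximation scheme.

```python
import torch

def project_to_null_space(grad: torch.Tensor, prev_feats: torch.Tensor,
                          eps: float = 1e-5) -> torch.Tensor:
    """Project a prompt gradient onto the (approximate) null space of
    previous tasks' features, so updates avoid directions those tasks use.

    grad:       (n_prompt, dim) gradient of the prompt parameters
    prev_feats: (n_samples, dim) features collected from earlier tasks
    """
    cov = prev_feats.t() @ prev_feats                 # (dim, dim) uncentered covariance
    u, s, _ = torch.linalg.svd(cov)
    null_basis = u[:, s < eps * s.max()]              # directions unused by old tasks
    return grad @ null_basis @ null_basis.t()         # keep only the null-space component

# usage: prompt.grad = project_to_null_space(prompt.grad, prev_feats)
```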
arXiv Detail & Related papers (2024-06-09T05:57:40Z)
- LSPT: Long-term Spatial Prompt Tuning for Visual Representation Learning [36.843950725332476]
Visual Prompt Tuning (VPT) techniques adapt pre-trained Vision Transformers (ViTs) to downstream visual tasks using specialized learnable tokens termed prompts.
We introduce Long-term Spatial Prompt Tuning (LSPT) - a revolutionary approach to visual representation learning.
Our empirical findings underscore the superiority of LSPT, showcasing its ability to set new benchmarks in visual prompt tuning performance.
arXiv Detail & Related papers (2024-02-27T10:55:07Z)
- Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z)
- Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications [63.29358103217275]
Compressing Large Language Models (LLMs) often leads to reduced performance, especially for knowledge-intensive tasks.
We propose two conjectures on the nature of the damage: one is that certain knowledge is forgotten (or erased) after compression.
We introduce a variant called Inference-time Dynamic Prompting (IDP) that can effectively increase prompt diversity without incurring any inference overhead.
arXiv Detail & Related papers (2023-10-02T03:12:06Z)
- PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine [24.888093229577965]
We propose a simple, universal, and automatic method named PREFER to address the stated limitations.
Our PREFER achieves state-of-the-art performance in multiple types of tasks by a significant margin.
arXiv Detail & Related papers (2023-08-23T09:46:37Z)
- On the Role of Attention in Prompt-tuning [90.97555030446563]
We study prompt-tuning for one-layer attention architectures through the lens of contextual mixture models.
We show that softmax-prompt-attention is provably more expressive than softmax-self-attention and linear-prompt-attention.
We also provide experiments that verify our theoretical insights on real datasets and demonstrate how prompt-tuning enables the model to attend to context-relevant information.
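As a toy illustration of prompt-attention (a single-head sketch with hypothetical shapes; the paper's formal setup is more precise), input tokens attend over learnable prompt tokens, so each output is a context-dependent softmax mixture of prompts.

```python
import torch
import torch.nn.functional as F

def softmax_prompt_attention(x: torch.Tensor, prompt: torch.Tensor,
                             W: torch.Tensor) -> torch.Tensor:
    """Toy single-head prompt-attention.

    x:      (seq, dim) input tokens
    prompt: (n_prompt, dim) learnable prompt tokens
    W:      (dim, dim) trainable attention weights
    """
    scores = (x @ W) @ prompt.t() / x.size(-1) ** 0.5  # (seq, n_prompt)
    # Each token forms a softmax-weighted mixture over the prompts.
    return F.softmax(scores, dim=-1) @ prompt          # (seq, dim)
```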
arXiv Detail & Related papers (2023-06-06T06:23:38Z)
- CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning [30.676509834338884]
Computer vision models suffer from a phenomenon known as catastrophic forgetting when learning novel concepts from continuously shifting training data.
We propose prompting approaches as an alternative to data-rehearsal.
We show that we outperform the current SOTA method DualPrompt on established benchmarks by as much as 4.5% in average final accuracy.
arXiv Detail & Related papers (2022-11-23T18:57:11Z)
- Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models [108.13378788663196]
We propose Subspace Prompt Tuning (SubPT) to project back-propagated gradients onto the low-rank subspace spanned by the early-stage gradient-flow eigenvectors throughout training.
We equip CoOp with a Novel Feature Learner (NFL) to enhance the generalization ability of the learned prompts to novel categories beyond the training set.
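A rough sketch of the SubPT projection step, assuming a hypothetical buffer `early_grads` of gradients recorded early in training (the rank and the SVD route are assumptions, not the paper's exact procedure):

```python
import torch

def subspace_project(grad: torch.Tensor, early_grads: torch.Tensor,
                     rank: int = 4) -> torch.Tensor:
    """Project a back-propagated gradient onto the low-rank subspace
    spanned by the dominant directions of early-stage gradient flow.

    grad:        (dim,) current gradient
    early_grads: (n_steps, dim) gradients recorded early in training
    """
    # Top right-singular vectors span the early gradient-flow subspace.
    _, _, vh = torch.linalg.svd(early_grads, full_matrices=False)
    basis = vh[:rank]                    # (rank, dim)
    return basis.t() @ (basis @ grad)    # keep only the in-subspace component
```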
arXiv Detail & Related papers (2022-11-04T02:06:22Z)