Learning to Learn without Forgetting using Attention
- URL: http://arxiv.org/abs/2408.03219v2
- Date: Wed, 14 Aug 2024 06:25:50 GMT
- Title: Learning to Learn without Forgetting using Attention
- Authors: Anna Vettoruzzo, Joaquin Vanschoren, Mohamed-Rafik Bouguelia, Thorsteinn Rögnvaldsson,
- Abstract summary: Continual learning (CL) refers to the ability to continually learn over time by accommodating new knowledge while retaining previously learned experience.
Current machine learning methods are highly prone to overwrite previously learned patterns and thus forget past experience.
Since hand-crafting effective update mechanisms is difficult, we propose meta-learning a transformer-based to enhance CL.
- Score: 5.6739565497512405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning (CL) refers to the ability to continually learn over time by accommodating new knowledge while retaining previously learned experience. While this concept is inherent in human learning, current machine learning methods are highly prone to overwrite previously learned patterns and thus forget past experience. Instead, model parameters should be updated selectively and carefully, avoiding unnecessary forgetting while optimally leveraging previously learned patterns to accelerate future learning. Since hand-crafting effective update mechanisms is difficult, we propose meta-learning a transformer-based optimizer to enhance CL. This meta-learned optimizer uses attention to learn the complex relationships between model parameters across a stream of tasks, and is designed to generate effective weight updates for the current task while preventing catastrophic forgetting on previously encountered tasks. Evaluations on benchmark datasets like SplitMNIST, RotatedMNIST, and SplitCIFAR-100 affirm the efficacy of the proposed approach in terms of both forward and backward transfer, even on small sets of labeled data, highlighting the advantages of integrating a meta-learned optimizer within the continual learning framework.
Related papers
- Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning [15.475427498268393]
The Train-Attention-Augmented Language Model (TAALM) enhances learning efficiency by dynamically predicting and applying weights to tokens based on their usefulness.
We show that TAALM proves the state-of-the-art performance upon the baselines, and also shows synergistic compatibility when integrated with previous CKL approaches.
arXiv Detail & Related papers (2024-07-24T01:04:34Z) - Learning to Unlearn for Robust Machine Unlearning [6.488418950340473]
We introduce a novel Learning-to-Unlearn (LTU) framework to optimize the unlearning process.
LTU includes a meta-optimization scheme that facilitates models to effectively preserve generalizable knowledge.
We also introduce a Gradient Harmonization strategy to align the optimization trajectories for remembering and forgetting.
arXiv Detail & Related papers (2024-07-15T07:36:00Z) - Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning [22.13331870720021]
We propose a beyond prompt learning approach to the RFCL task, called Continual Adapter (C-ADA)
C-ADA flexibly extends specific weights in CAL to learn new knowledge for each task and freezes old weights to preserve prior knowledge.
Our approach achieves significantly improved performance and training speed, outperforming the current state-of-the-art (SOTA) method.
arXiv Detail & Related papers (2024-07-14T17:40:40Z) - Large Language Model Can Continue Evolving From Mistakes [36.14056870453356]
Continual Learning (CL) is crucial for keeping Large Language Models (LLMs) up-to-date and addressing their shortcomings.
We propose the Continue Evolving from Mistakes (CEM) method, a data-efficient approach aiming to collect CPT data.
Experiments demonstrate that CEM significantly enhances model performance and continual evolution.
arXiv Detail & Related papers (2024-04-11T17:44:56Z) - Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
arXiv Detail & Related papers (2024-03-18T08:00:23Z) - PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z) - A Memory Transformer Network for Incremental Learning [64.0410375349852]
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing methods has been the use of a memory of exemplars, which overcomes the issue of catastrophic forgetting by saving a subset of past data into a memory bank and utilizing it to prevent forgetting when training future tasks.
arXiv Detail & Related papers (2022-10-10T08:27:28Z) - Meta-learning the Learning Trends Shared Across Tasks [123.10294801296926]
Gradient-based meta-learning algorithms excel at quick adaptation to new tasks with limited data.
Existing meta-learning approaches only depend on the current task information during the adaptation.
We propose a 'Path-aware' model-agnostic meta-learning approach.
arXiv Detail & Related papers (2020-10-19T08:06:47Z) - Incremental Object Detection via Meta-Learning [77.55310507917012]
We propose a meta-learning approach that learns to reshape model gradients, such that information across incremental tasks is optimally shared.
In comparison to existing meta-learning methods, our approach is task-agnostic, allows incremental addition of new-classes and scales to high-capacity models for object detection.
arXiv Detail & Related papers (2020-03-17T13:40:00Z) - Revisiting Meta-Learning as Supervised Learning [69.2067288158133]
We aim to provide a principled, unifying framework by revisiting and strengthening the connection between meta-learning and traditional supervised learning.
By treating pairs of task-specific data sets and target models as (feature, label) samples, we can reduce many meta-learning algorithms to instances of supervised learning.
This view not only unifies meta-learning into an intuitive and practical framework but also allows us to transfer insights from supervised learning directly to improve meta-learning.
arXiv Detail & Related papers (2020-02-03T06:13:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.