Recasting Continual Learning as Sequence Modeling
- URL: http://arxiv.org/abs/2310.11952v2
- Date: Sun, 14 Jan 2024 13:22:30 GMT
- Title: Recasting Continual Learning as Sequence Modeling
- Authors: Soochan Lee, Jaehyeon Son, Gunhee Kim
- Abstract summary: We propose to formulate continual learning as a sequence modeling problem.
By adopting the meta-continual learning (MCL) framework, we can train the sequence model at the meta-level.
Our experiments on seven benchmarks, covering both classification and regression, show that sequence models can be an attractive solution for general MCL.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we aim to establish a strong connection between two significant bodies of machine learning research: continual learning and sequence modeling. That is, we propose to formulate continual learning as a sequence modeling problem, allowing advanced sequence models to be utilized for continual learning. Under this formulation, the continual learning process becomes the forward pass of a sequence model. By adopting the meta-continual learning (MCL) framework, we can train the sequence model at the meta-level, on multiple continual learning episodes. As a specific example of our new formulation, we demonstrate the application of Transformers and their efficient variants as MCL methods. Our experiments on seven benchmarks, covering both classification and regression, show that sequence models can be an attractive solution for general MCL.
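The formulation above is concrete enough to sketch in code. The following is a minimal, hypothetical PyTorch rendering, not the authors' implementation: an episode's stream (x_1, y_1), ..., (x_T, y_T) is fed to a causal Transformer whose forward pass plays the role of the continual learner, while the outer loop meta-trains across episodes. All names and shapes (`EpisodeTransformer`, `sample_episodes`, the feature dimension) are illustrative assumptions.

```python
# Minimal, hypothetical sketch of MCL as sequence modeling (not the authors'
# code). A continual learning episode (x_1, y_1), ..., (x_T, y_T) is one
# sequence; the causal forward pass IS the continual learning process.
import torch
import torch.nn as nn

class EpisodeTransformer(nn.Module):
    def __init__(self, x_dim=32, n_classes=10, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.embed_x = nn.Linear(x_dim, d_model)
        self.embed_y = nn.Embedding(n_classes, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, xs, ys):
        # Token t carries (x_t, y_{t-1}), so the model must predict y_t from
        # the prefix alone; the causal mask forbids peeking ahead.
        y_prev = torch.roll(ys, shifts=1, dims=1)
        y_prev[:, 0] = 0  # dummy label at t=0 (simplification)
        tokens = self.embed_x(xs) + self.embed_y(y_prev)
        mask = nn.Transformer.generate_square_subsequent_mask(xs.size(1)).to(xs.device)
        return self.head(self.encoder(tokens, mask=mask))

def sample_episodes(batch_size, seq_len, x_dim=32, n_classes=10):
    # Toy stand-in for an episode sampler; real MCL episodes would present
    # tasks sequentially within the stream. Random data keeps this runnable.
    xs = torch.randn(batch_size, seq_len, x_dim)
    ys = torch.randint(0, n_classes, (batch_size, seq_len))
    return xs, ys

# Meta-training: the outer loop spans many episodes, so gradients teach the
# model HOW to learn from a stream, not any single task.
model = EpisodeTransformer()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(100):
    xs, ys = sample_episodes(batch_size=8, seq_len=50)
    logits = model(xs, ys)
    loss = nn.functional.cross_entropy(logits.reshape(-1, 10), ys.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```

At meta-test time, adapting to a new stream is then just a forward pass; no weights are updated on the stream itself.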
Related papers
- Longhorn: State Space Models are Amortized Online Learners [51.10124201221601]
We introduce a novel deep SSM architecture based on the implicit update for optimizing an online regression objective.
Our experimental results show that our models outperform state-of-the-art SSMs on standard sequence modeling benchmarks and language modeling tasks.
arXiv Detail & Related papers (2024-07-19T11:12:08Z)
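The "implicit update" in the entry above can be illustrated generically. Below is a hedged sketch of a proximal-point (implicit) step on a per-token online regression loss, which has a closed-form solution; it is not Longhorn's actual recurrence, whose state and objective differ.

```python
# Hedged sketch of an implicit (proximal-point) online regression update,
# the kind of objective the Longhorn summary refers to; NOT the paper's
# exact recurrence. The state s is updated in closed form at each step.
import numpy as np

def implicit_update(s, x, y, beta=1.0):
    """Solve s_t = argmin_s 0.5*||s - s_prev||^2 + 0.5*beta*(s.x - y)^2.

    Setting the gradient to zero yields the closed form below; no
    learning-rate tuning is needed, which is one appeal of implicit
    (amortized) online learners.
    """
    residual = s @ x - y
    return s - beta * residual * x / (1.0 + beta * (x @ x))

# Toy usage: track a fixed linear target online.
rng = np.random.default_rng(0)
s = np.zeros(8)
w_true = rng.normal(size=8)
for t in range(200):
    x = rng.normal(size=8)
    s = implicit_update(s, x, w_true @ x)
print("final error:", np.linalg.norm(s - w_true))
```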
- Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation [48.071162716120334]
We study how the multimodal nature of the input affects the learning dynamics of a model.
Motivated by this observation, we propose a modality-aware feature distillation (MAFED) approach.
arXiv Detail & Related papers (2024-06-27T16:12:57Z)
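As a generic reading of the entry above, feature distillation applied per modality might look like the sketch below; the loss and the weighting scheme are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of modality-aware feature distillation: a standard feature
# distillation loss, but weighted per modality. A generic reading of the
# MAFED summary, not the paper's exact loss.
import torch
import torch.nn.functional as F

def mafed_loss(student_feats, teacher_feats, modality_weights):
    """student_feats / teacher_feats: dict modality -> (B, D) features from
    the current model and a frozen snapshot of the previous model.
    modality_weights: dict modality -> float, e.g. upweighting the modality
    that forgets faster (an assumption for illustration)."""
    loss = 0.0
    for m, w in modality_weights.items():
        loss = loss + w * F.mse_loss(student_feats[m], teacher_feats[m].detach())
    return loss

# Toy usage with hypothetical vision/language features.
feats_now = {"vision": torch.randn(4, 256), "language": torch.randn(4, 256)}
feats_old = {"vision": torch.randn(4, 256), "language": torch.randn(4, 256)}
print(mafed_loss(feats_now, feats_old, {"vision": 0.5, "language": 1.0}))
```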
- State Soup: In-Context Skill Learning, Retrieval and Mixing [22.485700977542127]
A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems.
Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging in parameter space.
Building on parallels between fine-tuning and in-context learning, we investigate whether we can treat internal states as task vectors that can be stored, retrieved, and then linearly combined.
arXiv Detail & Related papers (2024-06-12T17:06:07Z)
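The store/retrieve/mix idea in the entry above admits a compact illustration. This is an assumed reading, not the paper's procedure: hidden states cached after processing task demonstrations are linearly combined into a new initial state.

```python
# Hedged sketch of treating recurrent internal states as task vectors that
# can be stored, retrieved, and linearly mixed; an illustrative reading of
# the State Soup summary, not the paper's method.
import numpy as np

state_store = {}  # skill name -> cached hidden state

def store(name, state):
    state_store[name] = state

def retrieve_and_mix(weights):
    """Linearly combine cached states: h = sum_k w_k * h_k. The mixed state
    is installed as the model's initial state, so it starts from a blend of
    the stored skills rather than from scratch."""
    return sum(w * state_store[k] for k, w in weights.items())

# Toy usage: states would come from running the model over task
# demonstrations; random placeholders (hypothetical state size) used here.
store("arithmetic", np.random.randn(512))
store("translation", np.random.randn(512))
h0 = retrieve_and_mix({"arithmetic": 0.5, "translation": 0.5})
print(h0.shape)
```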
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free: it requires no data or additional training, yet shows impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
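A hedged sketch of the elect-mask-rescale pattern suggested by the method's name follows; the paper's exact formulas for electing signs, building masks, and computing rescalers may differ.

```python
# Hedged sketch of an elect -> mask -> rescale merging pattern; the paper's
# exact formulas may differ from these assumptions.
import torch

def emr_merge(task_vectors):
    """task_vectors: list of tensors (fine-tuned minus pretrained weights).
    Returns a unified vector plus per-task masks and rescalers."""
    stacked = torch.stack(task_vectors)                 # (K, ...)
    elected_sign = torch.sign(stacked.sum(dim=0))       # Elect: majority sign
    agree = (torch.sign(stacked) == elected_sign)       # per-task agreement
    # Unified magnitude: max magnitude among sign-agreeing entries.
    unified = elected_sign * (stacked.abs() * agree).amax(dim=0)
    masks, scales = [], []
    for tv, m in zip(task_vectors, agree):
        masked = m * unified                            # Mask: keep agreeing dims
        # Rescale: match the task vector's total magnitude (assumption).
        scales.append(tv.abs().sum() / masked.abs().sum().clamp_min(1e-8))
        masks.append(m)
    return unified, masks, scales

# Toy usage: per-task weights are rebuilt as w_pre + scale * mask * unified.
tvs = [torch.randn(1000) for _ in range(3)]
unified, masks, scales = emr_merge(tvs)
w_pre = torch.zeros(1000)
w_task0 = w_pre + scales[0] * masks[0] * unified
```

Nothing here is trained or tuned, which matches the summary's "tuning-free" claim: the masks and rescalers are computed directly from the task vectors.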
- Continual Instruction Tuning for Large Multimodal Models [30.438442723421556]
Multi-task joint instruction tuning can facilitate the model's continual learning ability and mitigate forgetting.
We propose task-similarity-informed regularization and model expansion methods for continual instruction tuning of LMMs.
arXiv Detail & Related papers (2023-11-27T15:04:48Z)
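One plausible instance of task-similarity-informed regularization is a quadratic pull toward the previous weights, scaled by task similarity; the sketch below is an illustrative assumption (including the direction of the weighting), not the paper's loss.

```python
# Hedged sketch of task-similarity-informed regularization; the similarity
# measure and the direction of the weighting are assumptions.
import torch

def similarity_weighted_penalty(params, old_params, similarity, base_strength=1.0):
    """Assumption: a less similar task gets a stronger pull toward the old
    weights, protecting unrelated skills; a more similar task adapts more
    freely. similarity in [0, 1], e.g. from embedding task instructions."""
    strength = base_strength * (1.0 - similarity)
    return strength * sum(
        ((p - p_old.detach()) ** 2).sum() for p, p_old in zip(params, old_params)
    )

# Toy usage on placeholder weight tensors.
old = [torch.randn(4, 4)]
new = [old[0] + 0.1 * torch.randn(4, 4)]
print(similarity_weighted_penalty(new, old, similarity=0.3))
```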
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
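A minimal sketch of layer-wise adaptive merging: merged weights are w_pre + sum_k lambda[k, l] * tau_k[l], with the coefficients lambda trained on unlabeled data by entropy minimization, the data-free objective AdaMerging adopts. The toy model and shapes below are assumptions.

```python
# Hedged sketch of layer-wise adaptive merging with learnable coefficients,
# trained by entropy minimization on unlabeled data. The toy 2-layer model
# is an assumption for illustration.
import torch
import torch.nn.functional as F

def merge(w_pre, task_vectors, lam):
    # lam: (K, L) learnable coefficients for K tasks and L layers.
    return [
        w + sum(lam[k, i] * tv[i] for k, tv in enumerate(task_vectors))
        for i, w in enumerate(w_pre)
    ]

def entropy_loss(logits):
    p = logits.softmax(dim=-1)
    return -(p * p.log()).sum(dim=-1).mean()

def forward(ws, x):
    return F.relu(x @ ws[0].T) @ ws[1].T

# Toy setup: pretrained weights and 3 task vectors for a 2-layer model.
w_pre = [torch.randn(16, 16), torch.randn(10, 16)]
task_vectors = [[0.1 * torch.randn_like(w) for w in w_pre] for _ in range(3)]
lam = torch.full((3, 2), 0.3, requires_grad=True)
opt = torch.optim.Adam([lam], lr=1e-2)
for _ in range(100):
    x = torch.randn(32, 16)  # unlabeled samples; no original training data
    loss = entropy_loss(forward(merge(w_pre, task_vectors, lam), x))
    opt.zero_grad(); loss.backward(); opt.step()
```

Only the merging coefficients are optimized; the pretrained weights and task vectors stay fixed, which is what makes the scheme cheap and data-free.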
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Learning continuous models for continuous physics [94.42705784823997]
We develop a test based on numerical analysis theory to validate machine learning models for science and engineering applications.
Our results illustrate how principled numerical analysis methods can be coupled with existing ML training/testing methodologies to validate models for science and engineering applications.
arXiv Detail & Related papers (2022-02-17T07:56:46Z)
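A classic numerical-analysis validation in the spirit of the entry above is a convergence test: if a learned vector field really defines a continuous model, solutions should converge as the integration step is refined. The sketch below is illustrative, not the paper's concrete test.

```python
# Hedged sketch of a numerical-analysis-style validation: integrate the
# learned dynamics at successively smaller steps and check that solutions
# converge (errors shrink by roughly the method's order; factor ~2 for
# forward Euler). Illustrative only.
import numpy as np

def rollout(f, x0, t_end, dt):
    """Integrate dx/dt = f(x) with forward Euler at step dt."""
    x = np.array(x0, dtype=float)
    for _ in range(int(round(t_end / dt))):
        x = x + dt * f(x)
    return x

def convergence_test(f, x0, t_end, dts=(0.1, 0.05, 0.025, 0.0125)):
    """Erratic or non-shrinking successive-refinement errors suggest the
    learned f does not define a consistent continuous model."""
    sols = [rollout(f, x0, t_end, dt) for dt in dts]
    return [np.linalg.norm(a - b) for a, b in zip(sols, sols[1:])]

# Toy usage with a stand-in "learned" vector field f(x) = -x.
print(convergence_test(lambda x: -x, x0=[1.0, 2.0], t_end=1.0))
```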