Transformers for Supervised Online Continual Learning
- URL: http://arxiv.org/abs/2403.01554v1
- Date: Sun, 3 Mar 2024 16:12:20 GMT
- Title: Transformers for Supervised Online Continual Learning
- Authors: Jorg Bornschein, Yazhe Li, Amal Rannen-Triki
- Abstract summary: We propose a method that leverages transformers' in-context learning capabilities for online continual learning.
Our method demonstrates significant improvements over previous state-of-the-art results on CLOC, a challenging large-scale real-world benchmark for image geo-localization.
- Score: 11.270594318662233
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers have become the dominant architecture for sequence modeling
tasks such as natural language processing or audio processing, and they are now
even considered for tasks that are not naturally sequential such as image
classification. Their ability to attend to and to process a set of tokens as
context enables them to develop in-context few-shot learning abilities.
However, their potential for online continual learning remains relatively
unexplored. In online continual learning, a model must adapt to a
non-stationary stream of data, minimizing the cumulative next-step prediction
loss. We focus on the supervised online continual learning setting, where we
learn a predictor $x_t \rightarrow y_t$ for a sequence of examples $(x_t,
y_t)$. Inspired by the in-context learning capabilities of transformers and
their connection to meta-learning, we propose a method that leverages these
strengths for online continual learning. Our approach explicitly conditions a
transformer on recent observations while simultaneously training it online
with stochastic gradient descent, following the procedure introduced with
Transformer-XL. We incorporate replay to maintain the benefits of multi-epoch
training while adhering to the sequential protocol. We hypothesize that this
combination enables fast adaptation through in-context learning and sustained
long-term improvement via parametric learning. Our method demonstrates
significant improvements over previous state-of-the-art results on CLOC, a
challenging large-scale real-world benchmark for image geo-localization.
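The protocol sketched in the abstract can be made concrete: predict each $y_t$ in-context before the update (so the next-step loss is measured on unseen data), then take an online SGD step on the new example mixed with replayed ones. The following is a hedged illustration, not the authors' implementation: `online_continual_loop`, `MeanPredictor`, and all hyperparameter names are invented stand-ins, and a toy running-mean predictor replaces the Transformer-XL-style model so the sketch stays runnable.

```python
import random
from collections import deque

def online_continual_loop(model, stream, context_len=64, replay_cap=1000, replay_k=4):
    """Sequential protocol: in-context prediction, next-step loss, then an
    online update on the new example plus a few replayed examples."""
    context = deque(maxlen=context_len)  # recent (x, y) pairs fed in-context
    replay = []                          # reservoir-style replay buffer
    cumulative_loss = 0.0
    for t, (x_t, y_t) in enumerate(stream):
        pred = model.predict(x_t, list(context))   # fast adaptation: in-context
        cumulative_loss += model.loss(pred, y_t)   # loss recorded BEFORE update
        batch = [(x_t, y_t)] + random.sample(replay, min(replay_k, len(replay)))
        model.sgd_step(batch)                      # slow adaptation: parametric
        context.append((x_t, y_t))
        # reservoir sampling keeps the buffer an unbiased sample of the stream
        if len(replay) < replay_cap:
            replay.append((x_t, y_t))
        elif random.random() < replay_cap / (t + 1):
            replay[random.randrange(replay_cap)] = (x_t, y_t)
    return cumulative_loss

class MeanPredictor:
    """Toy stand-in for the transformer: predicts the running mean of targets."""
    def __init__(self):
        self.mu, self.n = 0.0, 0
    def predict(self, x, context):
        return self.mu
    def loss(self, pred, y):
        return (pred - y) ** 2
    def sgd_step(self, batch):
        for _, y in batch:
            self.n += 1
            self.mu += (y - self.mu) / self.n
```

The key design point the abstract hypothesizes is visible here: the context window supplies fast, gradient-free adaptation while `sgd_step` accumulates long-term parametric improvement, and replay preserves some benefit of multi-epoch training without violating the sequential protocol.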
Related papers
- CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning [17.614980614656407]
We propose Continual Generative training for Incremental prompt-Learning.
We exploit Variational Autoencoders to learn class-conditioned distributions.
We show that such a generative replay approach can adapt to new tasks while improving zero-shot capabilities.
arXiv Detail & Related papers (2024-07-22T16:51:28Z)
- Random Representations Outperform Online Continually Learned Representations [68.42776779425978]
We show that existing online continually trained deep networks produce inferior representations compared to simple pre-defined random transforms.
Our method, called RanDumb, significantly outperforms state-of-the-art continually learned representations across all online continual learning benchmarks.
Our study reveals the significant limitations of representation learning, particularly in low-exemplar and online continual learning scenarios.
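The idea behind this baseline can be illustrated with a minimal sketch. This is not the actual RanDumb implementation; `RandomFeatureNCM` and its parameters are invented for illustration. It pairs a frozen random ReLU projection (the "representation", never trained) with an online nearest-class-mean classifier.

```python
import numpy as np

class RandomFeatureNCM:
    """Frozen random ReLU features + online nearest-class-mean classifier."""
    def __init__(self, d_in, d_out=256, seed=0):
        rng = np.random.default_rng(seed)
        # The representation is never trained: a fixed random projection.
        self.W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
        self.sums, self.counts = {}, {}

    def _embed(self, x):
        return np.maximum(x @ self.W, 0.0)  # random ReLU features

    def update(self, x, y):
        """Online update: accumulate per-class sums of embeddings."""
        z = self._embed(x)
        if y not in self.sums:
            self.sums[y] = np.zeros_like(z)
            self.counts[y] = 0
        self.sums[y] += z
        self.counts[y] += 1

    def predict(self, x):
        """Assign the class whose embedding mean is nearest."""
        z = self._embed(x)
        return min(self.sums,
                   key=lambda c: np.linalg.norm(z - self.sums[c] / self.counts[c]))
```

Because only per-class means are updated, this learner sees each example exactly once and has no representation to forget, which is what makes it a pointed control for online continual representation learning.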
arXiv Detail & Related papers (2024-02-13T22:07:29Z)
- Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning [2.270857464465579]
This work identifies a simple pre-training mechanism that leads to representations exhibiting better continual and transfer learning.
The repeated resetting of weights in the last layer, which we nickname "zapping," was originally designed for a meta-continual-learning procedure.
We show it is surprisingly applicable in many settings beyond both meta-learning and continual learning.
arXiv Detail & Related papers (2023-10-12T02:52:14Z)
- In-Context Convergence of Transformers [63.04956160537308]
We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent.
For data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process.
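The object of study here, a single softmax self-attention layer, can be written in a few lines. This is a generic NumPy sketch with our own function and parameter names; the paper's specific parameterization may differ.

```python
import numpy as np

def softmax_attention(X, W_q, W_k, W_v):
    """Single-layer softmax self-attention over token matrix X (tokens x dim)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])    # scaled dot-product scores
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)         # row-wise softmax
    return A @ V                              # attention-weighted values
```

Training the parameters of such a layer with gradient descent on feature-imbalanced data is the setting whose stage-wise convergence the paper analyzes.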
arXiv Detail & Related papers (2023-10-08T17:55:33Z)
- Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
arXiv Detail & Related papers (2023-06-26T17:58:50Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations.
We show how trained Transformers become mesa-optimizers, i.e., they learn models by gradient descent in their forward pass.
arXiv Detail & Related papers (2022-12-15T09:21:21Z)
- Meta-learning the Learning Trends Shared Across Tasks [123.10294801296926]
Gradient-based meta-learning algorithms excel at quick adaptation to new tasks with limited data.
Existing meta-learning approaches depend only on current-task information during adaptation.
We propose a 'Path-aware' model-agnostic meta-learning approach.
arXiv Detail & Related papers (2020-10-19T08:06:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.