Is forgetting less a good inductive bias for forward transfer?
- URL: http://arxiv.org/abs/2303.08207v1
- Date: Tue, 14 Mar 2023 19:52:09 GMT
- Title: Is forgetting less a good inductive bias for forward transfer?
- Authors: Jiefeng Chen, Timothy Nguyen, Dilan Gorur, Arslan Chaudhry
- Abstract summary: We argue that the measure of forward transfer to a task should not be affected by the restrictions placed on the continual learner.
Instead, forward transfer should be measured by how easy it is to learn a new task given a set of representations produced by continual learning on previous tasks.
Our results indicate that less forgetful representations lead to better forward transfer, suggesting a strong correlation between retaining past information and learning efficiency on new tasks.
- Score: 7.704064306361941
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the main motivations of studying continual learning is that the
problem setting allows a model to accrue knowledge from past tasks to learn new
tasks more efficiently. However, recent studies suggest that the key metric
that continual learning algorithms optimize, reduction in catastrophic
forgetting, does not correlate well with the forward transfer of knowledge. We
believe that the conclusion previous works reached is due to the way they
measure forward transfer. We argue that the measure of forward transfer to a
task should not be affected by the restrictions placed on the continual learner
in order to preserve knowledge of previous tasks. Instead, forward transfer
should be measured by how easy it is to learn a new task given a set of
representations produced by continual learning on previous tasks. Under this
notion of forward transfer, we evaluate different continual learning algorithms
on a variety of image classification benchmarks. Our results indicate that less
forgetful representations lead to better forward transfer, suggesting a strong
correlation between retaining past information and learning efficiency on new
tasks. Further, we found less forgetful representations to be more diverse and
discriminative compared to their forgetful counterparts.
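Under this notion, forward transfer can be operationalized by freezing the representations produced by the continual learner and measuring how efficiently a new task is learned on top of them, for example with a linear probe. The sketch below is an illustrative assumption rather than the paper's exact experimental protocol; names such as `encoder`, `feature_dim`, and the data loaders are hypothetical placeholders.

```python
# Illustrative sketch (assumed protocol, not the paper's code): measure forward
# transfer by how easily a new task is learned from frozen representations
# produced by a continual learner, here via linear probing.
import torch
import torch.nn as nn

def linear_probe_accuracy(encoder, train_loader, test_loader, num_classes,
                          feature_dim, epochs=10, lr=1e-3, device="cpu"):
    """Train only a linear head on frozen features of the new task, then report test accuracy."""
    encoder = encoder.to(device).eval()
    for p in encoder.parameters():
        p.requires_grad_(False)  # keep the continually learned representation fixed

    head = nn.Linear(feature_dim, num_classes).to(device)
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                feats = encoder(x)  # representations learned on previous tasks
            loss = loss_fn(head(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            pred = head(encoder(x)).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total
```

Forward transfer can then be reported relative to a reference encoder (for instance, a randomly initialized one), so the score reflects the quality of the learned representations rather than the restrictions imposed on the continual learner while preserving past knowledge.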
Related papers
- Multitask Learning with No Regret: from Improved Confidence Bounds to Active Learning [79.07658065326592]
Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning.
We provide novel multitask confidence intervals in the challenging setting when neither the similarity between tasks nor the tasks' features are available to the learner.
We propose a novel online learning algorithm that achieves such improved regret without knowing this parameter in advance.
arXiv Detail & Related papers (2023-08-03T13:08:09Z) - Towards Robust and Efficient Continual Language Learning [36.541749819691546]
We construct a new benchmark of task sequences that target different possible transfer scenarios one might face.
We propose a simple, yet effective, learner that satisfies many of our desiderata simply by leveraging a selective strategy for initializing new models from past task checkpoints.
arXiv Detail & Related papers (2023-07-11T19:08:31Z) - Transferability Estimation Based On Principal Gradient Expectation [68.97403769157117]
Cross-task transferability is compatible with transferred results while keeping self-consistency.
Existing transferability metrics are estimated on the particular model by converting source and target tasks.
We propose Principal Gradient Expectation (PGE), a simple yet effective method for assessing transferability across tasks.
arXiv Detail & Related papers (2022-11-29T15:33:02Z) - Beyond Not-Forgetting: Continual Learning with Backward Knowledge Transfer [39.99577526417276]
In continual learning (CL), an agent can improve the learning performance of both a new task and old tasks.
Most existing CL methods focus on addressing catastrophic forgetting in neural networks by minimizing the modification of the learnt model for old tasks.
We propose a new CL method with Backward knowlEdge tRansfer (CUBER) for a fixed capacity neural network without data replay.
arXiv Detail & Related papers (2022-11-01T23:55:51Z) - A Theory for Knowledge Transfer in Continual Learning [7.056222499095849]
Continual learning of tasks is an active research area in deep neural networks.
Recent work has investigated forward knowledge transfer to new tasks.
We present a theory for knowledge transfer in continual supervised learning.
arXiv Detail & Related papers (2022-08-14T22:28:26Z) - Continual Prompt Tuning for Dialog State Tracking [58.66412648276873]
A desirable dialog system should be able to continually learn new skills without forgetting old ones.
We present Continual Prompt Tuning, a parameter-efficient framework that not only avoids forgetting but also enables knowledge transfer between tasks.
arXiv Detail & Related papers (2022-03-13T13:22:41Z) - Relational Experience Replay: Continual Learning by Adaptively Tuning Task-wise Relationship [54.73817402934303]
We propose Experience Continual Replay (ERR), a bi-level learning framework that adaptively tunes task-wise relationships to achieve a better stability-plasticity tradeoff.
ERR can consistently improve the performance of all baselines and surpass current state-of-the-art methods.
arXiv Detail & Related papers (2021-12-31T12:05:22Z) - Reducing Representation Drift in Online Continual Learning [87.71558506591937]
We study the online continual learning paradigm, where agents must learn from a changing distribution with constrained memory and compute.
In this work we instead focus on the change in representations of previously observed data due to the introduction of previously unobserved class samples in the incoming data stream.
arXiv Detail & Related papers (2021-04-11T15:19:30Z) - Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of unsupervisedly learned models towards another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target task even from less relevant sources.
arXiv Detail & Related papers (2020-09-24T15:40:55Z) - Representation Ensembling for Synergistic Lifelong Learning with Quasilinear Complexity [17.858926093389737]
In lifelong learning, data are used to improve performance not only on the current task, but also on previously encountered, and as yet unencountered tasks.
Our key insight is that we can synergistically ensemble representations -- that were learned independently on disparate tasks -- to enable both forward and backward transfer.
arXiv Detail & Related papers (2020-04-27T16:16:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.