An Empirical Investigation of the Role of Pre-training in Lifelong
Learning
- URL: http://arxiv.org/abs/2112.09153v2
- Date: Tue, 29 Aug 2023 17:04:19 GMT
- Title: An Empirical Investigation of the Role of Pre-training in Lifelong
Learning
- Authors: Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell
- Abstract summary: We show that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially.
We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima.
- Score: 21.995593026269578
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The lifelong learning paradigm in machine learning is an attractive
alternative to the more prominent isolated learning scheme not only due to its
resemblance to biological learning but also its potential to reduce energy
waste by obviating excessive model re-training. A key challenge to this
paradigm is the phenomenon of catastrophic forgetting. With the increasing
popularity and success of pre-trained models in machine learning, we pose the
question: What role does pre-training play in lifelong learning, specifically
with respect to catastrophic forgetting? We investigate existing methods in the
context of large, pre-trained models and evaluate their performance on a
variety of text and image classification tasks, including a large-scale study
using a novel data set of 15 diverse NLP tasks. Across all settings, we observe
that generic pre-training implicitly alleviates the effects of catastrophic
forgetting when learning multiple tasks sequentially compared to randomly
initialized models. We then further investigate why pre-training alleviates
forgetting in this setting. We study this phenomenon by analyzing the loss
landscape, finding that pre-trained weights appear to ease forgetting by
leading to wider minima. Based on this insight, we propose jointly optimizing
for current task loss and loss basin sharpness to explicitly encourage wider
basins during sequential fine-tuning. We show that this optimization approach
outperforms several state-of-the-art task-sequential continual learning
algorithms across multiple settings, occasionally even without retaining a
memory that scales in size with the number of tasks.
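The joint objective described in the abstract (current-task loss plus loss-basin sharpness) can be illustrated with a Sharpness-Aware-Minimization (SAM)-style two-step update. The sketch below is a minimal PyTorch illustration under that assumption only; the radius `rho`, the function name `sam_update`, and the absence of any episodic memory are illustrative choices, not the authors' exact procedure.

```python
# Minimal PyTorch sketch of sharpness-aware sequential fine-tuning.
# Assumption: a SAM-style two-step update approximates "jointly optimizing
# current task loss and loss basin sharpness"; names are illustrative.
import torch

def sam_update(model, loss_fn, batch, base_optimizer, rho=0.05):
    """One SAM-style step: perturb weights toward higher loss, then descend."""
    inputs, targets = batch

    # 1) Gradients of the current-task loss at the current weights.
    loss_fn(model(inputs), targets).backward()

    # 2) Move each parameter to the (approximate) worst case within an
    #    L2 ball of radius rho -- this probes local sharpness.
    grad_norm = torch.norm(
        torch.stack([p.grad.norm(p=2) for p in model.parameters() if p.grad is not None])
    )
    eps = {}
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps[p] = e
    model.zero_grad()

    # 3) Gradient of the loss at the perturbed point; descending this
    #    gradient favors wide, flat basins for the current task.
    loss_fn(model(inputs), targets).backward()

    # 4) Undo the perturbation and apply the base optimizer step.
    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
```

In a sequential fine-tuning loop, an update of this form would replace the plain optimizer step for each incoming task; whether a replay memory is also retained varies by setting in the paper.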
Related papers
- An Efficient Replay for Class-Incremental Learning with Pre-trained Models [0.0]
In class-incremental learning, the steady state among the weights guided by each class center is disrupted, which is significantly correlated with forgetting.
We propose a new method to overcome forgetting.
arXiv Detail & Related papers (2024-08-15T11:26:28Z)
- Task Arithmetic with LoRA for Continual Learning [0.0]
We propose a novel method to continually train vision models using low-rank adaptation and task arithmetic.
When aided with a small memory of 10 samples per class, our method achieves performance close to full-set finetuning.
arXiv Detail & Related papers (2023-11-04T15:12:24Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce BAdam, a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Learning to Modulate pre-trained Models in RL [22.812215561012874]
Fine-tuning a pre-trained model often suffers from catastrophic forgetting.
Our study shows that with most fine-tuning approaches, the performance on pre-training tasks deteriorates significantly.
We propose a novel method, Learning-to-Modulate (L2M), that avoids the degradation of learned skills by modulating the information flow of the frozen pre-trained model.
arXiv Detail & Related papers (2023-06-26T17:53:05Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order robustness.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Continual Learning with Pretrained Backbones by Tuning in the Input Space [44.97953547553997]
The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks.
We propose a novel strategy to make the fine-tuning procedure more effective: the pre-trained part of the network is kept frozen, and we learn not only the usual classification head but also a set of newly introduced learnable parameters.
arXiv Detail & Related papers (2023-06-05T15:11:59Z)
- PIVOT: Prompting for Video Continual Learning [50.80141083993668]
We introduce PIVOT, a novel method that leverages extensive knowledge in pre-trained models from the image domain.
Our experiments show that PIVOT improves state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
arXiv Detail & Related papers (2022-12-09T13:22:27Z)
- Continual Predictive Learning from Videos [100.27176974654559]
We study a new continual learning problem in the context of video prediction.
We propose the continual predictive learning (CPL) approach, which learns a mixture world model via predictive experience replay.
We construct two new benchmarks based on RoboNet and KTH, in which different tasks correspond to different physical robotic environments or human actions.
arXiv Detail & Related papers (2022-04-12T08:32:26Z)
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
- Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates [52.164757178369804]
Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget.
We conduct an empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework.
We also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance.
arXiv Detail & Related papers (2021-01-20T13:59:25Z)
- Understanding the Role of Training Regimes in Continual Learning [51.32945003239048]
Catastrophic forgetting affects the training of neural networks, limiting their ability to learn multiple tasks sequentially.
We study the effect of dropout, learning rate decay, and batch size, on forming training regimes that widen the tasks' local minima.
arXiv Detail & Related papers (2020-06-12T06:00:27Z)
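Both the main paper's loss-landscape analysis and the last related work above reason about how wide or flat a task's local minimum is. As a hedged illustration of one common way to probe this (not the specific metric used in either paper), the sketch below estimates flatness as the average increase in loss under small random weight perturbations; the scale `sigma`, the sample count, and the function name `flatness_probe` are illustrative assumptions.

```python
# Minimal sketch of a flatness probe: average loss increase under small random
# weight perturbations (an illustrative proxy for basin width, not the exact
# metric used in the papers above).
import copy
import torch

@torch.no_grad()
def flatness_probe(model, loss_fn, batch, sigma=0.01, n_samples=10):
    """Return the mean increase in loss when weights are perturbed by N(0, sigma^2)."""
    inputs, targets = batch
    base_loss = loss_fn(model(inputs), targets).item()

    increases = []
    for _ in range(n_samples):
        noisy = copy.deepcopy(model)
        for p in noisy.parameters():
            p.add_(torch.randn_like(p) * sigma)
        increases.append(loss_fn(noisy(inputs), targets).item() - base_loss)

    # Smaller values indicate a wider / flatter minimum around the current weights.
    return sum(increases) / len(increases)
```

A lower value after sequential fine-tuning would be consistent with the wider minima that the main paper attributes to pre-trained initializations.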