Continual Pre-Training Mitigates Forgetting in Language and Vision
- URL: http://arxiv.org/abs/2205.09357v1
- Date: Thu, 19 May 2022 07:27:12 GMT
- Title: Continual Pre-Training Mitigates Forgetting in Language and Vision
- Authors: Andrea Cossu, Tinne Tuytelaars, Antonio Carta, Lucia Passaro, Vincenzo
Lomonaco, Davide Bacciu
- Abstract summary: We show that continually pre-trained models are robust against catastrophic forgetting.
We provide empirical evidence supporting the fact that self-supervised pre-training is more effective in retaining previous knowledge than supervised protocols.
- Score: 43.80547864450793
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained models are nowadays a fundamental component of machine learning
research. In continual learning, they are commonly used to initialize the model
before training on the stream of non-stationary data. However, pre-training is
rarely applied during continual learning. We formalize and investigate the
characteristics of the continual pre-training scenario in both language and
vision environments, where a model is continually pre-trained on a stream of
incoming data and only later fine-tuned to different downstream tasks. We show
that continually pre-trained models are robust against catastrophic forgetting
and we provide strong empirical evidence supporting the fact that
self-supervised pre-training is more effective in retaining previous knowledge
than supervised protocols. Code is provided at
https://github.com/AndreaCossu/continual-pretraining-nlp-vision .
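Below is a minimal sketch of the scenario described in the abstract: a backbone is continually pre-trained (here with masked language modeling via Hugging Face Transformers) on a stream of corpora and only afterwards fine-tuned on downstream tasks. The model name, the toy stream, and all hyperparameters are illustrative assumptions; the authors' actual protocol is in the linked repository.
```python
# Illustrative sketch of continual pre-training: a backbone is repeatedly
# pre-trained (masked language modeling) on a stream of corpora and only
# later fine-tuned on downstream tasks. Model name, the toy stream, and
# hyperparameters are assumptions, not the paper's exact protocol.
import torch
from torch.utils.data import DataLoader
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").to(device)
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy stand-in for a non-stationary stream (e.g. scientific abstracts,
# then tweets, then news from a later period).
stream = [
    ["first-domain sentence number %d" % i for i in range(64)],
    ["second-domain sentence number %d" % i for i in range(64)],
]

model.train()
for experience in stream:                       # continual pre-training loop
    ids = tokenizer(experience, truncation=True, max_length=64)["input_ids"]
    loader = DataLoader([{"input_ids": x} for x in ids],
                        batch_size=8, shuffle=True, collate_fn=collator)
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss              # masked-LM loss on this chunk
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Only after the stream is consumed is the backbone fine-tuned on a
# downstream task; forgetting is assessed on tasks tied to earlier data.
```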
Related papers
- Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models [29.367678364485794]
We show how to design efficacious data distributions and learning rate schedules for continued pretraining of language models.
We show an improvement of 9% in average model accuracy compared to the baseline of continued training on the pretraining set.
arXiv Detail & Related papers (2024-07-09T22:37:59Z)
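Purely as an illustration of the kind of learning-rate schedule such a recipe involves (this is not the schedule proposed in the entry above), the sketch below shows a generic warmup-then-cosine schedule for a continued-pretraining phase, with a peak below a typical from-scratch pre-training rate; all constants are assumptions.
```python
# Generic warmup + cosine-decay learning-rate schedule for a continued
# pre-training phase. Peak/minimum rates and step counts are illustrative
# assumptions, not the recipe from the paper above.
import math
import torch

def continued_pretraining_lr(step, warmup_steps=1_000, total_steps=50_000,
                             peak_lr=1e-4, min_lr=1e-5):
    """Linear warmup to a reduced peak, then cosine decay to a floor."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * min(1.0, progress)))

model = torch.nn.Linear(8, 8)          # stand-in for the language model
# Base lr is 1.0 so the lambda's return value is used as the absolute rate.
optimizer = torch.optim.AdamW(model.parameters(), lr=1.0)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=continued_pretraining_lr)
# During training, call optimizer.step() then scheduler.step() once per batch.
```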
- An Empirical Analysis of Forgetting in Pre-trained Models with Incremental Low-Rank Updates [11.90029443742706]
We study the impact of Low-Rank Adaptation (LoRA) rank on the forgetting of the pretraining foundation task and on the plasticity and forgetting of subsequent ones.
We also observe that vision transformers finetuned in that way exhibit a sort of "contextual" forgetting, a behaviour that we do not observe for residual networks.
arXiv Detail & Related papers (2024-05-28T11:29:25Z)
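As a minimal sketch of the setup studied above, assuming the Hugging Face peft library, the snippet below attaches a rank-r LoRA adapter to a pre-trained backbone; the backbone, target modules, and rank are illustrative choices, not necessarily those used in the paper.
```python
# Attach a rank-r LoRA adapter to a pre-trained backbone; r is the knob
# whose effect on forgetting and plasticity the paper above analyses.
# Backbone, target modules, and hyperparameters are illustrative assumptions.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

backbone = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

config = LoraConfig(
    r=8,                                  # LoRA rank
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],    # BERT attention projections
    task_type="SEQ_CLS",
)
model = get_peft_model(backbone, config)
model.print_trainable_parameters()        # only the low-rank updates (and head) train
```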
- PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
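A minimal sketch of self-distillation used as a regularizer, assuming a generic PyTorch setup: a frozen copy of the previously pre-trained model serves as teacher and a temperature-softened KL term is added to the training loss. The temperature, weighting, and toy modules are illustrative assumptions, not the paper's exact formulation.
```python
# Self-distillation as a regularizer for a further pre-training stage:
# a frozen copy of the previous-stage model acts as teacher, and a KL term
# between teacher and student predictions is added to the training loss.
import copy
import torch
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)

# Toy student and frozen teacher keep the sketch self-contained; in further
# pre-training the teacher would be a frozen copy of the previous-stage model.
student = torch.nn.Linear(16, 4)
teacher = copy.deepcopy(student).eval()
for p in teacher.parameters():
    p.requires_grad_(False)

x = torch.randn(8, 16)
labels = torch.randint(0, 4, (8,))
student_logits = student(x)
with torch.no_grad():
    teacher_logits = teacher(x)

task_loss = F.cross_entropy(student_logits, labels)
loss = task_loss + 0.5 * self_distillation_loss(student_logits, teacher_logits)
loss.backward()
```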
- Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora [31.136334214818305]
We study a lifelong language model pretraining challenge where a PTLM is continually updated so as to adapt to emerging data.
Over a domain-incremental research paper stream and a chronologically ordered tweet stream, we incrementally pretrain a PTLM with different continual learning algorithms.
Our experiments show continual learning algorithms improve knowledge preservation, with logit distillation being the most effective approach.
arXiv Detail & Related papers (2021-10-16T09:59:33Z)
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study which specific traits of the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)
- The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models [115.49214555402567]
Pre-trained weights often boost a wide range of downstream tasks including classification, detection, and segmentation.
Recent studies suggest that pre-training benefits from gigantic model capacity.
In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH).
arXiv Detail & Related papers (2020-12-12T21:53:55Z)
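For context on the lottery-ticket lens mentioned above, here is a minimal sketch of one magnitude-pruning step; the sparsity level and toy layer are illustrative, and the paper's exact pruning and rewinding procedure may differ.
```python
# One magnitude-pruning step of the kind used in lottery-ticket analyses:
# keep the largest-|w| entries of a pre-trained layer and zero out the rest.
# Sparsity level and the toy layer are illustrative assumptions.
import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float = 0.8) -> torch.Tensor:
    """Binary mask keeping the (1 - sparsity) fraction of largest-magnitude weights."""
    keep = max(1, int(weight.numel() * (1.0 - sparsity)))
    top_idx = weight.abs().flatten().topk(keep).indices
    mask = torch.zeros_like(weight).reshape(-1)
    mask[top_idx] = 1.0
    return mask.view_as(weight)

layer = torch.nn.Linear(128, 64)              # stands in for a pre-trained layer
mask = magnitude_mask(layer.weight.data)
with torch.no_grad():
    layer.weight.mul_(mask)                   # surviving subnetwork ("winning ticket")
```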
- Pretrained Language Model Embryology: The Birth of ALBERT [68.5801642674541]
We investigate the developmental process from a set of randomly initialized parameters to a totipotent language model.
Our results show that ALBERT learns to reconstruct and predict tokens of different parts of speech (POS) at different learning speeds during pretraining.
These findings suggest that the knowledge a pretrained model acquires varies during pretraining, and that more pretraining steps do not necessarily yield a model with more comprehensive knowledge.
arXiv Detail & Related papers (2020-10-06T05:15:39Z)
- A Survey on Self-supervised Pre-training for Sequential Transfer Learning in Neural Networks [1.1802674324027231]
Self-supervised pre-training for transfer learning is becoming an increasingly popular technique to improve state-of-the-art results using unlabeled data.
We provide an overview of the taxonomy for self-supervised learning and transfer learning, and highlight some prominent methods for designing pre-training tasks across different domains.
arXiv Detail & Related papers (2020-07-01T22:55:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.