Foundational Models for Continual Learning: An Empirical Study of Latent
Replay
- URL: http://arxiv.org/abs/2205.00329v1
- Date: Sat, 30 Apr 2022 19:11:37 GMT
- Title: Foundational Models for Continual Learning: An Empirical Study of Latent
Replay
- Authors: Oleksiy Ostapenko, Timothee Lesort, Pau Rodríguez, Md Rifat Arefin,
Arthur Douillard, Irina Rish, Laurent Charlin
- Abstract summary: We study the efficacy of pre-trained vision models as a foundation for downstream continual learning scenarios.
We compare the efficacy of various pre-trained models in large-scale benchmarking scenarios with a vanilla replay setting applied in the latent and in the raw-data space.
- Score: 17.322679682451597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rapid development of large-scale pre-training has resulted in foundation
models that can act as effective feature extractors on a variety of downstream
tasks and domains. Motivated by this, we study the efficacy of pre-trained
vision models as a foundation for downstream continual learning (CL) scenarios.
Our goal is twofold. First, we want to understand the compute-accuracy
trade-off between CL in the raw-data space and in the latent space of
pre-trained encoders. Second, we investigate how the characteristics of the
encoder, the pre-training algorithm and data, as well as of the resulting
latent space affect CL performance. For this, we compare the efficacy of
various pre-trained models in large-scale benchmarking scenarios with a vanilla
replay setting applied in the latent and in the raw-data space. Notably, this
study shows how transfer, forgetting, task similarity and learning are
dependent on the input data characteristics and not necessarily on the CL
algorithms. First, we show that under some circumstances reasonable CL
performance can readily be achieved with a non-parametric classifier at
negligible compute. We then show how models pre-trained on broader data result
in better performance across various replay buffer sizes. We explain this through
the representational similarity and transfer properties of these representations.
Finally, we show the effectiveness of self-supervised pre-training for
downstream domains that are out-of-distribution as compared to the pre-training
domain. We point out and validate several research directions that can further
increase the efficacy of latent CL, including representation ensembling. The
diverse set of datasets used in this study can serve as a compute-efficient
playground for further CL research. The codebase is available at
https://github.com/oleksost/latent_CL.
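For intuition, the setup benchmarked here (a frozen pre-trained encoder, vanilla replay applied to its latent features, and a cheap non-parametric classifier baseline) can be sketched roughly as below. This is a minimal, hedged sketch under assumed choices: a torchvision ResNet-50 backbone, a linear head trained with SGD, an illustrative buffer size, and a nearest-class-mean reading of the "non-parametric classifier". It is not the authors' implementation; consult the linked codebase for that.

# Illustrative sketch only: frozen pre-trained encoder, vanilla replay in
# latent space, and a nearest-class-mean baseline. Backbone, buffer size,
# optimizer, and class count are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen pre-trained encoder: each image is embedded once, and all continual
# learning then happens in the cheap latent space.
encoder = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
encoder.fc = nn.Identity()                      # expose 2048-d penultimate features
encoder.eval().requires_grad_(False)
encoder.to(device)

feat_dim, num_classes = 2048, 100               # illustrative values
head = nn.Linear(feat_dim, num_classes).to(device)
opt = torch.optim.SGD(head.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

buffer_z, buffer_y = [], []                     # latent replay buffer

def train_task(loader, buffer_size_per_task=200):
    """Vanilla replay in latent space: mix current-task features with
    features stored from earlier tasks when updating the linear head."""
    task_z, task_y = [], []
    for images, labels in loader:
        with torch.no_grad():
            z = encoder(images.to(device))      # (B, feat_dim) latents
        y = labels.to(device)
        task_z.append(z)
        task_y.append(y)
        if buffer_z:                            # replay stored latents
            rz, ry = torch.cat(buffer_z), torch.cat(buffer_y)
            idx = torch.randint(len(rz), (len(z),), device=z.device)
            z, y = torch.cat([z, rz[idx]]), torch.cat([y, ry[idx]])
        opt.zero_grad()
        criterion(head(z), y).backward()
        opt.step()
    # keep a small random subset of this task's latents for future replay
    task_z, task_y = torch.cat(task_z), torch.cat(task_y)
    keep = torch.randperm(len(task_z), device=task_z.device)[:buffer_size_per_task]
    buffer_z.append(task_z[keep])
    buffer_y.append(task_y[keep])

def nearest_class_mean_predict(z_query):
    """Non-parametric baseline (one reading of the paper's 'non-parametric
    classifier at negligible compute'): assign each query latent to the
    class whose stored-feature mean is closest."""
    rz, ry = torch.cat(buffer_z), torch.cat(buffer_y)
    classes = ry.unique()                       # sorted class ids seen so far
    means = torch.stack([rz[ry == c].mean(dim=0) for c in classes])
    dists = torch.cdist(z_query, means)         # (Q, num_seen_classes)
    return classes[dists.argmin(dim=1)]

The compute argument in the abstract follows from this structure: the encoder forward pass dominates the cost, and once features are cached, both the replay updates and the nearest-class-mean baseline operate on low-dimensional vectors rather than raw images.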
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size.
Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
- AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset [25.935496432142976]
It is a long-term vision of the Autonomous Driving (AD) community that perception models can learn from a large-scale point cloud dataset.
We formulate the point-cloud pre-training task as a semi-supervised problem, which leverages the few-shot labeled and massive unlabeled point-cloud data.
We achieve significant performance gains on a series of downstream perception benchmarks, including nuScenes and KITTI, under different baseline models.
arXiv Detail & Related papers (2023-06-01T12:32:52Z)
- On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes with a balance on the intra-/inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z)
- Do Pre-trained Models Benefit Equally in Continual Learning? [25.959813589169176]
Existing work on continual learning (CL) is primarily devoted to developing algorithms for models trained from scratch.
Despite their encouraging performance on contrived benchmarks, these algorithms show dramatic performance drops in real-world scenarios.
This paper advocates the systematic introduction of pre-training to CL.
arXiv Detail & Related papers (2022-10-27T18:03:37Z)
- Learning Deep Representations via Contrastive Learning for Instance Retrieval [11.736450745549792]
This paper makes the first attempt to tackle the problem using instance-discrimination-based contrastive learning (CL).
In this work, we approach this problem by exploring the capability of deriving discriminative representations from pre-trained and fine-tuned CL models.
arXiv Detail & Related papers (2022-09-28T04:36:34Z)
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)