Simpler is Better: off-the-shelf Continual Learning Through Pretrained
Backbones
- URL: http://arxiv.org/abs/2205.01586v1
- Date: Tue, 3 May 2022 16:03:46 GMT
- Title: Simpler is Better: off-the-shelf Continual Learning Through Pretrained
Backbones
- Authors: Francesco Pelosin
- Abstract summary: We propose a baseline (off-the-shelf) for Continual Learning of Computer Vision problems.
We exploit the power of pretrained models to compute class prototypes and fill a memory bank.
We compare our pipeline with common CNN models and show the superiority of Vision Transformers.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this short paper, we propose a baseline (off-the-shelf) for Continual
Learning of Computer Vision problems by leveraging the power of pretrained
models. By doing so, we devise a simple approach that achieves strong performance
on most of the common benchmarks. Our approach is fast, since it requires no
parameter updates, and has minimal memory requirements (on the order of kilobytes).
In particular, the "training" phase reorders the data and exploits the pretrained
model to compute a prototype for each class and fill a memory bank. At
inference time, we match the closest prototype through a kNN-like approach,
which provides the prediction. We show how this naive solution can act as an
off-the-shelf continual learning system. To consolidate our results, we compare
the devised pipeline with common CNN models and show the superiority of Vision
Transformers, suggesting that such architectures produce features of higher
quality. Moreover, this simple pipeline raises the same questions raised by
previous work \cite{gdumb} about the actual progress made by the CL community,
especially regarding the datasets considered and the usage of pretrained models.
Code is live at https://github.com/francesco-p/off-the-shelf-cl
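
For concreteness, here is a minimal sketch of the kind of prototype pipeline the abstract describes: a frozen pretrained backbone embeds images, per-class mean prototypes are accumulated into a small memory bank, and inference returns the class of the closest prototype. The PrototypeMemory name, the Euclidean distance, and the generic backbone are illustrative assumptions rather than details taken from the paper's code.

```python
# Hedged sketch of a training-free, prototype-based continual classifier.
# The distance metric and backbone choice are assumptions, not the paper's exact code.
import torch

class PrototypeMemory:
    """Class-prototype memory bank built from frozen pretrained features."""

    def __init__(self, backbone: torch.nn.Module):
        self.backbone = backbone.eval()   # frozen feature extractor, never updated
        self.sums, self.counts = {}, {}   # running per-class feature sums and counts

    @torch.no_grad()
    def update(self, images: torch.Tensor, labels: torch.Tensor) -> None:
        """'Training' = accumulate features per class; no gradients involved."""
        feats = self.backbone(images)                 # assumed (B, D) embeddings
        for f, y in zip(feats, labels.tolist()):
            self.sums[y] = self.sums.get(y, torch.zeros_like(f)) + f
            self.counts[y] = self.counts.get(y, 0) + 1

    @torch.no_grad()
    def predict(self, images: torch.Tensor) -> torch.Tensor:
        """Match each query to the closest class prototype (1-NN over prototypes)."""
        classes = sorted(self.sums)
        protos = torch.stack([self.sums[c] / self.counts[c] for c in classes])
        feats = self.backbone(images)                 # (B, D)
        dists = torch.cdist(feats, protos)            # Euclidean distance, assumed
        idx = dists.argmin(dim=1)
        return torch.tensor([classes[i] for i in idx.tolist()])
```

Since only one D-dimensional vector (plus a counter) is stored per class, a ViT-style 768-dimensional float32 prototype costs about 3 KB per class, which is consistent with the kilobyte-scale memory footprint claimed in the abstract.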
Related papers
- Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z) - Read Between the Layers: Leveraging Multi-Layer Representations for Rehearsal-Free Continual Learning with Pre-Trained Models [15.847302755988506]
We address the Continual Learning problem, wherein a model must learn a sequence of tasks from non-stationary distributions.
We propose LayUP, a new prototype-based approach to CL that leverages second-order feature statistics from multiple intermediate layers of a pre-trained network.
Our results demonstrate that fully exhausting the representational capacities of pre-trained models in CL goes well beyond their final embeddings.
arXiv Detail & Related papers (2023-12-13T13:11:44Z) - RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
arXiv Detail & Related papers (2023-07-05T12:49:02Z) - Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
arXiv Detail & Related papers (2023-06-12T15:52:02Z) - Guiding The Last Layer in Federated Learning with Pre-Trained Models [18.382057374270143]
Federated Learning (FL) is an emerging paradigm that allows a model to be trained across a number of participants without sharing data.
We show that fitting a classification head using the Nearest Class Means (NCM) can be done exactly and orders of magnitude more efficiently than existing proposals; a sketch of this idea appears after this list.
arXiv Detail & Related papers (2023-06-06T18:02:02Z) - Boosting Low-Data Instance Segmentation by Unsupervised Pre-training
with Saliency Prompt [103.58323875748427]
This work offers a novel unsupervised pre-training solution for low-data regimes.
Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models.
Experimental results show that our method significantly boosts several QEIS models on three datasets.
arXiv Detail & Related papers (2023-02-02T15:49:03Z) - A Memory Transformer Network for Incremental Learning [64.0410375349852]
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing methods has been the use of a memory of exemplars, which overcomes the issue of catastrophic forgetting by saving a subset of past data into a memory bank and utilizing it to prevent forgetting when training future tasks.
arXiv Detail & Related papers (2022-10-10T08:27:28Z) - A Simple Baseline that Questions the Use of Pretrained-Models in
Continual Learning [30.023047201419825]
Some methods design continual learning mechanisms on top of pre-trained representations and allow only minimal updates, or even no updates, of the backbone model during continual learning.
We argue that the pretrained feature extractor itself can be strong enough to achieve competitive or even better continual learning performance on the Split-CIFAR100 and CORe50 benchmarks.
This baseline achieved 88.53% on 10-Split-CIFAR-100, surpassing most state-of-the-art continual learning methods that all use the same pretrained transformer model.
arXiv Detail & Related papers (2022-10-10T04:19:53Z) - Knowledge Distillation as Efficient Pre-training: Faster Convergence,
Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets, while requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
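
The federated-learning entry above highlights how cheap a Nearest Class Means (NCM) head is to fit: each participant only needs per-class feature sums and counts, and the server can recover the exact global class means without any gradient steps. The sketch below illustrates that idea under assumed payloads and function names; it is not taken from the cited paper.

```python
# Hedged sketch: exact Nearest-Class-Means head from per-client statistics.
# Client payload format and function names are illustrative assumptions.
import numpy as np

def client_statistics(features: np.ndarray, labels: np.ndarray, num_classes: int):
    """Each client summarizes its local data as per-class feature sums and counts."""
    sums = np.zeros((num_classes, features.shape[1]))
    counts = np.zeros(num_classes)
    for f, y in zip(features, labels):
        sums[y] += f
        counts[y] += 1
    return sums, counts

def aggregate_ncm_head(client_payloads):
    """Server adds the sums/counts; the resulting class means are exact, as if
    all data had been pooled centrally. No training rounds are required."""
    total_sums = sum(s for s, _ in client_payloads)
    total_counts = sum(c for _, c in client_payloads)
    return total_sums / np.maximum(total_counts, 1)[:, None]   # rows = class means

def ncm_predict(features: np.ndarray, class_means: np.ndarray) -> np.ndarray:
    """Classify each feature vector by its nearest class mean."""
    d = np.linalg.norm(features[:, None, :] - class_means[None, :, :], axis=-1)
    return d.argmin(axis=1)
```

The same computation underlies the prototype memory sketched earlier for the main paper; the only federated-specific step is summing the per-client statistics.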