Reset It and Forget It: Relearning Last-Layer Weights Improves Continual
and Transfer Learning
- URL: http://arxiv.org/abs/2310.07996v1
- Date: Thu, 12 Oct 2023 02:52:14 GMT
- Authors: Lapo Frati, Neil Traft, Jeff Clune, Nick Cheney
- Abstract summary: This work identifies a simple pre-training mechanism that leads to representations exhibiting better continual and transfer learning.
The repeated resetting of weights in the last layer, which we nickname "zapping," was originally designed for a meta-continual-learning procedure.
We show it is surprisingly applicable in many settings beyond both meta-learning and continual learning.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work identifies a simple pre-training mechanism that leads to
representations exhibiting better continual and transfer learning. This
mechanism -- the repeated resetting of weights in the last layer, which we
nickname "zapping" -- was originally designed for a meta-continual-learning
procedure, yet we show it is surprisingly applicable in many settings beyond
both meta-learning and continual learning. In our experiments, we wish to
transfer a pre-trained image classifier to a new set of classes, in a few
shots. We show that our zapping procedure results in improved transfer accuracy
and/or more rapid adaptation in both standard fine-tuning and continual
learning settings, while being simple to implement and computationally
efficient. In many cases, we achieve performance on par with state of the art
meta-learning without needing the expensive higher-order gradients, by using a
combination of zapping and sequential learning. An intuitive explanation for
the effectiveness of this zapping procedure is that representations trained
with repeated zapping learn features that are capable of rapidly adapting to
newly initialized classifiers. Such an approach may be considered a
computationally cheaper type of, or alternative to, meta-learning rapidly
adaptable features with higher-order gradients. This adds to recent work on the
usefulness of resetting neural network parameters during training, and invites
further investigation of this mechanism.
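The mechanism described in the abstract, repeatedly re-initializing the last layer while the earlier layers keep training, can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny two-layer network, the fixed `zap_every` schedule, the choice to reset the entire head at once, and all function names are assumptions made purely for illustration.

```python
import math
import random


def init_layer(n_out, n_in, rng, scale=0.1):
    """Return an n_out x n_in weight matrix with small random entries."""
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def forward(x, w_feat, w_last):
    """Two-layer net: tanh features followed by a linear classifier head."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_feat]
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in w_last]
    return h, logits


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(x, y, w_feat, w_last, lr):
    """One cross-entropy SGD step on a single example (in-place update)."""
    h, logits = forward(x, w_feat, w_last)
    probs = softmax(logits)
    d_logits = [p - (1.0 if k == y else 0.0) for k, p in enumerate(probs)]
    # Backpropagate into the features before the head is modified.
    d_h = [
        sum(d_logits[k] * w_last[k][j] for k in range(len(w_last)))
        for j in range(len(h))
    ]
    for k, row in enumerate(w_last):
        for j in range(len(row)):
            row[j] -= lr * d_logits[k] * h[j]
    for j, row in enumerate(w_feat):
        d_pre = d_h[j] * (1.0 - h[j] ** 2)  # derivative of tanh
        for i in range(len(row)):
            row[i] -= lr * d_pre * x[i]
    return -math.log(probs[y])


def pretrain_with_zapping(data, n_in, n_hidden, n_classes,
                          steps, zap_every, lr=0.1, seed=0):
    """Train with SGD, re-initializing the last layer every `zap_every` steps.

    The fixed-interval schedule and whole-head reset are assumptions,
    not details taken from the paper.
    """
    rng = random.Random(seed)
    w_feat = init_layer(n_hidden, n_in, rng)
    w_last = init_layer(n_classes, n_hidden, rng)
    n_zaps = 0
    for step in range(steps):
        if step > 0 and step % zap_every == 0:
            # "Zap": discard the head and re-initialize it; keep the features.
            w_last[:] = init_layer(n_classes, n_hidden, rng)
            n_zaps += 1
        x, y = data[step % len(data)]
        sgd_step(x, y, w_feat, w_last, lr)
    return w_feat, w_last, n_zaps
```

In this sketch the feature weights `w_feat` are never reset, so they are repeatedly trained against a freshly initialized head, which is the intuition the abstract gives for why the resulting representations adapt quickly to new classifiers.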
Related papers
- Transformers for Supervised Online Continual Learning [11.270594318662233]
We propose a method that leverages transformers' in-context learning capabilities for online continual learning.
Our method demonstrates significant improvements over previous state-of-the-art results on CLOC, a challenging large-scale real-world benchmark for image geo-localization.
arXiv Detail & Related papers (2024-03-03T16:12:20Z)
- Class Incremental Learning with Pre-trained Vision-Language Models [59.15538370859431]
We propose an approach to exploiting pre-trained vision-language models (e.g. CLIP) that enables further adaptation.
Experiments on several conventional benchmarks consistently show a significant margin of improvement over the current state-of-the-art.
arXiv Detail & Related papers (2023-10-31T10:45:03Z)
- Continual Learning with Pretrained Backbones by Tuning in the Input Space [44.97953547553997]
The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks.
We propose a novel strategy to make the fine-tuning procedure more effective by leaving the pre-trained part of the network unchanged and learning not only the usual classification head but also a set of newly introduced learnable parameters.
arXiv Detail & Related papers (2023-06-05T15:11:59Z)
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
- Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking [58.14267480293575]
We propose a simple yet effective online learning approach for few-shot online adaptation without requiring offline training.
It allows an in-built memory retention mechanism for the model to remember the knowledge about the object seen before.
We evaluate our approach based on two networks in the online learning families for tracking, i.e., multi-layer perceptrons in RT-MDNet and convolutional neural networks in DiMP.
arXiv Detail & Related papers (2021-12-28T06:51:18Z)
- An Empirical Investigation of the Role of Pre-training in Lifelong Learning [21.995593026269578]
We show that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially.
We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima.
arXiv Detail & Related papers (2021-12-16T19:00:55Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- Essentials for Class Incremental Learning [43.306374557919646]
Class-incremental learning results on CIFAR-100 and ImageNet improve over the state-of-the-art by a large margin, while keeping the approach simple.
arXiv Detail & Related papers (2021-02-18T18:01:06Z)
- Memory-Efficient Incremental Learning Through Feature Adaptation [71.1449769528535]
We introduce an approach for incremental learning that preserves feature descriptors of training images from previously learned classes.
Keeping the much lower-dimensional feature embeddings of images reduces the memory footprint significantly.
Experimental results show that our method achieves state-of-the-art classification accuracy in incremental learning benchmarks.
arXiv Detail & Related papers (2020-04-01T21:16:05Z)
- Incremental Object Detection via Meta-Learning [77.55310507917012]
We propose a meta-learning approach that learns to reshape model gradients, such that information across incremental tasks is optimally shared.
Compared with existing meta-learning methods, our approach is task-agnostic, allows incremental addition of new classes, and scales to high-capacity models for object detection.
arXiv Detail & Related papers (2020-03-17T13:40:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.