RanPAC: Random Projections and Pre-trained Models for Continual Learning
- URL: http://arxiv.org/abs/2307.02251v3
- Date: Tue, 16 Jan 2024 03:38:44 GMT
- Title: RanPAC: Random Projections and Pre-trained Models for Continual Learning
- Authors: Mark D. McDonnell, Dong Gong, Amin Parvaneh, Ehsan Abbasnejad, Anton
van den Hengel
- Abstract summary: Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
- Score: 59.07316955610658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning (CL) aims to incrementally learn different tasks (such as
classification) in a non-stationary data stream without forgetting old ones.
Most CL works focus on tackling catastrophic forgetting under a
learning-from-scratch paradigm. However, with the increasing prominence of
foundation models, pre-trained models equipped with informative representations
have become available for various downstream requirements. Several CL methods
based on pre-trained models have been explored, either utilizing pre-extracted
features directly (which makes bridging distribution gaps challenging) or
incorporating adaptors (which may be subject to forgetting). In this paper, we
propose a concise and effective approach for CL with pre-trained models. Given
that forgetting occurs during parameter updating, we contemplate an alternative
approach that exploits training-free random projectors and class-prototype
accumulation, which thus bypasses the issue. Specifically, we inject a frozen
Random Projection layer with nonlinear activation between the pre-trained
model's feature representations and output head, which captures interactions
between features with expanded dimensionality, providing enhanced linear
separability for class-prototype-based CL. We also demonstrate the importance
of decorrelating the class-prototypes to reduce the distribution disparity when
using pre-trained representations. These techniques prove to be effective and
circumvent the problem of forgetting for both class- and domain-incremental
continual learning. Compared to previous methods applied to pre-trained
ViT-B/16 models, we reduce final error rates by between 20% and 62% on seven
class-incremental benchmarks, despite not using any rehearsal memory. We
conclude that the full potential of pre-trained models for simple, effective,
and fast CL has not hitherto been fully tapped. Code is at
github.com/RanPAC/RanPAC.
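The pipeline described in the abstract lends itself to a compact sketch. The following is a minimal illustration in Python/NumPy under assumed values (a 2,000-dimensional expansion and a small ridge term when inverting the Gram matrix; the paper's expansion is larger), not the authors' implementation; names such as update_statistics are illustrative.

# Minimal sketch of a RanPAC-style pipeline over pre-extracted features from a
# frozen backbone (e.g. ViT-B/16). Dimensions and the ridge term are assumed.
import numpy as np

rng = np.random.default_rng(0)

D_FEAT = 768    # backbone feature dimension (ViT-B/16)
D_PROJ = 2000   # expanded dimension after random projection (illustrative)
N_CLASSES = 100
RIDGE = 1e-3    # ridge term for inverting the Gram matrix (assumed value)

# Frozen, training-free random projection: fixed once, never updated,
# so this layer cannot forget.
W_rand = rng.standard_normal((D_FEAT, D_PROJ))

def project(features):
    """Nonlinearly project frozen backbone features to a higher dimension."""
    return np.maximum(features @ W_rand, 0.0)  # ReLU nonlinearity

# Running second-order statistics (G) and class-prototype sums (C). Both are
# plain sums over projected features, so they can be accumulated task by task
# without revisiting old data.
G = np.zeros((D_PROJ, D_PROJ))
C = np.zeros((D_PROJ, N_CLASSES))

def update_statistics(features, labels):
    """Accumulate the Gram matrix and class-prototype sums for one task."""
    global G, C
    h = project(features)            # (n, D_PROJ)
    y = np.eye(N_CLASSES)[labels]    # one-hot labels, (n, N_CLASSES)
    G += h.T @ h
    C += h.T @ y

def classifier_weights():
    """Decorrelate prototypes via a ridge-regularized inverse Gram matrix."""
    return np.linalg.solve(G + RIDGE * np.eye(D_PROJ), C)

def predict(features):
    return np.argmax(project(features) @ classifier_weights(), axis=1)

Because G and C are order-independent sums, the classifier can be rebuilt after each task without rehearsal memory or any gradient-based update of the projection layer, which is what lets this scheme sidestep forgetting.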
Related papers
- StochCA: A Novel Approach for Exploiting Pretrained Models with Cross-Attention [2.66269503676104]
We introduce a novel fine-tuning method, termed stochastic cross-attention (StochCA), specific to Transformer architectures.
This method modifies the Transformer's self-attention mechanism to selectively utilize knowledge from pretrained models during fine-tuning.
Our experimental results show the superiority of StochCA over state-of-the-art approaches in both areas.
arXiv Detail & Related papers (2024-02-25T13:53:49Z) - Read Between the Layers: Leveraging Multi-Layer Representations for Rehearsal-Free Continual Learning with Pre-Trained Models [15.847302755988506]
We address the Continual Learning problem, wherein a model must learn a sequence of tasks from non-stationary distributions.
We propose LayUP, a new prototype-based approach to CL that leverages second-order feature statistics from multiple intermediate layers of a pre-trained network.
Our results demonstrate that fully exhausting the representational capacities of pre-trained models in CL goes well beyond their final embeddings.
arXiv Detail & Related papers (2023-12-13T13:11:44Z) - FD-Align: Feature Discrimination Alignment for Fine-tuning Pre-Trained
Models in Few-Shot Learning [21.693779973263172]
In this paper, we introduce a fine-tuning approach termed Feature Discrimination Alignment (FD-Align).
Our method aims to bolster the model's generalizability by preserving the consistency of spurious features.
Once fine-tuned, the model can seamlessly integrate with existing methods, leading to performance improvements.
arXiv Detail & Related papers (2023-10-23T17:12:01Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormaliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - SLCA: Slow Learner with Classifier Alignment for Continual Learning on a
Pre-trained Model [73.80068155830708]
We present an extensive analysis of continual learning on a pre-trained model (CLPM).
We propose a simple but extremely effective approach named Slow Learner with Classifier Alignment (SLCA).
Across a variety of scenarios, our proposal provides substantial improvements for CLPM.
arXiv Detail & Related papers (2023-03-09T08:57:01Z) - CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP to downstream tasks undesirably degrades OOD performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z) - Foundational Models for Continual Learning: An Empirical Study of Latent
Replay [17.322679682451597]
We study the efficacy of pre-trained vision models as a foundation for downstream continual learning scenarios.
We compare the efficacy of various pre-trained models in large-scale benchmarking scenarios, using a vanilla replay setting applied in both the latent and the raw-data space.
arXiv Detail & Related papers (2022-04-30T19:11:37Z) - Class-Incremental Learning with Strong Pre-trained Models [97.84755144148535]
Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes).
We explore an understudied real-world setting of CIL that starts with a strong model pre-trained on a large number of base classes.
Our proposed method is robust and generalizes to all analyzed CIL settings.
arXiv Detail & Related papers (2022-04-07T17:58:07Z) - Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning [141.35105358670316]
We study the difference between a naïvely-trained initial-phase model and the oracle model.
We propose Class-wise Decorrelation (CwD) that effectively regularizes representations of each class to scatter more uniformly.
Our CwD is simple to implement and easy to plug into existing methods.
arXiv Detail & Related papers (2021-12-09T07:20:32Z)
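The class-wise decorrelation idea in the last entry above (and, relatedly, the prototype decorrelation used in RanPAC) can be illustrated with a short, hedged sketch of a per-class decorrelation penalty; the exact statistic and normalization used by CwD may differ.

# Illustrative per-class decorrelation penalty: penalize off-diagonal entries of
# each class's feature correlation matrix so that class representations scatter
# more uniformly. Function name and normalization are assumptions, not CwD's code.
import numpy as np

def classwise_decorrelation_penalty(features, labels):
    penalty, classes = 0.0, np.unique(labels)
    for c in classes:
        z = features[labels == c]
        z = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-8)  # standardize each dimension
        corr = (z.T @ z) / max(len(z) - 1, 1)              # per-class correlation matrix
        off_diag = corr - np.diag(np.diag(corr))
        penalty += np.sum(off_diag ** 2) / corr.shape[0] ** 2
    return penalty / len(classes)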