Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners
- URL: http://arxiv.org/abs/2404.02117v1
- Date: Tue, 2 Apr 2024 17:23:22 GMT
- Title: Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners
- Authors: Keon-Hee Park, Kyungwoo Song, Gyeong-Moon Park
- Abstract summary: Few-Shot Class Incremental Learning (FSCIL) is a task that requires a model to learn new classes incrementally without forgetting when only a few samples for each class are given.
FSCIL encounters two significant challenges: catastrophic forgetting and overfitting.
We argue that large models such as vision and language transformers pre-trained on large datasets can be excellent few-shot incremental learners.
- Score: 19.579098962615795
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Few-Shot Class Incremental Learning (FSCIL) is a task that requires a model to learn new classes incrementally without forgetting when only a few samples for each class are given. FSCIL encounters two significant challenges: catastrophic forgetting and overfitting, and these challenges have driven prior studies to primarily rely on shallow models, such as ResNet-18. Even though their limited capacity can mitigate both forgetting and overfitting issues, it leads to inadequate knowledge transfer during few-shot incremental sessions. In this paper, we argue that large models such as vision and language transformers pre-trained on large datasets can be excellent few-shot incremental learners. To this end, we propose a novel FSCIL framework called PriViLege, Pre-trained Vision and Language transformers with prompting functions and knowledge distillation. Our framework effectively addresses the challenges of catastrophic forgetting and overfitting in large models through new pre-trained knowledge tuning (PKT) and two losses: entropy-based divergence loss and semantic knowledge distillation loss. Experimental results show that the proposed PriViLege significantly outperforms the existing state-of-the-art methods by a large margin, e.g., +9.38% in CUB200, +20.58% in CIFAR-100, and +13.36% in miniImageNet. Our implementation code is available at https://github.com/KHU-AGI/PriViLege.
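The abstract names a semantic knowledge distillation loss but does not spell out its form. Below is a minimal sketch, assuming the loss pulls ViT vision features toward frozen text embeddings of the class names (e.g., from a CLIP-style text encoder); the projection layer, the cosine objective, and all dimensions are illustrative assumptions rather than the paper's exact design, which is available in the official code linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticKD(nn.Module):
    """Hedged sketch: align vision features with frozen text embeddings of
    class names. The projection and cosine-based objective are assumptions;
    see the PriViLege repository for the actual formulation."""

    def __init__(self, vis_dim: int, txt_dim: int):
        super().__init__()
        # Hypothetical projection from the vision feature space to the
        # language embedding space.
        self.proj = nn.Linear(vis_dim, txt_dim)

    def forward(self, vis_feats, text_embeds, labels):
        # vis_feats:   (B, vis_dim)  features from the pre-trained ViT
        # text_embeds: (C, txt_dim)  frozen class-name embeddings
        # labels:      (B,)          ground-truth class indices
        z = F.normalize(self.proj(vis_feats), dim=-1)
        t = F.normalize(text_embeds[labels], dim=-1)
        # Pull each vision feature toward its class-name embedding.
        return (1.0 - (z * t).sum(dim=-1)).mean()

# Usage sketch with random tensors standing in for real features.
kd = SemanticKD(vis_dim=768, txt_dim=512)
vis = torch.randn(4, 768)
txt = torch.randn(100, 512)
y = torch.randint(0, 100, (4,))
loss = kd(vis, txt, y)
```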
Related papers
- Calibrating Higher-Order Statistics for Few-Shot Class-Incremental Learning with Pre-trained Vision Transformers [12.590571371294729]
Few-shot class-incremental learning (FSCIL) aims to adapt the model to new classes from very few data (5 samples) without forgetting the previously learned classes.
Recent works in many-shot CIL (MSCIL) exploited pre-trained models to reduce forgetting and achieve better plasticity.
We use ViT models pre-trained on large-scale datasets in few-shot settings, where they face the critical issue of low plasticity.
arXiv Detail & Related papers (2024-04-09T21:12:31Z)
- The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter [113.35761858962522]
This paper studies induced sparse patterns across multiple large pre-trained vision and language transformers.
We propose the existence of essential sparsity, defined by a sharp dropping point beyond which performance declines much faster.
We also find that essential sparsity holds for N:M sparsity patterns as well as for modern-scale large language models.
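As a rough illustration of how such a dropping point can be located, the sketch below runs one-shot global magnitude pruning at a grid of sparsity levels and records a score after each; `model`, `eval_fn`, and the sparsity grid are placeholders, and this is not the paper's exact protocol.

```python
import torch

@torch.no_grad()
def magnitude_prune_sweep(model, eval_fn, sparsities=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Hedged sketch: one-shot global magnitude pruning at several sparsity
    levels, recording accuracy to look for a sharp drop-off point.
    `eval_fn(model) -> float` is a placeholder evaluation routine."""
    results = {}
    dense = {n: p.detach().clone() for n, p in model.named_parameters()}
    all_mags = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    for s in sparsities:
        k = max(int(s * all_mags.numel()), 1)
        threshold = torch.kthvalue(all_mags, k).values
        for name, p in model.named_parameters():
            p.copy_(dense[name])                       # restore dense weights
            p.mul_((p.abs() > threshold).to(p.dtype))  # zero the smallest weights
        results[s] = eval_fn(model)
    for name, p in model.named_parameters():           # leave the model dense again
        p.copy_(dense[name])
    return results
```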
arXiv Detail & Related papers (2023-06-06T15:49:09Z)
- Strong Baselines for Parameter Efficient Few-Shot Fine-tuning [50.83426196335385]
Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase.
Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC.
Fine-tuning ViTs, however, is expensive in time, compute and storage.
This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters.
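A minimal sketch of the generic PEFT idea follows: freeze every parameter except a small, name-selected subset (here biases and normalization parameters, an illustrative choice rather than the specific methods benchmarked in the paper).

```python
import torch.nn as nn

def freeze_all_but(model: nn.Module, trainable_keywords=("bias", "norm")):
    """Hedged sketch of a generic PEFT setup: freeze every parameter except
    those whose names contain the given keywords (e.g. biases and LayerNorm
    scales). The keyword choice is an illustrative assumption."""
    trainable, total = 0, 0
    for name, p in model.named_parameters():
        p.requires_grad = any(k in name.lower() for k in trainable_keywords)
        total += p.numel()
        trainable += p.numel() if p.requires_grad else 0
    return trainable / total  # fraction of parameters left trainable
```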
arXiv Detail & Related papers (2023-04-04T16:14:39Z)
- Continual Learning with Transformers for Image Classification [12.028617058465333]
In computer vision, neural network models struggle to continually learn new concepts without forgetting what has been learnt in the past.
We develop a solution called Adaptive Distillation of Adapters (ADA) to perform continual learning.
We empirically demonstrate on different classification tasks that this method maintains good predictive performance without retraining the model.
arXiv Detail & Related papers (2022-06-28T15:30:10Z)
- PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance [114.1541203743303]
We propose PLATON, which captures the uncertainty of importance scores via an upper confidence bound (UCB) on the importance estimates.
We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering and image classification.
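A hedged sketch of UCB-style importance estimation in this spirit is shown below; the sensitivity definition, smoothing constants, and uncertainty term are illustrative assumptions, not PLATON's exact update rules.

```python
import torch

class UCBImportance:
    """Hedged sketch: keep an exponential moving average of a first-order
    sensitivity score and of its variation, and rank weights by
    mean + uncertainty. Constants and the exact sensitivity definition are
    illustrative assumptions."""

    def __init__(self, beta1=0.85, beta2=0.95):
        self.beta1, self.beta2 = beta1, beta2
        self.mean, self.unc = {}, {}

    @torch.no_grad()
    def update(self, model):
        for name, p in model.named_parameters():
            if p.grad is None:
                continue
            score = (p * p.grad).abs()  # first-order sensitivity proxy
            m = self.mean.get(name, torch.zeros_like(score))
            self.mean[name] = self.beta1 * m + (1 - self.beta1) * score
            # Uncertainty: deviation of the instantaneous score from its mean.
            u = self.unc.get(name, torch.zeros_like(score))
            self.unc[name] = self.beta2 * u + (1 - self.beta2) * (score - self.mean[name]).abs()

    def ucb(self, name):
        # Upper-confidence-bound score used to decide which weights to keep.
        return self.mean[name] + self.unc[name]
```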
arXiv Detail & Related papers (2022-06-25T05:38:39Z)
- Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference [74.80730361332711]
Few-shot learning is an important and topical problem in computer vision.
We show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-15T02:55:58Z)
- In Defense of the Learning Without Forgetting for Task Incremental Learning [91.3755431537592]
Catastrophic forgetting is one of the major challenges on the road for continual learning systems.
This paper shows that, with the right architecture and a standard set of augmentations, the results obtained by LwF surpass the latest algorithms in the task-incremental scenario.
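For reference, a minimal sketch of the standard Learning-without-Forgetting objective (task cross-entropy plus temperature-scaled distillation against the frozen previous model) is given below; the temperature and weighting are placeholder hyperparameters.

```python
import torch
import torch.nn.functional as F

def lwf_loss(new_logits, labels, old_logits_current, old_logits_frozen,
             T: float = 2.0, lam: float = 1.0):
    """Hedged sketch of the standard LwF objective: cross-entropy on the new
    task plus a temperature-scaled distillation term that keeps the old-task
    outputs close to those of the frozen previous model."""
    ce = F.cross_entropy(new_logits, labels)
    kd = F.kl_div(
        F.log_softmax(old_logits_current / T, dim=-1),
        F.softmax(old_logits_frozen / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return ce + lam * kd
```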
arXiv Detail & Related papers (2021-07-26T16:23:13Z)
- Class-incremental Learning with Rectified Feature-Graph Preservation [24.098892115785066]
A central theme of this paper is learning new classes that arrive in sequential phases over time.
We propose a weighted-Euclidean regularization for old knowledge preservation.
We show how it can work with binary cross-entropy to increase class separation for effective learning of new classes.
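A minimal sketch of how such a combination could look is below, assuming a per-dimension weighted penalty on feature drift plus binary cross-entropy over one-hot targets; the weighting scheme is an assumption, not the paper's derivation.

```python
import torch
import torch.nn.functional as F

def weighted_euclidean_reg(feat_new, feat_old, weights):
    """Hedged sketch: penalize drift of current features from the old model's
    features, weighted per dimension. How the weights are derived in the paper
    is not reproduced here; an importance-based weighting is an assumption."""
    return (weights * (feat_new - feat_old) ** 2).sum(dim=-1).mean()

def total_loss(logits, targets_onehot, feat_new, feat_old, weights, lam=1.0):
    # Binary cross-entropy over one-hot targets encourages class separation;
    # the regularizer preserves old knowledge in feature space.
    bce = F.binary_cross_entropy_with_logits(logits, targets_onehot)
    return bce + lam * weighted_euclidean_reg(feat_new, feat_old, weights)
```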
arXiv Detail & Related papers (2020-12-15T07:26:04Z)
- Few-Shot Class-Incremental Learning [68.75462849428196]
We focus on a challenging but practical few-shot class-incremental learning (FSCIL) problem.
FSCIL requires CNN models to incrementally learn new classes from very few labelled samples, without forgetting the previously learned ones.
We represent the knowledge using a neural gas (NG) network, which can learn and preserve the topology of the feature manifold formed by different classes.
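For intuition, here is a sketch of a single classic neural-gas update (rank-based soft movement of prototypes toward an input feature); how the paper grows and constrains the NG network across incremental sessions is not reproduced here.

```python
import numpy as np

def neural_gas_update(prototypes, x, eps=0.05, lam=2.0):
    """Hedged sketch of one classic neural-gas step: rank prototypes by
    distance to the input and move each one toward the input with a strength
    that decays with its rank."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    ranks = np.argsort(np.argsort(dists))          # 0 = closest prototype
    step = eps * np.exp(-ranks / lam)[:, None]
    return prototypes + step * (x - prototypes)

# Usage sketch with random data standing in for class features.
protos = np.random.randn(10, 64)
feature = np.random.randn(64)
protos = neural_gas_update(protos, feature)
```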
arXiv Detail & Related papers (2020-04-23T03:38:33Z)
- Novelty-Prepared Few-Shot Classification [24.42397780877619]
We propose to use a novelty-prepared loss function, called self-compacting softmax loss (SSL), for few-shot classification.
In experiments on the CUB-200-2011 and mini-ImageNet datasets, we show that SSL leads to a significant improvement over state-of-the-art performance.
arXiv Detail & Related papers (2020-03-01T14:44:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.