An Analysis of Initial Training Strategies for Exemplar-Free
Class-Incremental Learning
- URL: http://arxiv.org/abs/2308.11677v2
- Date: Wed, 27 Sep 2023 14:54:05 GMT
- Title: An Analysis of Initial Training Strategies for Exemplar-Free
Class-Incremental Learning
- Authors: Grégoire Petit, Michael Soumm, Eva Feillet, Adrian Popescu, Bertrand
Delezoide, David Picard, Céline Hudelot
- Abstract summary: Class-Incremental Learning (CIL) aims to build classification models from data streams.
Due to catastrophic forgetting, CIL is particularly challenging when examples from past classes cannot be stored.
Use of models pre-trained in a self-supervised way on large amounts of data has recently gained momentum.
- Score: 36.619804184427245
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Class-Incremental Learning (CIL) aims to build classification models from
data streams. At each step of the CIL process, new classes must be integrated
into the model. Due to catastrophic forgetting, CIL is particularly challenging
when examples from past classes cannot be stored, the case on which we focus
here. To date, most approaches are based exclusively on the target dataset of
the CIL process. However, the use of models pre-trained in a self-supervised
way on large amounts of data has recently gained momentum. The initial model of
the CIL process may only use the first batch of the target dataset, or also use
pre-trained weights obtained on an auxiliary dataset. The choice between these
two initial learning strategies can significantly influence the performance of
the incremental learning model, but has not yet been studied in depth.
Performance is also influenced by the choice of the CIL algorithm, the neural
architecture, the nature of the target task, the distribution of classes in the
stream and the number of examples available for learning. We conduct a
comprehensive experimental study to assess the roles of these factors. We
present a statistical analysis framework that quantifies the relative
contribution of each factor to incremental performance. Our main finding is
that the initial training strategy is the dominant factor influencing the
average incremental accuracy, but that the choice of CIL algorithm is more
important in preventing forgetting. Based on this analysis, we propose
practical recommendations for choosing the right initial training strategy for
a given incremental learning use case. These recommendations are intended to
facilitate the practical deployment of incremental learning.
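As an illustration of the kind of factor analysis the abstract describes, the sketch below computes an eta-squared variance decomposition (between-group variance over total variance) for each factor across a toy grid of CIL runs. This is a minimal sketch, not the authors' actual framework; the factor names, algorithm labels, and accuracy values are invented for illustration.

```python
# Minimal sketch of quantifying the relative contribution of experimental
# factors to average incremental accuracy via eta-squared.
# All run data below is a hypothetical, illustrative assumption.
from collections import defaultdict

def eta_squared(records, factor, metric="acc"):
    """Fraction of variance in `metric` explained by `factor`."""
    values = [r[metric] for r in records]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    groups = defaultdict(list)
    for r in records:
        groups[r[factor]].append(r[metric])
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total

# Hypothetical CIL runs: initial strategy x algorithm -> avg incremental accuracy
runs = [
    {"init": "scratch",    "algo": "LwF",    "acc": 0.41},
    {"init": "scratch",    "algo": "FeTrIL", "acc": 0.48},
    {"init": "pretrained", "algo": "LwF",    "acc": 0.63},
    {"init": "pretrained", "algo": "FeTrIL", "acc": 0.70},
]
for f in ("init", "algo"):
    print(f, round(eta_squared(runs, f), 3))
```

On this toy grid the initial-training factor explains most of the variance, mirroring the paper's qualitative finding that the initial training strategy dominates average incremental accuracy.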
Related papers
- In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models [37.45103473809928]
We propose the In2Core algorithm, which selects a coreset by analyzing the correlation between training and evaluation samples with a trained model.
By applying our algorithm to instruction fine-tuning data of LLMs, we can achieve similar performance with just 50% of the training data.
arXiv Detail & Related papers (2024-08-07T05:48:05Z)
- Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z)
- Landscape-Aware Growing: The Power of a Little LAG [49.897766925371485]
We study the question of how to select the best growing strategy from a given pool of growing strategies.
We present an alternative perspective based on early training dynamics, which we call "landscape-aware growing" (LAG).
arXiv Detail & Related papers (2024-06-04T16:38:57Z)
- Few-Shot Class-Incremental Learning with Prior Knowledge [94.95569068211195]
We propose Learning with Prior Knowledge (LwPK) to enhance the generalization ability of the pre-trained model.
Experimental results indicate that LwPK effectively enhances the model resilience against catastrophic forgetting.
arXiv Detail & Related papers (2024-02-02T08:05:35Z)
- Strategies and impact of learning curve estimation for CNN-based image classification [0.2678472239880052]
Learning curves measure how the performance of machine learning models improves as the volume of training data grows.
Across a wide variety of applications and models, learning curves have been observed to largely follow a power-law behavior.
By estimating a model's learning curve from training on small subsets of the data, only the most promising models need to be considered for training on the full dataset.
arXiv Detail & Related papers (2023-10-12T16:28:25Z)
- How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? [92.90857135952231]
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities.
We study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression.
arXiv Detail & Related papers (2023-10-12T15:01:43Z)
- NTKCPL: Active Learning on Top of Self-Supervised Model by Estimating True Coverage [3.4806267677524896]
We propose a novel active learning strategy, neural tangent kernel clustering-pseudo-labels (NTKCPL).
It estimates empirical risk based on pseudo-labels and the model prediction with NTK approximation.
We validate our method on five datasets, empirically demonstrating that it outperforms the baseline methods in most cases.
arXiv Detail & Related papers (2023-06-07T01:43:47Z)
- SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization [21.037841262371355]
We present a framework to transfer few-shot learning processes from source corpora to the target corpus.
Our methods achieve state-of-the-art performance on six diverse corpora with 30.11%/33.95%/27.51% and 26.74%/31.14%/24.48% average improvements on ROUGE-1/2/L under 10- and 100-example settings.
arXiv Detail & Related papers (2023-03-24T14:07:03Z)
- Class-Incremental Learning with Strong Pre-trained Models [97.84755144148535]
Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes)
We explore an understudied real-world setting of CIL that starts with a strong model pre-trained on a large number of base classes.
Our proposed method is robust and generalizes to all analyzed CIL settings.
arXiv Detail & Related papers (2022-04-07T17:58:07Z)
- Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis [96.53859361560505]
Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect.
There always exists severe domain shift between the pretraining and downstream ABSA datasets.
We introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline.
arXiv Detail & Related papers (2021-10-26T04:03:45Z)
- Few-Shot Incremental Learning with Continually Evolved Classifiers [46.278573301326276]
Few-shot class-incremental learning (FSCIL) aims to design machine learning algorithms that can continually learn new concepts from a few data points.
The difficulty lies in that limited data from new classes not only leads to significant overfitting but also exacerbates the notorious catastrophic forgetting problem.
We propose a Continually Evolved Classifier (CEC) that employs a graph model to propagate context information between classifiers for adaptation.
arXiv Detail & Related papers (2021-04-07T10:54:51Z)
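The power-law extrapolation described in the "Strategies and impact of learning curve estimation" entry above can be sketched as a log-log least-squares fit of the error curve err(n) ≈ a * n^(-b). This is a hedged illustration, not that paper's implementation; the subset sizes and error values below are synthetic assumptions.

```python
# Sketch: fit a power-law learning curve err(n) ~ a * n**(-b) from errors
# measured on small training subsets, then extrapolate to the full dataset.
import math

def fit_power_law(sizes, errors):
    """Least-squares fit of log(err) = log(a) - b * log(n)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Slope of the log-log regression line gives -b.
    b = -sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    log_a = my + b * mx
    return math.exp(log_a), b

# Synthetic errors generated exactly as 2.0 * n**-0.5, so the fit
# should recover a = 2.0 and b = 0.5.
sizes = [100, 400, 1600, 6400]
errors = [2.0 * n ** -0.5 for n in sizes]
a, b = fit_power_law(sizes, errors)
pred_err_full = a * 50000 ** -b  # extrapolated error on the full dataset
print(round(a, 3), round(b, 3))
```

In practice the measured errors are noisy, so the fit selects which models are worth training on the full dataset rather than predicting final accuracy exactly.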
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.