An Analysis of Initial Training Strategies for Exemplar-Free
Class-Incremental Learning
- URL: http://arxiv.org/abs/2308.11677v2
- Date: Wed, 27 Sep 2023 14:54:05 GMT
- Title: An Analysis of Initial Training Strategies for Exemplar-Free
Class-Incremental Learning
- Authors: Grégoire Petit, Michael Soumm, Eva Feillet, Adrian Popescu, Bertrand
Delezoide, David Picard, Céline Hudelot
- Abstract summary: Class-Incremental Learning (CIL) aims to build classification models from data streams.
Due to catastrophic forgetting, CIL is particularly challenging when examples from past classes cannot be stored.
The use of models pre-trained in a self-supervised way on large amounts of data has recently gained momentum.
- Score: 36.619804184427245
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Class-Incremental Learning (CIL) aims to build classification models from
data streams. At each step of the CIL process, new classes must be integrated
into the model. Due to catastrophic forgetting, CIL is particularly challenging
when examples from past classes cannot be stored, the case on which we focus
here. To date, most approaches are based exclusively on the target dataset of
the CIL process. However, the use of models pre-trained in a self-supervised
way on large amounts of data has recently gained momentum. The initial model of
the CIL process may only use the first batch of the target dataset, or also use
pre-trained weights obtained on an auxiliary dataset. The choice between these
two initial learning strategies can significantly influence the performance of
the incremental learning model, but has not yet been studied in depth.
Performance is also influenced by the choice of the CIL algorithm, the neural
architecture, the nature of the target task, the distribution of classes in the
stream and the number of examples available for learning. We conduct a
comprehensive experimental study to assess the roles of these factors. We
present a statistical analysis framework that quantifies the relative
contribution of each factor to incremental performance. Our main finding is
that the initial training strategy is the dominant factor influencing the
average incremental accuracy, but that the choice of CIL algorithm is more
important in preventing forgetting. Based on this analysis, we propose
practical recommendations for choosing the right initial training strategy for
a given incremental learning use case. These recommendations are intended to
facilitate the practical deployment of incremental learning.
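To make the abstract's quantities concrete, here is a minimal Python sketch that computes average incremental accuracy and runs a two-way ANOVA over a hypothetical results table, in the spirit of the paper's statistical analysis framework. The factor levels, algorithm labels (lwf, fetril) and accuracy values are invented for illustration; the paper's actual framework may differ in detail.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def average_incremental_accuracy(step_accuracies):
    # Mean of the accuracy measured after each incremental step;
    # the standard aggregate CIL metric the abstract refers to.
    return float(np.mean(step_accuracies))

# Hypothetical results table, one row per experimental run (the factor
# names and values are illustrative, not the paper's actual data).
runs = pd.DataFrame({
    "init": ["scratch", "scratch", "pretrained", "pretrained"] * 2,
    "algo": ["lwf", "fetril"] * 4,
    "acc":  [0.41, 0.47, 0.58, 0.63, 0.39, 0.45, 0.60, 0.61],
})

# Two-way ANOVA: how much of the variance in average incremental
# accuracy each factor explains.
model = smf.ols("acc ~ C(init) + C(algo)", data=runs).fit()
print(anova_lm(model, typ=2))
```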
Related papers
- Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance.
We introduce novel algorithms for dynamic, instance-level data reweighting.
Our framework allows us to devise reweighting strategies deprioritizing redundant or uninformative data.
arXiv Detail & Related papers (2025-02-10T17:57:15Z)
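As one concrete reading of the instance-level reweighting described in the entry above, here is a hedged PyTorch sketch in which each sample's weight is derived from its current loss; the softmax-over-losses rule and the temperature parameter are assumptions made for this sketch, not the paper's actual scheme.

```python
import torch
import torch.nn.functional as F

def reweighted_step(model, optimizer, x, y, temperature=1.0):
    # One training step with dynamic, instance-level reweighting:
    # samples with higher current loss receive proportionally more
    # weight (one plausible choice among many).
    optimizer.zero_grad()
    per_sample_loss = F.cross_entropy(model(x), y, reduction="none")
    with torch.no_grad():
        weights = torch.softmax(per_sample_loss / temperature, dim=0)
    loss = (weights * per_sample_loss).sum()
    loss.backward()
    optimizer.step()
    return float(loss)

model = torch.nn.Linear(10, 3)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 3, (32,))
print(reweighted_step(model, opt, x, y))
```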
- Capturing the Temporal Dependence of Training Data Influence [100.91355498124527]
We formalize the concept of trajectory-specific leave-one-out influence, which quantifies the impact of removing a data point during training.
We propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO.
As data value embedding captures training data ordering, it offers valuable insights into model training dynamics.
arXiv Detail & Related papers (2024-12-12T18:28:55Z)
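A first-order Python sketch of the trajectory-specific leave-one-out idea from the entry above: score each training point by accumulating per-step gradient alignments with a final validation gradient. The crude accumulation below ignores how later updates transform each perturbation, which is precisely the part the paper's data value embedding is designed to capture efficiently.

```python
import numpy as np

def trajectory_loo_scores(train_grads, val_grad, lrs):
    # First-order approximation: removing point i at step t perturbs the
    # final parameters by roughly lr_t * g_{i,t}, changing the validation
    # loss by about lr_t * <g_{i,t}, g_val>.
    n_points = train_grads[0].shape[0]
    scores = np.zeros(n_points)
    for lr, g_t in zip(lrs, train_grads):
        scores += lr * (g_t @ val_grad)
    return scores

rng = np.random.default_rng(0)
grads = [rng.normal(size=(5, 8)) for _ in range(3)]  # per-step, per-point grads
print(trajectory_loo_scores(grads, rng.normal(size=8), [0.1, 0.1, 0.05]))
```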
- Class Balance Matters to Active Class-Incremental Learning [61.11786214164405]
We aim to start from a pool of large-scale unlabeled data and then annotate the most informative samples for incremental learning.
We propose Class-Balanced Selection (CBS) strategy to achieve both class balance and informativeness in chosen samples.
Our CBS can be plugged into CIL methods that are based on pre-trained models with a prompt tuning technique.
arXiv Detail & Related papers (2024-12-09T16:37:27Z)
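The class-balanced selection idea above can be illustrated with a small greedy sketch; the specific strategy of always drawing the most informative sample from the least-represented pseudo-class is an assumption for this sketch, not the paper's exact CBS procedure.

```python
import numpy as np

def class_balanced_selection(pseudo_labels, informativeness, budget, n_classes):
    # Greedy sketch: repeatedly take the most informative remaining sample
    # from the class with the fewest selections so far, so the chosen set
    # stays class-balanced while favoring informative samples.
    chosen = []
    counts = np.zeros(n_classes, dtype=int)
    remaining = set(range(len(pseudo_labels)))
    while len(chosen) < budget and remaining:
        for c in np.argsort(counts):          # least-represented class first
            pool = [i for i in remaining if pseudo_labels[i] == c]
            if pool:
                best = max(pool, key=lambda i: informativeness[i])
                chosen.append(best)
                counts[c] += 1
                remaining.remove(best)
                break
    return chosen

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, 30)               # pseudo-labels for the pool
print(class_balanced_selection(labels, rng.random(30), budget=6, n_classes=3))
```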
- The role of data-induced randomness in quantum machine learning classification tasks [0.0]
We introduce a metric for binary classification tasks, the class margin, by merging the concepts of average randomness and classification margin.
This metric analytically connects data-induced randomness with classification accuracy for a given data-embedding map.
We benchmark a range of data-embedding strategies through class margin, demonstrating that data-induced randomness imposes a limit on classification performance.
arXiv Detail & Related papers (2024-11-28T17:26:35Z)
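The class margin in that entry is specific to the quantum setting, but a rough classical analogue can convey the flavor: average a worst-case classification margin over random draws of a stochastic embedding. Everything in the sketch below (the random-feature map rff_embed, the fixed classifier w) is an illustrative assumption, not the paper's definition.

```python
import numpy as np

def rff_embed(X, rng, dim=64):
    # Toy stochastic feature map (random Fourier-like features) standing
    # in for a data-embedding map with data-induced randomness.
    W = rng.normal(size=(X.shape[1], dim))
    return np.cos(X @ W) / np.sqrt(dim)

def class_margin(X, y, w, n_draws=200, seed=0):
    # Worst-case binary margin y_i * <w, phi(x_i)>, averaged over random
    # draws of the embedding; only an analogue of the paper's metric.
    rng = np.random.default_rng(seed)
    draws = [np.min(y * (rff_embed(X, rng) @ w)) for _ in range(n_draws)]
    return float(np.mean(draws))

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
y = np.sign(rng.normal(size=20))
print(class_margin(X, y, w=rng.normal(size=64)))
```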
- In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models [37.45103473809928]
We propose the In2Core algorithm, which selects a coreset by analyzing the correlation between training and evaluation samples with a trained model.
By applying our algorithm to instruction fine-tuning data of LLMs, we can achieve similar performance with just 50% of the training data.
arXiv Detail & Related papers (2024-08-07T05:48:05Z)
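A hedged sketch of influence-style coreset scoring in the spirit of In2Core: rank training samples by how well their gradients align with evaluation gradients, then keep the top fraction. The cosine-similarity scoring here is a first-order simplification; true influence functions also involve an inverse-Hessian-vector product.

```python
import numpy as np

def influence_coreset(train_grads, eval_grads, keep_ratio=0.5):
    # Rate each training sample by the cosine similarity between its
    # gradient and the mean evaluation gradient, then keep the top half.
    v = eval_grads.mean(axis=0)
    norms = np.linalg.norm(train_grads, axis=1) * np.linalg.norm(v) + 1e-12
    scores = (train_grads @ v) / norms
    k = int(len(scores) * keep_ratio)
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
print(influence_coreset(rng.normal(size=(100, 16)), rng.normal(size=(10, 16))))
```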
- Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z)
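The one-run attribution idea can be sketched at first order in PyTorch: during a single training run, credit each sample with the change lr * <grad_i, grad_val> it induces in the validation loss at each step. The actual In-Run Data Shapley method is more refined (second-order terms, exact Shapley semantics); this only shows the flavor.

```python
import torch
import torch.nn.functional as F

def in_run_attribution(model, batches, val_x, val_y, lr=0.1):
    # Accumulate first-order attributions while performing ordinary SGD.
    scores = {}
    for idx, x, y in batches:
        model.zero_grad()
        F.cross_entropy(model(val_x), val_y).backward()
        g_val = torch.cat([p.grad.flatten() for p in model.parameters()])
        for i, xi, yi in zip(idx, x, y):
            model.zero_grad()
            F.cross_entropy(model(xi[None]), yi[None]).backward()
            g_i = torch.cat([p.grad.flatten() for p in model.parameters()])
            scores[i] = scores.get(i, 0.0) + lr * float(g_i @ g_val)
        model.zero_grad()                  # ordinary SGD step on the batch
        F.cross_entropy(model(x), y).backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= lr * p.grad
    return scores

model = torch.nn.Linear(4, 2)
xs, ys = torch.randn(6, 4), torch.randint(0, 2, (6,))
print(in_run_attribution(model, [(range(6), xs, ys)], xs, ys))
```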
- Strategies and impact of learning curve estimation for CNN-based image classification [0.2678472239880052]
Learning curves are a measure for how the performance of machine learning models improves given a certain volume of training data.
Across a wide variety of applications and models, learning curves have been observed to follow, to a large extent, a power-law behavior.
By estimating a model's learning curve from training on small subsets of the data, only the best models need to be considered for training on the full dataset.
arXiv Detail & Related papers (2023-10-12T16:28:25Z)
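The power-law extrapolation described above is easy to illustrate: fit error(n) ≈ a·n^(-b) + c to errors measured on small training subsets, then extrapolate to the full dataset size. The subset sizes and error values below are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # Saturating power law commonly fitted to learning curves:
    # error(n) ~ a * n^(-b) + c, where c is the irreducible error.
    return a * np.power(n, -b) + c

# Hypothetical test errors measured after training on small subsets.
subset_sizes = np.array([500.0, 1000.0, 2000.0, 4000.0])
errors = np.array([0.42, 0.35, 0.30, 0.265])

params, _ = curve_fit(power_law, subset_sizes, errors,
                      p0=(1.0, 0.3, 0.1), bounds=(0.0, np.inf))
print("extrapolated error at n=50000:", power_law(50_000.0, *params))
```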
- How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? [92.90857135952231]
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities.
We study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression.
arXiv Detail & Related papers (2023-10-12T15:01:43Z)
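That setup admits a compact sketch: a linearly parameterized single-layer linear attention predicts x_q^T Γ X^T y for a prompt (X, y). The choice of Γ below, which recovers the in-context least-squares solution on a noiseless prompt, is an illustrative stand-in for what pretraining would learn.

```python
import numpy as np

def linear_attention_icl(X, y, x_query, Gamma):
    # Single-layer linear attention for in-context regression; for
    # suitable Gamma this is one preconditioned gradient-descent step
    # on the in-context least-squares objective.
    return x_query @ Gamma @ X.T @ y

rng = np.random.default_rng(0)
d, n = 5, 40
w_star = rng.normal(size=d)                   # task drawn at test time
X = rng.normal(size=(n, d))
y = X @ w_star                                # noiseless in-context examples
Gamma = np.linalg.inv(X.T @ X)                # recovers least squares here
x_q = rng.normal(size=d)
print(linear_attention_icl(X, y, x_q, Gamma), "vs true:", x_q @ w_star)
```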
- NTKCPL: Active Learning on Top of Self-Supervised Model by Estimating True Coverage [3.4806267677524896]
We propose a novel active learning strategy, neural tangent kernel clustering-pseudo-labels (NTKCPL).
It estimates empirical risk based on pseudo-labels and the model prediction with NTK approximation.
We validate our method on five datasets, empirically demonstrating that it outperforms the baseline methods in most cases.
arXiv Detail & Related papers (2023-06-07T01:43:47Z)
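A rough sketch of the NTKCPL ingredients: approximate the trained model with NTK (kernel) regression on the labeled set and estimate empirical risk on the unlabeled pool against pseudo-labels. The linear kernel and randomly drawn pseudo-labels below are placeholders for the paper's NTK and self-supervised clustering.

```python
import numpy as np

def ntk_risk_estimate(K_pool_lab, K_lab_lab, y_lab, pseudo_labels, reg=1e-3):
    # Kernel-regression surrogate for the trained model:
    # f(pool) = K(pool, lab) (K(lab, lab) + reg*I)^-1 Y_lab,
    # scored against pseudo-labels on the unlabeled pool.
    n = K_lab_lab.shape[0]
    alpha = np.linalg.solve(K_lab_lab + reg * np.eye(n), y_lab)
    preds = np.argmax(K_pool_lab @ alpha, axis=1)
    return float(np.mean(preds != pseudo_labels))

rng = np.random.default_rng(0)
Z = rng.normal(size=(30, 8))                 # stand-in features
K = Z @ Z.T                                  # linear kernel as NTK stand-in
lab, pool = np.arange(10), np.arange(10, 30)
y_lab = np.eye(3)[rng.integers(0, 3, 10)]    # one-hot labels
pseudo = rng.integers(0, 3, 20)              # e.g. from clustering embeddings
print(ntk_risk_estimate(K[np.ix_(pool, lab)], K[np.ix_(lab, lab)], y_lab, pseudo))
```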
- Class-Incremental Learning with Strong Pre-trained Models [97.84755144148535]
Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes).
We explore an understudied real-world setting of CIL that starts with a strong model pre-trained on a large number of base classes.
Our proposed method is robust and generalizes to all analyzed CIL settings.
arXiv Detail & Related papers (2022-04-07T17:58:07Z)
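Finally, the pre-trained-initialization setting that both this last entry and the main paper study can be made concrete with a minimal baseline: freeze a strong pre-trained backbone and grow the classifier with each incremental step so old-class weights are never overwritten. The stand-in backbone and the one-head-per-step design below are illustrative choices, not a method from either paper.

```python
import torch

class FrozenBackboneCIL(torch.nn.Module):
    # Frozen feature extractor plus a classifier that grows with each
    # incremental step; a simple exemplar-free baseline.
    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad_(False)
        self.feat_dim = feat_dim
        self.heads = torch.nn.ModuleList()

    def add_step(self, n_new_classes):
        self.heads.append(torch.nn.Linear(self.feat_dim, n_new_classes))

    def forward(self, x):
        with torch.no_grad():
            f = self.backbone(x)
        return torch.cat([h(f) for h in self.heads], dim=1)

# Usage with a stand-in backbone:
model = FrozenBackboneCIL(torch.nn.Linear(16, 8), feat_dim=8)
model.add_step(5)   # first batch of classes
model.add_step(5)   # second incremental step
print(model(torch.randn(2, 16)).shape)  # -> torch.Size([2, 10])
```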