Online Hyperparameter Optimization for Class-Incremental Learning
- URL: http://arxiv.org/abs/2301.05032v2
- Date: Wed, 3 May 2023 20:44:36 GMT
- Title: Online Hyperparameter Optimization for Class-Incremental Learning
- Authors: Yaoyao Liu, Yingying Li, Bernt Schiele, Qianru Sun
- Abstract summary: Class-incremental learning (CIL) aims to train a classification model while the number of classes increases phase-by-phase.
An inherent challenge of CIL is the stability-plasticity tradeoff, i.e., CIL models should remain stable to retain old knowledge and plastic to absorb new knowledge.
We propose an online learning method that can adaptively optimize the tradeoff without knowing the setting a priori.
- Score: 99.70569355681174
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Class-incremental learning (CIL) aims to train a classification model while
the number of classes increases phase-by-phase. An inherent challenge of CIL is
the stability-plasticity tradeoff, i.e., CIL models should remain stable to
retain old knowledge and plastic to absorb new knowledge. However, none of
the existing CIL models can achieve the optimal tradeoff across different
data-receiving settings: typically, the training-from-half (TFH) setting
needs more stability, while the training-from-scratch (TFS) setting needs more
plasticity. To this end, we design an online learning method that can
adaptively optimize the tradeoff without knowing the setting a priori.
Specifically, we first introduce the key hyperparameters that influence the
trade-off, e.g., knowledge distillation (KD) loss weights, learning rates, and
classifier types. Then, we formulate the hyperparameter optimization process as
an online Markov Decision Process (MDP) problem and propose a specific
algorithm to solve it. We apply locally estimated rewards and the classic bandit
algorithm Exp3 to address the issues that arise when applying online MDP methods
to the CIL protocol. Our method consistently improves top-performing CIL methods in
both TFH and TFS settings, e.g., boosting the average accuracy of TFH and TFS
by 2.2 percentage points on ImageNet-Full, compared to the state-of-the-art.
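To make the formulation concrete, below is a minimal sketch (not the authors' implementation) of phase-wise hyperparameter selection with the classic Exp3 bandit: each arm is one candidate value of a stability-plasticity hyperparameter (here, a KD loss weight, for illustration), one arm is drawn per incremental phase, and its weight is updated from a locally estimated reward. The candidate values, the `train_phase` stub, and the reward proxy are assumptions.

```python
# Minimal sketch: phase-wise hyperparameter selection with the Exp3 bandit.
# Arm definitions, candidate values, and the reward proxy are illustrative
# assumptions, not the paper's actual implementation.
import math
import random


class Exp3:
    """Exp3 over K arms; rewards are assumed to lie in [0, 1]."""

    def __init__(self, n_arms, gamma=0.1):
        self.n_arms = n_arms
        self.gamma = gamma            # exploration rate
        self.weights = [1.0] * n_arms

    def probabilities(self):
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n_arms
                for w in self.weights]

    def select(self):
        probs = self.probabilities()
        arm = random.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs

    def update(self, arm, reward, probs):
        # Importance-weighted reward estimate; only the chosen arm is updated.
        x_hat = reward / probs[arm]
        self.weights[arm] *= math.exp(self.gamma * x_hat / self.n_arms)


# Candidate settings of a stability-plasticity hyperparameter, e.g. the KD
# loss weight (values chosen purely for illustration).
kd_weights = [0.1, 0.5, 1.0, 2.0]
bandit = Exp3(n_arms=len(kd_weights))


def train_phase(kd_weight):
    """Placeholder for one CIL phase: train on the new classes plus exemplars
    with the given KD loss weight and return a locally estimated reward in
    [0, 1], e.g. accuracy on held-out exemplars of old and new classes."""
    return random.random()  # stand-in value so the sketch runs end to end


for phase in range(10):            # number of incremental phases (illustrative)
    arm, probs = bandit.select()   # pick one hyperparameter setting this phase
    reward = train_phase(kd_weights[arm])
    bandit.update(arm, reward, probs)
```

In practice, the same scheme extends to the other choices named above (learning rates, classifier types) by treating each candidate configuration as an arm, so the tradeoff is tuned online without knowing in advance whether the run is TFH or TFS.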
Related papers
- Criticality Leveraged Adversarial Training (CLAT) for Boosted Performance via Parameter Efficiency [15.211462468655329]
CLAT introduces parameter efficiency into the adversarial training process, improving both clean accuracy and adversarial robustness.
It can be applied on top of existing adversarial training methods, significantly reducing the number of trainable parameters by approximately 95%.
arXiv Detail & Related papers (2024-08-19T17:58:03Z) - SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training [68.7896349660824]
We present an in-depth analysis of the progressive overfitting problem through the lens of sequential fine-tuning (Seq FT).
Considering that overly fast representation learning and the biased classification layer constitute this particular problem, we introduce the advanced Slow Learner with Classifier Alignment (SLCA++) framework.
Our approach involves a Slow Learner that selectively reduces the learning rate of backbone parameters, and a Classifier Alignment that aligns the disjoint classification layers in a post-hoc fashion.
arXiv Detail & Related papers (2024-08-15T17:50:07Z) - Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud
Semantic Segmentation via Decoupling Optimization [64.36097398869774]
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding.
The existing SSL-based methods suffer from severe training bias due to class imbalance and long-tail distributions of the point cloud data.
We introduce a new decoupling optimization framework, which disentangles feature representation learning and classifier learning in an alternating optimization manner to shift the biased decision boundary effectively.
arXiv Detail & Related papers (2024-01-13T04:16:40Z) - Knowledge Distillation for Federated Learning: a Practical Guide [8.2791533759453]
Federated Learning (FL) enables the training of Deep Learning models without centrally collecting possibly sensitive raw data.
The most widely used algorithms for FL are parameter-averaging schemes (e.g., Federated Averaging), which, however, have well-known limits.
We provide a review of KD-based algorithms tailored for specific FL issues.
arXiv Detail & Related papers (2022-11-09T08:31:23Z) - Learning to Optimize Permutation Flow Shop Scheduling via Graph-based
Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately.
Our model's network parameters are reduced to only 37% of theirs, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z) - Hyperparameter-free Continuous Learning for Domain Classification in
Natural Language Understanding [60.226644697970116]
Domain classification is the fundamental task in natural language understanding (NLU).
Most existing continual learning approaches suffer from low accuracy and performance fluctuation.
We propose a hyperparameter-free continual learning model for text data that can stably produce high performance under various environments.
arXiv Detail & Related papers (2022-01-05T02:46:16Z) - Scalable One-Pass Optimisation of High-Dimensional Weight-Update
Hyperparameters by Implicit Differentiation [0.0]
We develop an approximate hypergradient-based hyperparameter optimiser.
It requires only one training episode, with no restarts.
We also provide a motivating argument for convergence to the true hypergradient.
arXiv Detail & Related papers (2021-10-20T09:57:57Z) - Few-Shot Lifelong Learning [35.05196800623617]
Few-Shot Lifelong Learning enables deep learning models to perform lifelong/continual learning on few-shot data.
Our method selects very few parameters from the model for training every new set of classes instead of training the full model.
We experimentally show that our method significantly outperforms existing methods on the miniImageNet, CIFAR-100, and CUB-200 datasets.
arXiv Detail & Related papers (2021-03-01T13:26:57Z) - AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.