A Theoretical Study on Solving Continual Learning
- URL: http://arxiv.org/abs/2211.02633v1
- Date: Fri, 4 Nov 2022 17:45:55 GMT
- Title: A Theoretical Study on Solving Continual Learning
- Authors: Gyuhak Kim, Changnan Xiao, Tatsuya Konishi, Zixuan Ke, Bing Liu
- Abstract summary: This study shows that the CIL problem can be decomposed into two sub-problems: Within-task Prediction (WP) and Task-id Prediction (TP).
It further proves that TP is correlated with out-of-distribution (OOD) detection, which connects CIL and OOD detection.
The key conclusion is that, regardless of whether WP and TP or OOD detection are defined explicitly or implicitly by a CIL algorithm, good WP and good TP or OOD detection are necessary and sufficient for good CIL performance.
- Score: 13.186315474669287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning (CL) learns a sequence of tasks incrementally. There are
two popular CL settings, class incremental learning (CIL) and task incremental
learning (TIL). A major challenge of CL is catastrophic forgetting (CF). While
a number of techniques are already available to effectively overcome CF for
TIL, CIL remains highly challenging. So far, little theoretical study has
been done to provide principled guidance on how to solve the CIL problem.
This paper performs such a study. It first shows that probabilistically, the
CIL problem can be decomposed into two sub-problems: Within-task Prediction
(WP) and Task-id Prediction (TP). It further proves that TP is correlated with
out-of-distribution (OOD) detection, which connects CIL and OOD detection. The
key conclusion of this study is that regardless of whether WP and TP or OOD
detection are defined explicitly or implicitly by a CIL algorithm, good WP and
good TP or OOD detection are necessary and sufficient for good CIL
performance. Additionally, TIL is simply WP. Based on the theoretical result,
new CIL methods are also designed, which outperform strong baselines in both
CIL and TIL settings by a large margin.
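The WP/TP decomposition in the abstract can be stated probabilistically: the probability that input x belongs to class j of task k factors as P(y=j, t=k | x) = P(y=j | t=k, x) · P(t=k | x), i.e. WP times TP. A minimal numeric sketch of that factorization (the probabilities below are hypothetical, not from the paper's experiments):

```python
import numpy as np

# Hypothetical CIL setup: 2 tasks with 2 classes each (4 global classes).
# WP: within-task class probabilities P(y=j | task=k, x), one row per task.
wp = np.array([
    [0.9, 0.1],   # task 0: P(class 0 | task 0, x), P(class 1 | task 0, x)
    [0.6, 0.4],   # task 1
])
# TP: task-id probabilities P(task=k | x).
tp = np.array([0.8, 0.2])

# CIL prediction: joint probability over all 4 global classes,
# P(y=j, task=k | x) = WP * TP -- the decomposition discussed above.
cil = (wp * tp[:, None]).ravel()
print(cil)            # [0.72 0.08 0.12 0.08]
print(cil.sum())      # 1.0 -- a valid distribution over global classes
print(cil.argmax())   # 0 -- predicted global class index
```

Because the product is a proper joint distribution, improving either factor (better WP, or better TP/OOD detection) directly improves the CIL prediction, which is the intuition behind the paper's necessity-and-sufficiency result.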
Related papers
- Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning [99.05401042153214]
In-context learning (ICL) can be attributed to two major abilities: task recognition (TR) and task learning (TL).
We take the first step by examining the pre-training dynamics of the emergence of ICL.
We propose a simple yet effective method to better integrate these two abilities for ICL at inference time.
arXiv Detail & Related papers (2024-06-20T06:37:47Z) - Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models [63.11967672725459]
We show how most often, P-RFCL techniques can be matched by a simple and lightweight PEFT baseline.
arXiv Detail & Related papers (2024-06-13T17:57:10Z) - Sub-network Discovery and Soft-masking for Continual Learning of Mixed Tasks [46.96149283885802]
This paper proposes a new CL method to overcome CF and/or limited KT.
It overcomes CF by isolating the knowledge of each task via discovering a subnetwork for it.
A soft-masking mechanism is also proposed to preserve the previous knowledge and to enable the new task to leverage the past knowledge to achieve KT.
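The soft-masking idea summarized above can be sketched in a few lines: scale each parameter's update by how unimportant it was for previous tasks, so weights that matter for old tasks barely move (limiting CF) while remaining available to the new task (enabling KT). This is a hedged illustration of the general mechanism, not the paper's exact method; the importance values below are hypothetical:

```python
import numpy as np

params = np.array([1.0, -2.0, 0.5])       # shared-network parameters
importance = np.array([0.9, 0.1, 0.5])    # accumulated importance from past tasks (assumed)
grad = np.array([0.2, 0.2, 0.2])          # gradient from the new task

lr = 0.1
# Soft mask: attenuate the update in proportion to past-task importance.
params -= lr * (1.0 - importance) * grad
print(params)  # the highly important weight (index 0) barely moves
```

Unlike a hard binary mask, the soft mask never freezes a weight completely, which is what lets the new task still read and reuse old-task knowledge.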
arXiv Detail & Related papers (2023-10-13T23:00:39Z) - Class Incremental Learning via Likelihood Ratio Based Task Prediction [20.145128455767587]
An emerging theory-guided approach is to train a task-specific model for each task in a shared network for all tasks.
This paper argues that using a traditional OOD detector for task-id prediction is sub-optimal because additional information can be exploited.
We call the new method TPL (Task-id Prediction based on Likelihood Ratio)
It markedly outperforms strong CIL baselines and has negligible catastrophic forgetting.
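The likelihood-ratio idea behind task-id prediction can be sketched as scoring each task k by log p_k(x) − log p_other(x), where p_other pools the remaining tasks' densities. The 1-D Gaussian densities below are illustrative stand-ins, not TPL's actual estimators:

```python
import numpy as np

# Hypothetical per-task feature densities: 1-D Gaussians fit to each
# task's training features (assumed for illustration).
task_means = np.array([0.0, 5.0])
task_std = 1.0

def log_gauss(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def task_id_scores(x):
    # Likelihood-ratio score for task k: log p_k(x) - log p_{-k}(x),
    # where p_{-k} is the uniform mixture of the other tasks' densities.
    logp = np.array([log_gauss(x, m, task_std) for m in task_means])
    scores = []
    for k in range(len(task_means)):
        others = np.delete(logp, k)
        log_other = np.logaddexp.reduce(others) - np.log(len(others))
        scores.append(logp[k] - log_other)
    return np.array(scores)

x = 0.3  # a feature value near task 0's distribution
print(task_id_scores(x).argmax())  # → 0
```

The ratio form is the "additional information" angle: instead of asking only "is x in-distribution for task k?", it contrasts task k's density against the other tasks' densities, which a generic OOD detector ignores.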
arXiv Detail & Related papers (2023-09-26T16:25:57Z) - Learnability and Algorithm for Continual Learning [7.7046692574332285]
Class Incremental Learning (CIL) learns a sequence of tasks consisting of disjoint sets of concepts or classes.
This paper shows that CIL is learnable. Based on the theory, a new CIL algorithm is also proposed.
arXiv Detail & Related papers (2023-06-22T03:08:42Z) - Open-World Continual Learning: Unifying Novelty Detection and Continual Learning [13.186315474669287]
This paper theoretically proves that OOD detection is in fact necessary for CIL.
A good CIL algorithm based on our theory can naturally be used in open world learning.
New CIL methods are also designed, which outperform strong baselines in terms of CIL accuracy and its continual OOD detection by a large margin.
arXiv Detail & Related papers (2023-04-20T01:32:32Z) - Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization [114.43504951058796]
Outlier detection tasks have been playing a critical role in AI safety.
Deep neural network classifiers usually tend to incorrectly classify out-of-distribution (OOD) inputs into in-distribution classes with high confidence.
We propose an alternative probabilistic paradigm that is both practically useful and theoretically viable for OOD detection tasks.
arXiv Detail & Related papers (2022-09-26T15:59:55Z) - A Study of Continual Learning Methods for Q-Learning [78.6363825307044]
We present an empirical study on the use of continual learning (CL) methods in a reinforcement learning (RL) scenario.
Our results show that dedicated CL methods can significantly improve learning when compared to the baseline technique of "experience replay".
arXiv Detail & Related papers (2022-06-08T14:51:52Z) - Continual Learning Based on OOD Detection and Task Masking [7.7046692574332285]
This paper proposes a novel unified approach based on out-of-distribution (OOD) detection and task masking, called CLOM, to solve both problems.
Our evaluation shows that CLOM outperforms existing state-of-the-art baselines by large margins.
arXiv Detail & Related papers (2022-03-17T17:10:12Z) - Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning [22.83874590642864]
Continual learning learns a sequence of tasks with the goal of achieving two main objectives: overcoming catastrophic forgetting (CF) and encouraging knowledge transfer (KT).
Most existing techniques focus only on overcoming CF and have no mechanism to encourage KT, and thus perform poorly at KT.
This paper proposes a novel model called CTR to solve these problems.
arXiv Detail & Related papers (2021-12-05T23:13:13Z) - Distilling Knowledge via Knowledge Review [69.15050871776552]
We study the effect of cross-level connection paths between teacher and student networks, and reveal their importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
Our final nested and compact framework requires negligible overhead and outperforms other methods on a variety of tasks.
arXiv Detail & Related papers (2021-04-19T04:36:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.