A Theoretical Study on Solving Continual Learning
- URL: http://arxiv.org/abs/2211.02633v1
- Date: Fri, 4 Nov 2022 17:45:55 GMT
- Title: A Theoretical Study on Solving Continual Learning
- Authors: Gyuhak Kim, Changnan Xiao, Tatsuya Konishi, Zixuan Ke, Bing Liu
- Abstract summary: This study shows that the CIL problem can be decomposed into two sub-problems: Within-task Prediction (WP) and Task-id Prediction (TP).
It further proves that TP is correlated with out-of-distribution (OOD) detection, which connects CIL and OOD detection.
The key conclusion of this study is that regardless of whether WP and TP or OOD detection are defined explicitly or implicitly by a CIL algorithm, good WP and good TP or OOD detection are necessary and sufficient for good CIL performances.
- Score: 13.186315474669287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning (CL) learns a sequence of tasks incrementally. There are
two popular CL settings, class incremental learning (CIL) and task incremental
learning (TIL). A major challenge of CL is catastrophic forgetting (CF). While
a number of techniques are already available to effectively overcome CF for
TIL, CIL remains highly challenging. So far, little theoretical work has
been done to provide principled guidance on how to solve the CIL problem.
This paper performs such a study. It first shows that probabilistically, the
CIL problem can be decomposed into two sub-problems: Within-task Prediction
(WP) and Task-id Prediction (TP). It further proves that TP is correlated with
out-of-distribution (OOD) detection, which connects CIL and OOD detection. The
key conclusion of this study is that regardless of whether WP and TP or OOD
detection are defined explicitly or implicitly by a CIL algorithm, good WP and
good TP or OOD detection are necessary and sufficient for good CIL
performances. Additionally, TIL is simply WP. Based on the theoretical result,
new CIL methods are also designed, which outperform strong baselines in both
CIL and TIL settings by a large margin.
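The WP/TP decomposition above can be sketched numerically. In a minimal illustration (hypothetical helper names; assuming each task's within-task class probabilities and the task-id probabilities are given), the probability of class j of task k is the product P(j | x, task k) · P(task k | x):

```python
import numpy as np

def cil_probabilities(wp, tp):
    """Combine within-task predictions (WP) with task-id predictions (TP).

    wp: list of arrays, wp[k][j] = P(class j | x, task k), each summing to 1.
    tp: array, tp[k] = P(task k | x), summing to 1.
    Returns the flattened CIL distribution over all classes of all tasks.
    """
    return np.concatenate([tp[k] * wp[k] for k in range(len(wp))])

# Two tasks, two classes each.
wp = [np.array([0.9, 0.1]), np.array([0.3, 0.7])]
tp = np.array([0.8, 0.2])
probs = cil_probabilities(wp, tp)
# probs sums to 1 because each wp[k] sums to 1 and tp sums to 1;
# the CIL prediction is simply argmax over the combined vector.
```

This makes the paper's key claim concrete: improving either factor (better WP, or better TP/OOD detection) directly improves the combined CIL distribution.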
Related papers
- Accurate Forgetting for Heterogeneous Federated Continual Learning [89.08735771893608]
We propose a new concept, accurate forgetting (AF), and develop a novel generative-replay method which selectively utilizes previous knowledge in federated networks.
We employ a probabilistic framework based on a normalizing flow model to quantify the credibility of previous knowledge.
arXiv Detail & Related papers (2025-02-20T02:35:17Z)
- ICL-TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially.
Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks.
However, such methods lack theoretical guarantees, making them prone to unexpected failures.
We bridge this gap by integrating an empirically strong approach into a principled framework, designed to prevent forgetting.
arXiv Detail & Related papers (2024-10-01T12:58:37Z)
- Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning [99.05401042153214]
In-context learning (ICL) can be attributed to two major abilities: task recognition (TR) and task learning (TL).
We take the first step by examining the pre-training dynamics of the emergence of ICL.
We propose a simple yet effective method to better integrate these two abilities for ICL at inference time.
arXiv Detail & Related papers (2024-06-20T06:37:47Z)
- Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models [63.11967672725459]
We show that, most often, P-RFCL techniques can be matched by a simple and lightweight PEFT baseline.
arXiv Detail & Related papers (2024-06-13T17:57:10Z)
- Sub-network Discovery and Soft-masking for Continual Learning of Mixed Tasks [46.96149283885802]
This paper proposes a new CL method to overcome CF and/or limited KT.
It overcomes CF by isolating the knowledge of each task via discovering a subnetwork for it.
A soft-masking mechanism is also proposed to preserve the previous knowledge and to enable the new task to leverage the past knowledge to achieve KT.
arXiv Detail & Related papers (2023-10-13T23:00:39Z)
- Class Incremental Learning via Likelihood Ratio Based Task Prediction [20.145128455767587]
An emerging theory-guided approach is to train a task-specific model for each task within a single network shared across all tasks.
This paper argues that using a traditional OOD detector for task-id prediction is sub-optimal because additional information can be exploited.
We call the new method TPL (Task-id Prediction based on Likelihood Ratio).
It markedly outperforms strong CIL baselines and has negligible catastrophic forgetting.
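The likelihood-ratio idea behind this entry can be illustrated with a toy sketch. The 1-D Gaussian densities and function names below are placeholders, not TPL's actual estimators: each task is scored by the log-ratio between its own feature density and a pooled density over the other tasks.

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma):
    """Log density of a 1-D Gaussian, used here as a stand-in density model."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def task_id_by_likelihood_ratio(x, params):
    """params: list of (mu, sigma) per task.

    Returns the task k maximizing log p_k(x) - log p_not_k(x), where
    p_not_k is the mean density of the remaining tasks.
    """
    logs = np.array([gaussian_logpdf(x, mu, s) for mu, s in params])
    scores = []
    for k in range(len(params)):
        others = np.delete(logs, k)
        log_p_other = np.log(np.mean(np.exp(others)))
        scores.append(logs[k] - log_p_other)
    return int(np.argmax(scores))

params = [(0.0, 1.0), (5.0, 1.0)]
task_id_by_likelihood_ratio(0.2, params)  # -> 0
task_id_by_likelihood_ratio(4.8, params)  # -> 1
```

The ratio form uses information a plain OOD score ignores: the other learned tasks serve as an explicit alternative hypothesis rather than an unknown "everything else".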
arXiv Detail & Related papers (2023-09-26T16:25:57Z)
- Learnability and Algorithm for Continual Learning [7.7046692574332285]
Class Incremental Learning (CIL) learns a sequence of tasks consisting of disjoint sets of concepts or classes.
This paper shows that CIL is learnable. Based on the theory, a new CIL algorithm is also proposed.
arXiv Detail & Related papers (2023-06-22T03:08:42Z)
- Open-World Continual Learning: Unifying Novelty Detection and Continual Learning [20.789113765332935]
We show that good OOD detection for each task within the set of learned tasks is necessary for successful CIL.
We then prove that the theory can be generalized or extended to open-world CIL, which can perform CIL in the open world and detect future or open-world OOD data.
New CIL methods are also designed, which outperform strong baselines in CIL accuracy and in continual OOD detection by a large margin.
arXiv Detail & Related papers (2023-04-20T01:32:32Z)
- Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization [114.43504951058796]
Outlier detection tasks have been playing a critical role in AI safety.
Deep neural network classifiers usually tend to incorrectly classify out-of-distribution (OOD) inputs into in-distribution classes with high confidence.
We propose an alternative probabilistic paradigm that is both practically useful and theoretically viable for the OOD detection tasks.
arXiv Detail & Related papers (2022-09-26T15:59:55Z)
- Continual Learning Based on OOD Detection and Task Masking [7.7046692574332285]
This paper proposes a novel unified approach based on out-of-distribution (OOD) detection and task masking, called CLOM, to solve both problems.
Our evaluation shows that CLOM outperforms existing state-of-the-art baselines by large margins.
arXiv Detail & Related papers (2022-03-17T17:10:12Z)
- Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning [22.83874590642864]
Continual learning learns a sequence of tasks with the goal of achieving two main objectives: overcoming catastrophic forgetting (CF) and encouraging knowledge transfer (KT).
Most existing techniques focus only on overcoming CF and have no mechanism to encourage KT, and thus do not do well in KT.
This paper proposes a novel model called CTR to solve these problems.
arXiv Detail & Related papers (2021-12-05T23:13:13Z)
- Distilling Knowledge via Knowledge Review [69.15050871776552]
We study connection paths across levels between teacher and student networks and reveal their great importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
Our final nested and compact framework requires negligible overhead and outperforms other methods on a variety of tasks.
arXiv Detail & Related papers (2021-04-19T04:36:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.