Data-dependent and Oracle Bounds on Forgetting in Continual Learning
- URL: http://arxiv.org/abs/2406.09370v2
- Date: Thu, 23 Jan 2025 20:23:14 GMT
- Title: Data-dependent and Oracle Bounds on Forgetting in Continual Learning
- Authors: Lior Friedman, Ron Meir
- Abstract summary: In continual learning, knowledge must be preserved and re-used between tasks.
We provide both data-dependent and oracle upper bounds that apply regardless of model and algorithm choice.
We derive an algorithm based on our bounds and demonstrate empirically that our approach yields tight bounds on forgetting for several continual learning problems.
- Abstract: In continual learning, knowledge must be preserved and re-used between tasks, maintaining good transfer to future tasks and minimizing forgetting of previously learned ones. While several practical algorithms have been devised for this setting, there have been few theoretical works aiming to quantify and bound the degree of forgetting in general settings. We provide both data-dependent and oracle upper bounds that apply regardless of model and algorithm choice, as well as bounds for Gibbs posteriors. We derive an algorithm based on our bounds and demonstrate empirically that our approach yields tight bounds on forgetting for several continual learning problems.
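To make the quantity being bounded concrete, below is a minimal sketch of the standard forgetting metric used in the continual learning literature (best past accuracy on a task minus final accuracy on it). This illustrates the general notion only; it is not the paper's specific bound or algorithm.

```python
import numpy as np

def average_forgetting(acc):
    """Average forgetting over tasks.

    acc[j, i] = accuracy on task i measured after training on task j (j >= i).
    Forgetting of task i is the best accuracy achieved on it before the final
    task, minus its accuracy after the final task (negative values indicate
    backward transfer). Averaged over all but the last task.
    """
    T = acc.shape[0]
    per_task = [acc[i:T - 1, i].max() - acc[T - 1, i] for i in range(T - 1)]
    return float(np.mean(per_task))

# Example: two tasks; accuracy on task 0 drops from 0.9 to 0.7 after task 1.
acc = np.array([[0.9, 0.0],
                [0.7, 0.8]])
print(average_forgetting(acc))  # 0.2 (rounded)
```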
Related papers
- Generalization Bounds for Dependent Data using Online-to-Batch Conversion
We upper bound the generalization error of batch learning algorithms trained on samples drawn from a mixing process.
Our work does not require any stability assumptions on the batch learner.
We prove that the EWA algorithm, a textbook family of online learning algorithms, satisfies our new notion of stability.
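For readers unfamiliar with it, the Exponentially Weighted Average (EWA) forecaster mentioned above is a textbook online learning algorithm: each expert is weighted by the exponential of its negative cumulative loss, and the learner predicts the weighted average. A minimal sketch (not the cited paper's analysis):

```python
import numpy as np

def ewa_predict(expert_preds, cumulative_losses, eta=0.5):
    """Exponentially Weighted Average forecaster.

    expert_preds: (n,) array of the experts' current predictions.
    cumulative_losses: (n,) array of each expert's cumulative loss so far.
    eta: learning rate controlling how sharply bad experts are down-weighted.
    """
    # Subtract the minimum loss before exponentiating for numerical stability.
    w = np.exp(-eta * (cumulative_losses - cumulative_losses.min()))
    w /= w.sum()
    return float(w @ expert_preds)

# With equal losses the forecast is the plain average of the experts.
print(ewa_predict(np.array([0.0, 1.0]), np.array([3.0, 3.0])))  # 0.5
```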
arXiv Detail & Related papers (2024-05-22T14:07:25Z) - Near-Optimal Solutions of Constrained Learning Problems [85.48853063302764]
In machine learning systems, the need to curtail their behavior has become increasingly apparent.
This is evidenced by recent advances towards developing models that satisfy robustness and safety requirements.
Our results show that rich parametrizations effectively mitigate non-convexity in constrained learning problems.
arXiv Detail & Related papers (2024-03-18T14:55:45Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that our method, BAdam (Bayesian Adaptive Moment Regularization), achieves state-of-the-art performance among prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - Oracle-Efficient Smoothed Online Learning for Piecewise Continuous Decision Making
This work introduces a new notion of complexity, the generalized bracketing numbers, which marries constraints on the adversary to the size of the space.
We then instantiate our bounds in several problems of interest, including online prediction and planning of piecewise continuous functions.
arXiv Detail & Related papers (2023-02-10T18:45:52Z) - Continual Learning with Distributed Optimization: Does CoCoA Forget?
We focus on the continual learning problem where the tasks arrive sequentially.
The aim is to perform well on the newly arrived task without performance degradation on the previously seen tasks.
We consider the well-established distributed learning algorithm CoCoA.
arXiv Detail & Related papers (2022-11-30T13:49:43Z) - Hierarchically Structured Task-Agnostic Continual Learning
We take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle.
We propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths.
Our approach can operate in a task-agnostic way, i.e., it does not require task-specific knowledge, as is the case with many existing continual learning algorithms.
arXiv Detail & Related papers (2022-11-14T19:53:15Z) - On Generalizing Beyond Domains in Cross-Domain Continual Learning
Deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.
Our proposed approach learns new tasks under domain shift with accuracy boosts up to 10% on challenging datasets such as DomainNet and OfficeHome.
arXiv Detail & Related papers (2022-03-08T09:57:48Z) - PAC-Bayesian Lifelong Learning For Multi-Armed Bandits
We present a PAC-Bayesian analysis of lifelong learning.
We consider the case when each learning task is a multi-armed bandit problem.
We propose lifelong learning algorithms that use our new bounds as learning objectives.
arXiv Detail & Related papers (2022-03-07T11:23:12Z) - Smoothed Online Learning is as Easy as Statistical Learning
We provide the first oracle-efficient, no-regret algorithms in this setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z) - Metalearning Linear Bandits by Prior Update
Fully Bayesian approaches assume that problem parameters are generated from a known prior, while in practice, such information is often lacking.
This problem is exacerbated in decision-making setups with partial information, where using a misspecified prior may lead to poor exploration and inferior performance.
In this work we prove, in the context of linear bandits and Gaussian priors, that as long as the prior estimate is sufficiently close to the true prior, the performance of an algorithm that uses the misspecified prior is close to that of an algorithm that uses the true prior.
arXiv Detail & Related papers (2021-07-12T11:17:01Z) - A Constraint-Based Algorithm for the Structural Learning of Continuous-Time Bayesian Networks
We propose the first constraint-based algorithm for learning the structure of continuous-time Bayesian networks.
We discuss the different statistical tests and the underlying hypotheses used by our proposal to establish conditional independence.
arXiv Detail & Related papers (2020-07-07T07:34:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.