A Theoretical Analysis of Catastrophic Forgetting through the NTK
Overlap Matrix
- URL: http://arxiv.org/abs/2010.04003v2
- Date: Thu, 25 Feb 2021 15:31:16 GMT
- Title: A Theoretical Analysis of Catastrophic Forgetting through the NTK
Overlap Matrix
- Authors: Thang Doan, Mehdi Bennani, Bogdan Mazoure, Guillaume Rabusseau, Pierre
Alquier
- Abstract summary: We show that the impact of Catastrophic Forgetting increases as two tasks increasingly align.
We propose a variant of Orthogonal Gradient Descent (OGD) which leverages the structure of the data.
Experiments support our theoretical findings and show how our method can help reduce CF on classical CL datasets.
- Score: 16.106653541368306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning (CL) is a setting in which an agent has to learn from an
incoming stream of data during its entire lifetime. Although major advances
have been made in the field, one recurring problem which remains unsolved is
that of Catastrophic Forgetting (CF). While the issue has been extensively
studied empirically, little attention has been paid from a theoretical angle.
In this paper, we show that the impact of CF increases as two tasks
increasingly align. We introduce a measure of task similarity called the NTK
overlap matrix which is at the core of CF. We analyze common projected gradient
algorithms and demonstrate how they mitigate forgetting. Then, we propose a
variant of Orthogonal Gradient Descent (OGD) which leverages the structure of the
data through Principal Component Analysis (PCA). Experiments support our
theoretical findings and show how our method can help reduce CF on classical CL
datasets.
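To make the two ideas in the abstract concrete, here is a minimal numpy sketch, not the paper's implementation: it treats each task as summarized by a Jacobian-style (NTK) feature matrix, measures task alignment through the overlap of their top principal subspaces (cosines of the principal angles), and then applies a PCA-OGD-style step that projects a new-task gradient onto the orthogonal complement of the retained directions. The matrices J1 and J2, the rank k, and the gradient g2 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the NTK feature (Jacobian) matrices of two tasks:
# each row is the gradient of one example's output w.r.t. the parameters.
J1 = rng.normal(size=(100, 50))                    # task 1
J2 = 0.7 * J1 + 0.3 * rng.normal(size=(100, 50))   # task 2, partially aligned with task 1

# Principal directions of each task's feature matrix in parameter space.
U1, _, _ = np.linalg.svd(J1.T, full_matrices=False)
U2, _, _ = np.linalg.svd(J2.T, full_matrices=False)

# Overlap-style alignment between the top-k subspaces: the singular values of
# U1k^T U2k are the cosines of the principal angles (values near 1 mean
# aligned tasks, hence more forgetting).
k = 10
alignment = np.linalg.svd(U1[:, :k].T @ U2[:, :k], compute_uv=False)
print("top-k task alignment:", np.round(alignment, 3))

# PCA-OGD-style update: project a task-2 gradient onto the orthogonal
# complement of the top-k principal directions retained from task 1.
g2 = rng.normal(size=50)
P = U1[:, :k]
g2_proj = g2 - P @ (P.T @ g2)
print("component left in retained subspace:", np.linalg.norm(P.T @ g2_proj))  # ~0
```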
Related papers
- Revisiting Catastrophic Forgetting in Large Language Model Tuning [79.70722658190097]
Catastrophic Forgetting (CF) refers to models forgetting previously acquired knowledge when learning new data.
This paper takes the first step to reveal the direct link between the flatness of the model loss landscape and the extent of CF in the field of large language models.
Experiments on three widely-used fine-tuning datasets, spanning different model scales, demonstrate the effectiveness of our method in alleviating CF.
arXiv Detail & Related papers (2024-06-07T11:09:13Z)
- Large-Scale OD Matrix Estimation with A Deep Learning Method [70.78575952309023]
The proposed method integrates deep learning and numerical optimization algorithms to infer matrix structure and guide numerical optimization.
We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset.
arXiv Detail & Related papers (2023-10-09T14:30:06Z)
- Can Decentralized Stochastic Minimax Optimization Algorithms Converge Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its applications in a wide range of machine learning problems.
This paper develops two novel decentralized minimax optimization algorithms for the non-strongly-nonconcave problem.
arXiv Detail & Related papers (2023-04-24T02:19:39Z)
- Theory on Forgetting and Generalization of Continual Learning [41.85538120246877]
Continual learning (CL) aims to learn a sequence of tasks.
There is a lack of understanding on what factors are important and how they affect "catastrophic forgetting" and generalization performance.
We show that our results not only explain some interesting empirical observations in recent studies, but also motivate better practical algorithm designs of CL.
arXiv Detail & Related papers (2023-02-12T02:14:14Z)
- Distributed Robust Principal Analysis [0.0]
We study the robust principal component analysis problem in a distributed setting.
We propose the first distributed robust principal component analysis algorithm based on consensus factorization, dubbed DCF-PCA.
arXiv Detail & Related papers (2022-07-24T05:45:07Z)
- Challenging Common Assumptions about Catastrophic Forgetting [13.1202659074346]
We study the progressive knowledge accumulation (KA) in DNNs trained with gradient-based algorithms in long sequences of tasks with data re-occurrence.
We propose a new framework, SCoLe, to investigate KA and discover that catastrophic forgetting has a limited effect on DNNs trained with SGD.
arXiv Detail & Related papers (2022-07-10T21:40:54Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the effect of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
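As a toy illustration of this claim (a finite-dimensional stand-in, not the paper's Hilbert-space analysis): for gradient descent on a diagonal quadratic, the error along the i-th eigendirection after t steps is scaled by (1 - lr * lambda_i)^t, so at a fixed early-stopping time the learning rate decides which spectral components of the solution are already fit. The spectrum, target, and step budget below are made up for the demo.

```python
import numpy as np

# Gradient descent on f(w) = 0.5 * sum_i lambda_i * (w_i - w*_i)^2.
eigvals = np.array([1.0, 0.1, 0.01])   # assumed spectrum of the quadratic
w_star = np.ones(3)                    # assumed target solution
t = 50                                 # early-stopping step

for lr in (0.1, 1.0):
    w = np.zeros_like(w_star)
    for _ in range(t):
        w -= lr * eigvals * (w - w_star)   # exact gradient of the quadratic
    # Matches the closed form (1 - lr * eigvals) ** t for the residual error:
    print(f"lr={lr}: per-direction error {np.abs(w - w_star)}")
```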
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- A Survey on Concept Factorization: From Shallow to Deep Representation Learning [104.78577405792592]
Concept Factorization (CF) has attracted a great deal of interest in the areas of machine learning and data mining.
We first review the root CF method, and then explore the advancement of CF-based representation learning.
We also introduce the potential application areas of CF-based methods.
arXiv Detail & Related papers (2020-07-31T04:19:14Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
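A generic way to track such a Hessian norm during training is power iteration on Hessian-vector products; the sketch below, with a toy logistic-regression loss and finite-difference HVPs, is an illustrative assumption rather than the authors' exact estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                      # toy inputs
y = (rng.random(200) < 0.5).astype(float)          # toy binary labels

def loss_grad(w):
    """Gradient of the mean logistic loss at parameters w."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

def hvp(w, v, eps=1e-4):
    """Hessian-vector product via central differences of the gradient."""
    return (loss_grad(w + eps * v) - loss_grad(w - eps * v)) / (2 * eps)

def hessian_norm(w, iters=50):
    """Estimate the Hessian spectral norm by power iteration on HVPs."""
    v = rng.normal(size=w.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = hvp(w, v)
        v /= np.linalg.norm(v)
    return float(np.linalg.norm(hvp(w, v)))

print("estimated Hessian norm:", hessian_norm(rng.normal(size=5)))
```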
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- Leverage the Average: an Analysis of KL Regularization in RL [44.01222241795292]
We show that Kullback-Leibler (KL) regularization implicitly averages q-values.
We provide a very strong performance bound, the very first to combine two desirable aspects.
Some of our assumptions do not hold with neural networks, so we complement this theoretical analysis with an extensive empirical study.
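A toy illustration of the "implicit averaging" claim (the update form and temperature below are assumptions, not the paper's exact scheme): a KL-regularized, mirror-descent-style policy update multiplies the previous policy by exp(q / tau), so after several iterations the policy is a softmax of the sum, i.e. a scaled average, of all past q-estimates rather than of the latest one alone.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

n_actions, tau = 4, 1.0
qs = [rng.normal(size=n_actions) for _ in range(5)]   # hypothetical per-iteration q-estimates

# KL-regularized greedy step: pi_{k+1}(a) proportional to pi_k(a) * exp(q_k(a) / tau).
pi = np.ones(n_actions) / n_actions                    # uniform initial policy
for q in qs:
    pi = pi * np.exp(q / tau)
    pi /= pi.sum()

# Unrolling the recursion shows the final policy depends only on the sum
# (equivalently, the average) of all past q-estimates.
print(np.allclose(pi, softmax(sum(qs) / tau)))         # True
```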
arXiv Detail & Related papers (2020-03-31T10:55:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.