Generalisation Guarantees for Continual Learning with Orthogonal
Gradient Descent
- URL: http://arxiv.org/abs/2006.11942v4
- Date: Fri, 4 Dec 2020 09:23:18 GMT
- Title: Generalisation Guarantees for Continual Learning with Orthogonal
Gradient Descent
- Authors: Mehdi Abbana Bennani, Thang Doan, Masashi Sugiyama
- Abstract summary: In Continual Learning settings, deep neural networks are prone to Catastrophic Forgetting.
We present a theoretical framework to study Continual Learning algorithms in the Neural Tangent Kernel regime.
- Score: 81.29979864862081
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Continual Learning settings, deep neural networks are prone to
Catastrophic Forgetting. Orthogonal Gradient Descent (OGD) was proposed to
tackle this challenge; however, no theoretical guarantees have been proven yet.
We present a theoretical framework to study Continual Learning algorithms in
the Neural Tangent Kernel regime. This framework comprises a closed-form
expression of the model through tasks and proxies for Transfer Learning,
generalisation and task similarity. In this framework, we prove that OGD is
robust to Catastrophic Forgetting and then derive the first generalisation
bound for SGD and OGD in Continual Learning. Finally, we study the limits of
this framework in practice for OGD and highlight the importance of the Neural
Tangent Kernel variation for Continual Learning with OGD.
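For intuition, the OGD update projects each new task's loss gradient onto the
orthogonal complement of directions stored from earlier tasks (typically
gradients of the model outputs on earlier tasks' samples); in the NTK regime,
where the network behaves like a linear model in its parameters, such a step
leaves earlier predictions essentially unchanged. Below is a minimal numpy
sketch of this projection step, assuming gradients are available as flat
vectors; the helper names are illustrative and not taken from the paper's code.

    # Minimal sketch of the OGD update with flat numpy parameter/gradient vectors.
    # Helper names are illustrative, not from the paper's implementation.
    import numpy as np

    def project_orthogonal(grad, basis):
        """Remove from `grad` its components along the orthonormal directions
        stored from previous tasks."""
        for u in basis:
            grad = grad - np.dot(grad, u) * u
        return grad

    def extend_basis(basis, new_directions, eps=1e-10):
        """Gram-Schmidt: orthonormalise gradient directions collected on a
        finished task against the existing basis and append the non-degenerate
        ones."""
        for v in new_directions:
            v = project_orthogonal(v, basis)
            norm = np.linalg.norm(v)
            if norm > eps:
                basis.append(v / norm)
        return basis

    def ogd_step(params, loss_grad, basis, lr=0.1):
        """One OGD step: project the current task's loss gradient onto the
        orthogonal complement of the stored directions, then descend."""
        return params - lr * project_orthogonal(loss_grad, basis)

    # After finishing a task: basis = extend_basis(basis, per_sample_output_grads)
    # During the next task:   params = ogd_step(params, loss_grad, basis)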
Related papers
- Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective [125.00228936051657]
We introduce NTK-CL, a novel framework that eliminates task-specific parameter storage while adaptively generating task-relevant features.
By fine-tuning optimizable parameters with appropriate regularization, NTK-CL achieves state-of-the-art performance on established PEFT-CL benchmarks.
arXiv Detail & Related papers (2024-07-24T09:30:04Z)
- An Infinite-Width Analysis on the Jacobian-Regularised Training of a Neural Network [10.384951432591492]
Recent theoretical analysis of deep neural networks in their infinite-width limits has deepened our understanding of initialisation, feature learning, and training of those networks.
We show that this infinite-width analysis can be extended to the Jacobian of a deep neural network.
We experimentally show the relevance of our theoretical claims to wide finite networks, and empirically analyse the properties of the kernel regression solution to obtain insight into Jacobian regularisation.
arXiv Detail & Related papers (2023-12-06T09:52:18Z)
- Efficient kernel surrogates for neural network-based regression [0.8030359871216615]
We study the performance of the Conjugate Kernel (CK), an efficient approximation to the Neural Tangent Kernel (NTK).
We show that the CK performance is only marginally worse than that of the NTK and, in certain cases, even superior.
In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework suggests a recipe for improving DNN accuracy inexpensively (a toy numerical comparison of the two kernels is sketched after this list).
arXiv Detail & Related papers (2023-10-28T06:41:47Z)
- Connecting NTK and NNGP: A Unified Theoretical Framework for Neural Network Learning Dynamics in the Kernel Regime [7.136205674624813]
We provide a comprehensive framework for understanding the learning process of deep neural networks in the infinite width limit.
We identify two learning phases characterized by different time scales: gradient-driven and diffusive learning.
arXiv Detail & Related papers (2023-09-08T18:00:01Z)
- Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel [19.4934492061353]
We prove new generalisation and excess risk bounds without the Neural Tangent Kernel (NTK) or Polyak-Lojasiewicz (PL) assumptions.
We show oracle-type bounds which reveal that the generalisation and excess risk of GD are controlled by an interpolating network with the shortest GD path from initialisation.
Unlike most NTK-based analyses, we focus on regression with label noise and show that GD with early stopping is consistent.
arXiv Detail & Related papers (2021-07-27T10:53:15Z)
- Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining Neural Networks that differs from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that also extend to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z)
- An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z)
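As a companion to the Conjugate Kernel entry above, here is a toy numpy sketch
comparing the Conjugate Kernel (the Gram matrix of last-layer features) with
the empirical Neural Tangent Kernel for a random two-layer ReLU network. It
only illustrates the general relationship between the two kernels under
standard definitions; it is not that paper's construction or code.

    # Toy comparison of the Conjugate Kernel (CK) and the empirical NTK for a
    # random two-layer ReLU network f(x) = a^T relu(W x) / sqrt(m).
    import numpy as np

    rng = np.random.default_rng(0)
    m, d, n = 512, 10, 8                    # width, input dimension, number of inputs
    W = rng.standard_normal((m, d))         # first-layer weights
    a = rng.standard_normal(m)              # output weights
    X = rng.standard_normal((n, d))         # inputs

    pre = X @ W.T                           # pre-activations, shape (n, m)
    phi = np.maximum(pre, 0.0)              # ReLU features
    dphi = (pre > 0).astype(float)          # ReLU derivatives

    # Conjugate Kernel: Gram matrix of the last-layer features.
    CK = phi @ phi.T / m

    # Empirical NTK: Gram matrix of the full parameter gradients of f, where
    # df/da_j = relu(w_j.x)/sqrt(m) and df/dw_j = a_j * 1[w_j.x > 0] * x / sqrt(m).
    NTK = (phi @ phi.T + ((dphi * a) @ (dphi * a).T) * (X @ X.T)) / m

    print("CK:\n", np.round(CK, 2))
    print("NTK:\n", np.round(NTK, 2))

The CK appears as one of the two terms of the empirical NTK, which is one way
to see why it can serve as a cheaper surrogate.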