SketchOGD: Memory-Efficient Continual Learning
- URL: http://arxiv.org/abs/2305.16424v1
- Date: Thu, 25 May 2023 18:56:19 GMT
- Title: SketchOGD: Memory-Efficient Continual Learning
- Authors: Benjamin Wright, Youngjae Min, Jeremy Bernstein, Navid Azizan
- Abstract summary: When machine learning models are trained continually on a sequence of tasks, they are liable to forget what they learned on previous tasks.
This paper proposes a memory-efficient solution to catastrophic forgetting, improving upon an established algorithm known as orthogonal gradient descent (OGD).
- Score: 9.372686529398859
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When machine learning models are trained continually on a sequence of tasks,
they are liable to forget what they learned on previous tasks -- a phenomenon
known as catastrophic forgetting. Proposed solutions to catastrophic forgetting
tend to involve storing information about past tasks, meaning that memory usage
is a chief consideration in determining their practicality. This paper proposes
a memory-efficient solution to catastrophic forgetting, improving upon an
established algorithm known as orthogonal gradient descent (OGD). OGD utilizes
prior model gradients to find weight updates that preserve performance on prior
datapoints. However, since the memory cost of storing prior model gradients
grows with the runtime of the algorithm, OGD is ill-suited to continual
learning over arbitrarily long time horizons. To address this problem, this
paper proposes SketchOGD. SketchOGD employs an online sketching algorithm to
compress model gradients as they are encountered into a matrix of a fixed,
user-determined size. In contrast to existing memory-efficient variants of OGD,
SketchOGD runs online without the need for advance knowledge of the total
number of tasks, is simple to implement, and is more amenable to analysis. We
provide theoretical guarantees on the approximation error of the relevant
sketches under a novel metric suited to the downstream task of OGD.
Experimentally, we find that SketchOGD tends to outperform current
state-of-the-art variants of OGD given a fixed memory budget.
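To make the recipe concrete, here is a minimal numpy sketch of the two ingredients the abstract names: compressing a stream of model gradients into a matrix of fixed, user-chosen size, and projecting new updates onto the orthogonal complement of the (approximate) span of past gradients, as OGD does. The frequent-directions-style shrink step below is only an illustrative sketching choice, not necessarily the construction analyzed in the paper, and all dimensions are arbitrary.
```python
import numpy as np


class GradientSketch:
    """Keep a (sketch_size x dim) matrix whose row span approximates past gradients."""

    def __init__(self, dim, sketch_size):
        self.sketch = np.zeros((sketch_size, dim))

    def update(self, grad):
        # Place the new gradient in a free (all-zero) row; if none is free,
        # shrink the sketch via an SVD so that some rows become free again.
        free = np.flatnonzero(~self.sketch.any(axis=1))
        if free.size == 0:
            _, s, vt = np.linalg.svd(self.sketch, full_matrices=False)
            s = np.sqrt(np.maximum(s**2 - s[len(s) // 2] ** 2, 0.0))
            self.sketch = s[:, None] * vt
            free = np.flatnonzero(~self.sketch.any(axis=1))
        self.sketch[free[0]] = grad

    def project_orthogonal(self, v):
        # Remove from v its component lying in the span of the sketched gradients.
        rows = self.sketch[self.sketch.any(axis=1)]
        if rows.size == 0:
            return v
        q, _ = np.linalg.qr(rows.T)          # dim x r orthonormal basis
        return v - q @ (q.T @ v)


# Usage: memory stays fixed at sketch_size x dim no matter how long the stream runs.
rng = np.random.default_rng(0)
sketch = GradientSketch(dim=100, sketch_size=10)
for _ in range(50):                          # gradients streamed from past tasks
    sketch.update(rng.normal(size=100))

new_grad = rng.normal(size=100)              # gradient on the current task
update = -0.1 * sketch.project_orthogonal(new_grad)
```
The point of the fixed-size sketch is that the memory cost no longer grows with the number of stored gradients, which is the property the abstract highlights.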
Related papers
- Prompt-Driven Continual Graph Learning [35.58675758528851]
Continual Graph Learning (CGL) aims to accommodate new tasks over evolving graph data without forgetting prior knowledge.
This paper introduces a novel prompt-driven continual graph learning framework, which learns a separate prompt for each incoming task and keeps the underlying graph neural network model fixed.
arXiv Detail & Related papers (2025-02-10T10:28:11Z) - An Effective Dynamic Gradient Calibration Method for Continual Learning [11.555822066922508]
Continual learning (CL) is a fundamental topic in machine learning, where the goal is to train a model with continuously incoming data and tasks.
Due to the memory limit, we cannot store all the historical data, and therefore confront the "catastrophic forgetting" problem.
We develop an effective algorithm to calibrate the gradient in each updating step of the model.
arXiv Detail & Related papers (2024-07-30T16:30:09Z) - Deep Manifold Graph Auto-Encoder for Attributed Graph Embedding [51.75091298017941]
This paper proposes a novel Deep Manifold (Variational) Graph Auto-Encoder (DMVGAE/DMGAE) for attributed graph data.
The proposed method surpasses state-of-the-art baseline algorithms by a significant margin on different downstream tasks across popular datasets.
arXiv Detail & Related papers (2024-01-12T17:57:07Z) - Rethinking PGD Attack: Is Sign Function Necessary? [131.6894310945647]
We present a theoretical analysis of how such a sign-based update algorithm influences step-wise attack performance.
We propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign.
The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments.
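As a toy illustration of the distinction drawn above, the two update rules below contrast a sign-based step (as in PGD) with a raw-gradient step; the step sizes, normalization, and epsilon-ball clipping are placeholder details rather than the paper's exact scheme.
```python
import numpy as np

def pgd_step(x, x_orig, grad, alpha=0.01, eps=0.1):
    # Sign-based update: every coordinate moves by the same magnitude alpha.
    return np.clip(x + alpha * np.sign(grad), x_orig - eps, x_orig + eps)

def rgd_step(x, x_orig, grad, alpha=0.5, eps=0.1):
    # Raw-gradient update: the relative magnitudes of the coordinates are kept.
    step = alpha * grad / (np.linalg.norm(grad) + 1e-12)
    return np.clip(x + step, x_orig - eps, x_orig + eps)
```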
arXiv Detail & Related papers (2023-12-03T02:26:58Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce Bayesian Adaptive Moment Regularization (BAdam), a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - A Memory Transformer Network for Incremental Learning [64.0410375349852]
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing methods has been the use of a memory of exemplars, which overcomes the issue of catastrophic forgetting by saving a subset of past data into a memory bank and utilizing it to prevent forgetting when training future tasks.
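A minimal sketch of the exemplar-memory idea described above: keep a fixed-size buffer of past examples and mix them into training on new tasks. The reservoir-sampling policy and buffer size below are illustrative choices, not details from the paper.
```python
import random

class ExemplarMemory:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0

    def add(self, example):
        # Reservoir sampling keeps a uniform random subset of everything seen so far.
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))

# Usage: while training on a new task, mix replayed exemplars into each batch.
memory = ExemplarMemory(capacity=200)
for example in [("input", 0)] * 1000:    # stand-in for a stream of (input, label) pairs
    memory.add(example)
replay_batch = memory.sample(32)
```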
arXiv Detail & Related papers (2022-10-10T08:27:28Z) - One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive
Least-Squares [8.443742714362521]
We develop an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints.
Our algorithm uses memory efficiently by exploiting the structure of the streaming data via incremental principal component analysis (IPCA).
Our experiments show the effectiveness of the proposed method compared to the baselines.
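The snippet below shows only the memory-compression ingredient named above, incremental PCA over a stream, using scikit-learn's IncrementalPCA on synthetic data; the paper's coupling of this with orthogonal gradient descent and recursive least-squares is not reproduced, and the dimensions are arbitrary.
```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
ipca = IncrementalPCA(n_components=8)

for _ in range(20):                        # stream of mini-batches, seen only once
    batch = rng.normal(size=(64, 100))     # stand-in for per-sample features/gradients
    ipca.partial_fit(batch)                # update the top-8 principal subspace

basis = ipca.components_                   # fixed-size (8 x 100) summary of the stream
```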
arXiv Detail & Related papers (2022-07-28T02:01:31Z) - Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous.
We empirically show its efficiency in terms of runtime and memory usage, and illustrate its effectiveness on a range of few-shot image classification problems.
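As a toy illustration of the ODE view of adaptation mentioned above, the snippet below integrates the gradient flow dw/dt = -∇L(w) up to a continuous time horizon with plain Euler steps; COMLN's forward-mode differentiation and meta-training loop are not shown.
```python
import numpy as np

def loss_grad(w, x, y):
    # Gradient of a mean-squared loss for a linear model; a stand-in task loss.
    return x.T @ (x @ w - y) / len(y)

def adapt_by_gradient_flow(w0, x, y, horizon=1.0, dt=0.01):
    # Integrate dw/dt = -grad L(w) from t = 0 up to the continuous horizon t = T.
    w = w0.copy()
    for _ in range(int(horizon / dt)):
        w = w - dt * loss_grad(w, x, y)
    return w

rng = np.random.default_rng(0)
x, w_true = rng.normal(size=(32, 5)), rng.normal(size=5)
y = x @ w_true
w_adapted = adapt_by_gradient_flow(np.zeros(5), x, y, horizon=2.0)
```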
arXiv Detail & Related papers (2022-03-02T22:35:58Z) - Provable Continual Learning via Sketched Jacobian Approximations [17.381658875470638]
A popular approach to overcome forgetting is to regularize the loss function by penalizing models that perform poorly on previous tasks.
We show that, even under otherwise ideal conditions, such regularization can provably suffer catastrophic forgetting if the diagonal matrix is a poor approximation of the Hessian matrix of previous tasks.
We propose a simple approach to overcome this: Regularizing training of a new task with sketches of the Jacobian matrix of past data.
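A minimal numpy illustration of the regularizer described above: rather than storing the full Jacobian J of past-task outputs, keep a random sketch SJ of fixed size and penalize parameter movement measured through it. The Gaussian sketch and quadratic penalty are illustrative stand-ins for the paper's construction.
```python
import numpy as np

rng = np.random.default_rng(0)
n_past, d, m = 1000, 50, 20               # past datapoints, parameters, sketch size

J = rng.normal(size=(n_past, d))          # stand-in for the Jacobian of past-task outputs
S = rng.normal(size=(m, n_past)) / np.sqrt(m)
SJ = S @ J                                # fixed-size (m x d) summary kept in memory

def regularized_loss(w, w_old, new_task_loss, lam=1.0):
    # New-task loss plus a penalty on parameter movement as seen through the
    # sketched Jacobian, i.e. on (approximate) changes to past-task outputs.
    drift = SJ @ (w - w_old)
    return new_task_loss(w) + lam * drift @ drift
```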
arXiv Detail & Related papers (2021-12-09T18:36:20Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Gradient Projection Memory for Continual Learning [5.43185002439223]
The ability to learn continually without forgetting the past tasks is a desired attribute for artificial learning systems.
We propose a novel approach where a neural network learns new tasks by taking gradient steps in the orthogonal direction to the gradient subspaces deemed important for the past tasks.
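A minimal sketch of the projection step described above: given an orthonormal basis for the subspace deemed important to past tasks, the new task's gradient is applied only in the orthogonal complement. How that basis is built from past-task representations (the core of GPM) is not shown here, and the sizes are arbitrary.
```python
import numpy as np

def project_out(grad, basis):
    # grad: (d,); basis: (d, k) with orthonormal columns spanning the protected subspace.
    return grad - basis @ (basis.T @ grad)

rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(100, 10)))   # stand-in for the protected subspace
grad = rng.normal(size=100)
update = -0.1 * project_out(grad, basis)              # step only in the orthogonal complement
```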
arXiv Detail & Related papers (2021-03-17T16:31:29Z) - BEAR: Sketching BFGS Algorithm for Ultra-High Dimensional Feature
Selection in Sublinear Memory [13.596664481933875]
Current large-scale sketching algorithms show a poor memory-accuracy trade-off due to the irreversible collision and accumulation of noise in the sketched domain.
We develop BEAR, which avoids the extra collisions by storing the second-order gradients in the celebrated Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm.
Experiments on real-world data sets demonstrate that BEAR requires up to three orders of magnitude less memory space to achieve the same classification accuracy compared to the first-order sketching algorithms.
arXiv Detail & Related papers (2020-10-26T18:31:27Z) - AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)