Overcoming Catastrophic Forgetting via Direction-Constrained
Optimization
- URL: http://arxiv.org/abs/2011.12581v3
- Date: Fri, 1 Jul 2022 20:11:27 GMT
- Title: Overcoming Catastrophic Forgetting via Direction-Constrained
Optimization
- Authors: Yunfei Teng, Anna Choromanska, Murray Campbell, Songtao Lu, Parikshit
Ram, Lior Horesh
- Abstract summary: We study a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework.
We present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions.
We demonstrate that our algorithm performs favorably compared to other state-of-the-art regularization-based continual learning methods.
- Score: 43.53836230865248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies a new design of the optimization algorithm for training
deep learning models with a fixed architecture of the classification network in
a continual learning framework. The training data is non-stationary and the
non-stationarity is imposed by a sequence of distinct tasks. We first analyze a
deep model trained on only one learning task in isolation and identify a region
in network parameter space, where the model performance is close to the
recovered optimum. We provide empirical evidence that this region resembles a
cone that expands along the convergence direction. We study the principal
directions of the trajectory of the optimizer after convergence and show that
traveling along a few top principal directions can quickly bring the parameters
outside the cone but this is not the case for the remaining directions. We
argue that catastrophic forgetting in a continual learning setting can be
alleviated when the parameters are constrained to stay within the intersection
of the plausible cones of individual tasks that were so far encountered during
training. Based on this observation we present our direction-constrained
optimization (DCO) method, where for each task we introduce a linear
autoencoder to approximate its corresponding top forbidden principal
directions. They are then incorporated into the loss function in the form of a
regularization term for the purpose of learning the coming tasks without
forgetting. Furthermore, in order to control the memory growth as the number of
tasks increases, we propose a memory-efficient version of our algorithm called
compressed DCO (DCO-COMP) that allocates a memory of fixed size for storing all
autoencoders. We empirically demonstrate that our algorithm performs favorably
compared to other state-of-the-art regularization-based continual learning methods.
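The abstract describes the mechanism only at a high level; the following is a minimal sketch of how such a regularizer could look in PyTorch, assuming the network parameters are flattened into a single vector. The class names, the way trajectory displacements are collected, and the penalty weight are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LinearAutoencoder(nn.Module):
    """Linear autoencoder intended to capture a task's top 'forbidden'
    principal directions in parameter space (the code size k is illustrative)."""
    def __init__(self, dim, k):
        super().__init__()
        self.encoder = nn.Linear(dim, k, bias=False)
        self.decoder = nn.Linear(k, dim, bias=False)

    def forward(self, v):
        return self.decoder(self.encoder(v))

def fit_autoencoder(ae, deltas, epochs=200, lr=1e-3):
    """Fit on displacement vectors collected along the optimizer trajectory
    after converging on a task; reconstruction concentrates on the top directions."""
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    for _ in range(epochs):
        loss = ((ae(deltas) - deltas) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

def dco_regularizer(theta, anchors, autoencoders, lam=1.0):
    """Penalize movement along each earlier task's forbidden directions:
    sum_t || AE_t(theta - theta_t*) ||^2, added to the current task's loss."""
    reg = torch.zeros((), device=theta.device)
    for theta_star, ae in zip(anchors, autoencoders):
        reg = reg + (ae(theta - theta_star) ** 2).sum()
    return lam * reg
```

Training on a new task would then minimize task_loss + dco_regularizer(theta, anchors, autoencoders), so that the parameters stay inside the intersection of the earlier tasks' cones.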
Related papers
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability of the zero-shot generalization of VLMs; the overall method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario (a generic orthogonal fine-tuning sketch appears after this list).
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Meta-Learning with Versatile Loss Geometries for Fast Adaptation Using
Mirror Descent [44.56938629818211]
A fundamental challenge in meta-learning is how to quickly "adapt" the extracted prior in order to train a task-specific model.
Existing approaches deal with this challenge using a preconditioner that enhances convergence of the per-task training process.
The present contribution addresses this limitation by learning a nonlinear mirror map, which induces a versatile distance metric.
arXiv Detail & Related papers (2023-12-20T23:45:06Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive
Least-Squares [8.443742714362521]
We develop an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints.
Our algorithm uses memory efficiently by exploiting the structure of the streaming data via incremental principal component analysis (IPCA).
Our experiments show the effectiveness of the proposed method compared to the baselines.
arXiv Detail & Related papers (2022-07-28T02:01:31Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Dense Unsupervised Learning for Video Segmentation [49.46930315961636]
We present a novel approach to unsupervised learning for video object segmentation (VOS).
Unlike previous work, our formulation allows learning dense feature representations directly in a fully convolutional regime.
Our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.
arXiv Detail & Related papers (2021-11-11T15:15:11Z) - TSO: Curriculum Generation using continuous optimization [0.0]
We present a simple and efficient technique based on continuous optimization.
An encoder network maps/embeds the training sequence into a continuous space.
A predictor network takes the continuous representation of a strategy as input and predicts the accuracy for a fixed network architecture.
arXiv Detail & Related papers (2021-06-16T06:32:21Z) - An Adaptive Framework for Learning Unsupervised Depth Completion [59.17364202590475]
We present a method to infer a dense depth map from a color image and associated sparse depth measurements.
We show that regularization and co-visibility are related via the fitness of the model to data and can be unified into a single framework.
arXiv Detail & Related papers (2021-06-06T02:27:55Z) - Training Networks in Null Space of Feature Covariance for Continual
Learning [34.095874368589904]
We propose a novel network training algorithm called Adam-NSCL, which sequentially optimizes network parameters in the null space of previous tasks.
We apply our approach to training networks for continual learning on the CIFAR-100 and TinyImageNet benchmarks (a rough null-space projection sketch appears after this list).
arXiv Detail & Related papers (2021-03-12T07:21:48Z) - Neural Non-Rigid Tracking [26.41847163649205]
We introduce a novel, end-to-end learnable, differentiable non-rigid tracker.
We employ a convolutional neural network to predict dense correspondences and their confidences.
Compared to state-of-the-art approaches, our algorithm shows improved reconstruction performance.
arXiv Detail & Related papers (2020-06-23T18:00:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
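The OrthSR entry above describes orthogonal fine-tuning only in general terms. Below is a generic sketch of one common way to realize the idea, a Cayley-parameterized rotation applied to frozen pretrained weights; the parameterization and names are assumptions for illustration, not necessarily the paper's scheme.

```python
import torch
import torch.nn as nn

class OrthogonalFinetune(nn.Module):
    """Keep the pretrained weight frozen and learn a skew-symmetric matrix A;
    the Cayley transform Q = (I - A)(I + A)^{-1} is orthogonal, so the tuned
    weight Q @ W is a pure rotation of the pretrained one, limiting drift."""
    def __init__(self, pretrained_weight):
        super().__init__()
        out_dim = pretrained_weight.shape[0]
        self.register_buffer("W", pretrained_weight)           # frozen pretrained weight (out, in)
        self.A = nn.Parameter(torch.zeros(out_dim, out_dim))   # zero init -> identity rotation

    def forward(self, x):
        A = self.A - self.A.T                           # enforce skew-symmetry
        I = torch.eye(A.shape[0], device=A.device)
        Q = (I - A) @ torch.linalg.inv(I + A)           # Cayley transform: orthogonal
        return x @ (Q @ self.W).T                       # apply the rotated weight
```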
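The entry on training in the null space of feature covariance (Adam-NSCL) constrains each update so it barely changes the features computed on earlier tasks. A rough NumPy illustration of that projection step, with placeholder names and threshold rather than the Adam-NSCL code:

```python
import numpy as np

def null_space_projector(features, eps=1e-3):
    """features: (n, d) layer inputs collected on previous tasks.
    Build a projector onto covariance eigen-directions with near-zero
    eigenvalues; moving along them barely changes old-task activations."""
    cov = features.T @ features / features.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)                  # ascending eigenvalues
    null_dirs = eigvecs[:, eigvals < eps * eigvals.max()]   # approximate null space
    return null_dirs @ null_dirs.T                          # (d, d) projector

def constrained_update(weight, grad, projector, lr=0.1):
    """Project the candidate step into the null space before applying it.
    weight, grad: (out, d) for a linear layer acting on d-dimensional inputs."""
    return weight - lr * grad @ projector
```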