Overcoming Catastrophic Forgetting via Direction-Constrained
Optimization
- URL: http://arxiv.org/abs/2011.12581v3
- Date: Fri, 1 Jul 2022 20:11:27 GMT
- Title: Overcoming Catastrophic Forgetting via Direction-Constrained
Optimization
- Authors: Yunfei Teng, Anna Choromanska, Murray Campbell, Songtao Lu, Parikshit
Ram, Lior Horesh
- Abstract summary: We study a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework.
We present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions.
We demonstrate that our algorithm performs favorably compared to other state-of-the-art regularization-based continual learning methods.
- Score: 43.53836230865248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies a new design of the optimization algorithm for training
deep learning models with a fixed architecture of the classification network in
a continual learning framework. The training data is non-stationary and the
non-stationarity is imposed by a sequence of distinct tasks. We first analyze a
deep model trained on only one learning task in isolation and identify a region
in network parameter space, where the model performance is close to the
recovered optimum. We provide empirical evidence that this region resembles a
cone that expands along the convergence direction. We study the principal
directions of the trajectory of the optimizer after convergence and show that
traveling along a few top principal directions can quickly bring the parameters
outside the cone but this is not the case for the remaining directions. We
argue that catastrophic forgetting in a continual learning setting can be
alleviated when the parameters are constrained to stay within the intersection
of the plausible cones of individual tasks that were so far encountered during
training. Based on this observation we present our direction-constrained
optimization (DCO) method, where for each task we introduce a linear
autoencoder to approximate its corresponding top forbidden principal
directions. They are then incorporated into the loss function in the form of a
regularization term for the purpose of learning the coming tasks without
forgetting. Furthermore, in order to control the memory growth as the number of
tasks increases, we propose a memory-efficient version of our algorithm called
compressed DCO (DCO-COMP) that allocates a memory of fixed size for storing all
autoencoders. We empirically demonstrate that our algorithm performs favorably
compared to other state-of-the-art regularization-based continual learning methods.
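The abstract describes the mechanism only at a high level; the following is a minimal sketch of how such a regularizer could look in PyTorch, assuming the network parameters are flattened into a single vector. The class names, the way trajectory displacements are collected, and the penalty weight are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LinearAutoencoder(nn.Module):
    """Linear autoencoder intended to capture a task's top 'forbidden'
    principal directions in parameter space (the code size k is illustrative)."""
    def __init__(self, dim, k):
        super().__init__()
        self.encoder = nn.Linear(dim, k, bias=False)
        self.decoder = nn.Linear(k, dim, bias=False)

    def forward(self, v):
        return self.decoder(self.encoder(v))

def fit_autoencoder(ae, deltas, epochs=200, lr=1e-3):
    """Fit on displacement vectors collected along the optimizer trajectory
    after converging on a task; reconstruction concentrates on the top directions."""
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    for _ in range(epochs):
        loss = ((ae(deltas) - deltas) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

def dco_regularizer(theta, anchors, autoencoders, lam=1.0):
    """Penalize movement along each earlier task's forbidden directions:
    sum_t || AE_t(theta - theta_t*) ||^2, added to the current task's loss."""
    reg = torch.zeros((), device=theta.device)
    for theta_star, ae in zip(anchors, autoencoders):
        reg = reg + (ae(theta - theta_star) ** 2).sum()
    return lam * reg
```

Training on a new task would then minimize task_loss + dco_regularizer(theta, anchors, autoencoders), so that the parameters stay inside the intersection of the earlier tasks' cones.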
Related papers
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability of the zero-shot generalization of VLMs; the overall method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario (a generic orthogonal fine-tuning sketch appears after this list).
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Meta-Learning with Versatile Loss Geometries for Fast Adaptation Using
Mirror Descent [44.56938629818211]
A fundamental challenge in meta-learning is how to quickly "adapt" the extracted prior in order to train a task-specific model.
Existing approaches deal with this challenge using a preconditioner that enhances convergence of the per-task training process.
The present contribution addresses this limitation by learning a nonlinear mirror map, which induces a versatile distance metric.
arXiv Detail & Related papers (2023-12-20T23:45:06Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive
Least-Squares [8.443742714362521]
We develop an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints.
Our algorithm uses memory efficiently by exploiting the structure of the streaming data via incremental principal component analysis (IPCA).
Our experiments show the effectiveness of the proposed method compared to the baselines.
arXiv Detail & Related papers (2022-07-28T02:01:31Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Dense Unsupervised Learning for Video Segmentation [49.46930315961636]
We present a novel approach to unsupervised learning for video object segmentation (VOS).
Unlike previous work, our formulation allows learning dense feature representations directly in a fully convolutional regime.
Our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.
arXiv Detail & Related papers (2021-11-11T15:15:11Z) - TSO: Curriculum Generation using continuous optimization [0.0]
We present a simple and efficient technique based on continuous optimization.
An encoder network maps/embeds the training sequence into a continuous space.
A predictor network takes the continuous representation of a strategy as input and predicts the accuracy for a fixed network architecture.
arXiv Detail & Related papers (2021-06-16T06:32:21Z) - An Adaptive Framework for Learning Unsupervised Depth Completion [59.17364202590475]
We present a method to infer a dense depth map from a color image and associated sparse depth measurements.
We show that regularization and co-visibility are related via the fitness of the model to data and can be unified into a single framework.
arXiv Detail & Related papers (2021-06-06T02:27:55Z) - Training Networks in Null Space of Feature Covariance for Continual
Learning [34.095874368589904]
We propose a novel network training algorithm called Adam-NSCL, which sequentially optimizes network parameters in the null space of previous tasks.
We apply our approach to training networks for continual learning on the CIFAR-100 and TinyImageNet benchmarks (a rough null-space projection sketch appears after this list).
arXiv Detail & Related papers (2021-03-12T07:21:48Z) - Neural Non-Rigid Tracking [26.41847163649205]
We introduce a novel, end-to-end learnable, differentiable non-rigid tracker.
We employ a convolutional neural network to predict dense correspondences and their confidences.
Compared to state-of-the-art approaches, our algorithm shows improved reconstruction performance.
arXiv Detail & Related papers (2020-06-23T18:00:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
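The OrthSR entry above describes orthogonal fine-tuning only in general terms. Below is a generic sketch of one common way to realize the idea, a Cayley-parameterized rotation applied to frozen pretrained weights; the parameterization and names are assumptions for illustration, not necessarily the paper's scheme.

```python
import torch
import torch.nn as nn

class OrthogonalFinetune(nn.Module):
    """Keep the pretrained weight frozen and learn a skew-symmetric matrix A;
    the Cayley transform Q = (I - A)(I + A)^{-1} is orthogonal, so the tuned
    weight Q @ W is a pure rotation of the pretrained one, limiting drift."""
    def __init__(self, pretrained_weight):
        super().__init__()
        out_dim = pretrained_weight.shape[0]
        self.register_buffer("W", pretrained_weight)           # frozen pretrained weight (out, in)
        self.A = nn.Parameter(torch.zeros(out_dim, out_dim))   # zero init -> identity rotation

    def forward(self, x):
        A = self.A - self.A.T                           # enforce skew-symmetry
        I = torch.eye(A.shape[0], device=A.device)
        Q = (I - A) @ torch.linalg.inv(I + A)           # Cayley transform: orthogonal
        return x @ (Q @ self.W).T                       # apply the rotated weight
```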
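The entry on training in the null space of feature covariance (Adam-NSCL) constrains each update so it barely changes the features computed on earlier tasks. A rough NumPy illustration of that projection step, with placeholder names and threshold rather than the Adam-NSCL code:

```python
import numpy as np

def null_space_projector(features, eps=1e-3):
    """features: (n, d) layer inputs collected on previous tasks.
    Build a projector onto covariance eigen-directions with near-zero
    eigenvalues; moving along them barely changes old-task activations."""
    cov = features.T @ features / features.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)                  # ascending eigenvalues
    null_dirs = eigvecs[:, eigvals < eps * eigvals.max()]   # approximate null space
    return null_dirs @ null_dirs.T                          # (d, d) projector

def constrained_update(weight, grad, projector, lr=0.1):
    """Project the candidate step into the null space before applying it.
    weight, grad: (out, d) for a linear layer acting on d-dimensional inputs."""
    return weight - lr * grad @ projector
```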