Training Networks in Null Space of Feature Covariance for Continual
Learning
- URL: http://arxiv.org/abs/2103.07113v2
- Date: Tue, 16 Mar 2021 07:43:15 GMT
- Title: Training Networks in Null Space of Feature Covariance for Continual
Learning
- Authors: Shipeng Wang, Xiaorong Li, Jian Sun, Zongben Xu
- Abstract summary: We propose a novel network training algorithm called Adam-NSCL, which sequentially optimizes network parameters in the null space of previous tasks.
We apply our approach to training networks for continual learning on benchmark datasets of CIFAR-100 and TinyImageNet.
- Score: 34.095874368589904
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In the setting of continual learning, a network is trained on a sequence of
tasks and suffers from catastrophic forgetting. To balance the plasticity and
stability of the network in continual learning, in this paper we propose a novel
network training algorithm called Adam-NSCL, which sequentially optimizes
network parameters in the null space of previous tasks. We first propose two
mathematical conditions respectively for achieving network stability and
plasticity in continual learning. Based on them, the network training for
sequential tasks can be simply achieved by projecting the candidate parameter
update into the approximate null space of all previous tasks in the network
training process, where the candidate parameter update can be generated by
Adam. The approximate null space can be derived by applying singular value
decomposition to the uncentered covariance matrix of all input features of
previous tasks for each linear layer. For efficiency, the uncentered covariance
matrix can be incrementally computed after learning each task. We also
empirically verify the rationality of the approximate null space at each linear
layer. We apply our approach to training networks for continual learning on
benchmark datasets of CIFAR-100 and TinyImageNet, and the results suggest that
the proposed approach outperforms or matches the state-of-the-art continual
learning approaches.
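The projection step described in the abstract is concrete enough to sketch. Below is a minimal, hedged illustration in PyTorch of the bookkeeping for a single linear layer: incrementally accumulating the uncentered covariance of that layer's input features, taking its SVD to extract an approximate null space, and projecting a candidate parameter update (e.g., one proposed by Adam) onto it. The class name, the relative threshold `eps`, and the specific thresholding rule are illustrative assumptions, not the authors' released implementation.

```python
import torch

class NullSpaceProjector:
    """Approximate null space of previous tasks' input features for one linear layer."""

    def __init__(self, in_dim, eps=1e-2):
        self.cov = torch.zeros(in_dim, in_dim)  # uncentered covariance of layer inputs
        self.n = 0                              # number of feature vectors accumulated so far
        self.eps = eps                          # relative threshold for "near-zero" singular values
        self.P = torch.eye(in_dim)              # projection matrix onto the approximate null space

    @torch.no_grad()
    def update_covariance(self, feats):
        """Incrementally fold a (batch, in_dim) block of layer inputs into the covariance."""
        m = feats.shape[0]
        self.cov = (self.n * self.cov + feats.t() @ feats) / (self.n + m)
        self.n += m

    @torch.no_grad()
    def refresh_projection(self):
        """SVD of the covariance; keep directions whose singular values are relatively tiny."""
        U, S, _ = torch.linalg.svd(self.cov)
        U0 = U[:, S < self.eps * S.max()]       # basis of the approximate null space
        self.P = U0 @ U0.t()

    @torch.no_grad()
    def project(self, delta_w):
        """Project a candidate update of shape (out_dim, in_dim), e.g. the step Adam proposes."""
        return delta_w @ self.P
```

In use, one such projector would be kept per linear layer: after learning a task, the covariance is updated from that task's layer inputs and the projection refreshed; during training on the next task, each raw Adam update to the layer's weight is replaced by its projection before being applied.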
Related papers
- Learning a Low-Rank Feature Representation: Achieving Better Trade-Off
between Stability and Plasticity in Continual Learning [20.15493383736196]
In continual learning, networks confront a trade-off between stability and plasticity when trained on a sequence of tasks.
We propose a novel training algorithm called LRFR to bolster plasticity without sacrificing stability.
On the CIFAR-100 and TinyImageNet continual learning benchmarks, the proposed approach consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-14T08:34:11Z) - Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z) - Continual Learning with Dependency Preserving Hypernetworks [14.102057320661427]
An effective approach to address continual learning (CL) problems is to use hypernetworks which generate task dependent weights for a target network.
We propose a novel approach that uses a dependency preserving hypernetwork to generate weights for the target network while also maintaining the parameter efficiency.
In addition, we propose novel regularisation and network growth techniques for the RNN based hypernetwork to further improve the continual learning performance.
arXiv Detail & Related papers (2022-09-16T04:42:21Z) - Continual Learning with Guarantees via Weight Interval Constraints [18.791232422083265]
We introduce a new training paradigm that enforces interval constraints on neural network parameter space to control forgetting.
We show how to put bounds on forgetting by reformulating continual learning of a model as a continual contraction of its parameter space.
arXiv Detail & Related papers (2022-06-16T08:28:37Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Incremental Embedding Learning via Zero-Shot Translation [65.94349068508863]
Current state-of-the-art incremental learning methods tackle the catastrophic forgetting problem in traditional classification networks.
We propose a novel class-incremental method for embedding networks, named the zero-shot translation class-incremental method (ZSTCI).
In addition, ZSTCI can easily be combined with existing regularization-based incremental learning methods to further improve performance of embedding networks.
arXiv Detail & Related papers (2020-12-31T08:21:37Z) - Overcoming Catastrophic Forgetting via Direction-Constrained
Optimization [43.53836230865248]
We study a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework.
We present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions.
We demonstrate that our algorithm performs favorably compared to other state-of-the-art regularization-based continual learning methods.
arXiv Detail & Related papers (2020-11-25T08:45:21Z) - Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale data with a deep neural network as the predictive model.
Our method requires far fewer communication rounds while retaining theoretical guarantees.
Experiments on several benchmark datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Side-Tuning: A Baseline for Network Adaptation via Additive Side
Networks [95.51368472949308]
Adaptation can be useful in cases when training data is scarce, or when one wishes to encode priors in the network.
In this paper, we propose a straightforward alternative: side-tuning.
arXiv Detail & Related papers (2019-12-31T18:52:32Z)
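As a rough illustration of the additive side-network idea summarized in the last entry, here is a minimal sketch: a frozen base network blended with a small trainable side network. The alpha-blending form, the module names, and the choice to freeze the base are assumptions made for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class SideTuned(nn.Module):
    """Blend a frozen base network with a trainable additive side network."""

    def __init__(self, base: nn.Module, side: nn.Module, alpha_init=0.0):
        super().__init__()
        self.base = base
        self.side = side
        for p in self.base.parameters():          # base network stays frozen
            p.requires_grad_(False)
        # learnable blending weight between base and side outputs
        self.alpha = nn.Parameter(torch.tensor(alpha_init))

    def forward(self, x):
        # assumes base and side produce outputs of the same shape
        a = torch.sigmoid(self.alpha)
        return a * self.base(x) + (1.0 - a) * self.side(x)
```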
This list is automatically generated from the titles and abstracts of the papers on this site.