Continual Learning in Recurrent Neural Networks
- URL: http://arxiv.org/abs/2006.12109v3
- Date: Wed, 10 Mar 2021 07:47:27 GMT
- Title: Continual Learning in Recurrent Neural Networks
- Authors: Benjamin Ehret, Christian Henning, Maria R. Cervera, Alexander
Meulemans, Johannes von Oswald, Benjamin F. Grewe
- Abstract summary: We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
- Score: 67.05499844830231
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While a diverse collection of continual learning (CL) methods has been
proposed to prevent catastrophic forgetting, a thorough investigation of their
effectiveness for processing sequential data with recurrent neural networks
(RNNs) is lacking. Here, we provide the first comprehensive evaluation of
established CL methods on a variety of sequential data benchmarks.
Specifically, we shed light on the particularities that arise when applying
weight-importance methods, such as elastic weight consolidation, to RNNs. In
contrast to feedforward networks, RNNs iteratively reuse a shared set of
weights and require working memory to process input samples. We show that the
performance of weight-importance methods is not directly affected by the length
of the processed sequences, but rather by high working memory requirements,
which lead to an increased need for stability at the cost of decreased
plasticity for learning subsequent tasks. We additionally provide theoretical
arguments supporting this interpretation by studying linear RNNs. Our study
shows that established CL methods can be successfully ported to the recurrent
case, and that a recent regularization approach based on hypernetworks
outperforms weight-importance methods, thus emerging as a promising candidate
for CL in RNNs. Overall, we provide insights on the differences between CL in
feedforward networks and RNNs, while guiding towards effective solutions to
tackle CL on sequential data.
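To make the weight-importance setting discussed above concrete, here is a minimal PyTorch sketch of an EWC-style quadratic penalty applied to an RNN. The diagonal squared-gradient importance estimate, the `ewc_lambda` value, and the data loader are illustrative assumptions, not the exact protocol evaluated in the paper.

```python
# Sketch of a weight-importance (EWC-style) penalty for an RNN; illustrative only.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=32, batch_first=True)
head = nn.Linear(32, 5)
params = list(rnn.parameters()) + list(head.parameters())

def estimate_importance(loader, loss_fn):
    """Diagonal importance ~ mean squared gradient of the task loss."""
    importance = [torch.zeros_like(p) for p in params]
    for x, y in loader:
        out, _ = rnn(x)                      # the same weights are reused at every time step
        loss = loss_fn(head(out[:, -1]), y)
        grads = torch.autograd.grad(loss, params)
        for imp, g in zip(importance, grads):
            imp += g.detach() ** 2
    return [imp / len(loader) for imp in importance]

def ewc_penalty(importance, anchors, ewc_lambda=100.0):
    """Quadratic penalty keeping important weights close to their post-task values."""
    return ewc_lambda * sum(
        (imp * (p - a) ** 2).sum()
        for imp, p, a in zip(importance, params, anchors)
    )

# After finishing task t (task_t_loader is a placeholder data loader):
#   importance = estimate_importance(task_t_loader, nn.CrossEntropyLoss())
#   anchors = [p.detach().clone() for p in params]
# While training task t+1, add ewc_penalty(importance, anchors) to the task loss.
```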
Related papers
- Forget but Recall: Incremental Latent Rectification in Continual Learning [21.600690867361617]
The intrinsic capability to continuously learn from a changing data stream is a desideratum of deep neural networks (DNNs).
Existing Continual Learning approaches either retain exemplars for replay, regularize learning, or allocate dedicated capacity for new tasks.
This paper investigates an unexplored CL direction for incremental learning called Incremental Latent Rectification or ILR.
arXiv Detail & Related papers (2024-06-25T08:57:47Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
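For readers unfamiliar with snnTorch, here is a minimal usage sketch of a leaky integrate-and-fire layer unrolled over time; the IPU-optimized release described in the paper is not shown, and the layer sizes, decay constant, and random inputs are placeholders.

```python
# Minimal snnTorch sketch (plain PyTorch backend); the IPU-specific release is not shown.
import torch
import torch.nn as nn
import snntorch as snn

fc = nn.Linear(784, 10)
lif = snn.Leaky(beta=0.9)        # leaky integrate-and-fire neuron with membrane decay beta

x = torch.rand(25, 64, 784)      # (time steps, batch, features) used as input currents
mem = lif.init_leaky()           # initialize the membrane potential state
spikes = []
for t in range(x.size(0)):       # unroll over time, reusing the same weights each step
    spk, mem = lif(fc(x[t]), mem)   # returns output spikes and updated membrane state
    spikes.append(spk)
out = torch.stack(spikes)        # (time steps, batch, 10) spike trains
```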
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Continual Learning with Dependency Preserving Hypernetworks [14.102057320661427]
An effective approach to addressing continual learning (CL) problems is to use hypernetworks, which generate task-dependent weights for a target network.
We propose a novel approach that uses a dependency-preserving hypernetwork to generate weights for the target network while also maintaining parameter efficiency.
In addition, we propose novel regularisation and network growth techniques for the RNN-based hypernetwork to further improve continual learning performance.
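As intuition for this family of methods, below is a minimal sketch of a task-conditioned hypernetwork that emits the weights of a small Elman-style RNN step. The dependency-preserving structure, chunking, and regularisation from the paper are omitted; all sizes and names (task_emb, hypernet, target_rnn_step) are illustrative.

```python
# Sketch: a hypernetwork generates the target RNN's weights from a task embedding.
import torch
import torch.nn as nn

in_dim, hid_dim, emb_dim = 8, 16, 32
n_weights = hid_dim * (in_dim + hid_dim) + hid_dim   # W_ih, W_hh, bias

task_emb = nn.Embedding(num_embeddings=5, embedding_dim=emb_dim)
hypernet = nn.Sequential(nn.Linear(emb_dim, 128), nn.ReLU(), nn.Linear(128, n_weights))

def target_rnn_step(x, h, task_id):
    """One Elman-RNN step whose weights are produced by the hypernetwork."""
    w = hypernet(task_emb(task_id))
    w_ih, w_hh, b = torch.split(w, [hid_dim * in_dim, hid_dim * hid_dim, hid_dim])
    w_ih = w_ih.view(hid_dim, in_dim)
    w_hh = w_hh.view(hid_dim, hid_dim)
    return torch.tanh(x @ w_ih.T + h @ w_hh.T + b)

x = torch.randn(4, in_dim)           # batch of inputs
h = torch.zeros(4, hid_dim)          # initial hidden state
h = target_rnn_step(x, h, torch.tensor(0))   # weights for task 0
```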
arXiv Detail & Related papers (2022-09-16T04:42:21Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
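The basic primitive behind such interval analyses can be sketched as interval bound propagation through an affine layer followed by a monotone activation. This is a simplified illustration only; the implicit (equilibrium-layer) handling from the paper is not shown, and the weights and perturbation radius are arbitrary.

```python
# Sketch: propagate an input interval through one affine + ReLU layer.
import torch

def affine_interval(W, b, lower, upper):
    """Propagate the elementwise interval [lower, upper] through x -> W x + b."""
    center = (upper + lower) / 2
    radius = (upper - lower) / 2
    new_center = W @ center + b
    new_radius = W.abs() @ radius          # worst-case growth of the radius
    return new_center - new_radius, new_center + new_radius

W = torch.randn(3, 4)
b = torch.randn(3)
x = torch.randn(4)
lo, hi = affine_interval(W, b, x - 0.1, x + 0.1)   # L_inf ball of radius 0.1 around x
lo, hi = lo.clamp(min=0), hi.clamp(min=0)          # ReLU is monotone: apply to both bounds
```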
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Continual Learning for Recurrent Neural Networks: a Review and Empirical Evaluation [12.27992745065497]
Continual Learning with recurrent neural networks could pave the way to a large number of applications where incoming data is non-stationary.
We organize the literature on CL for sequential data processing by providing a categorization of the contributions and a review of the benchmarks.
We propose two new benchmarks for CL with sequential data based on existing datasets, whose characteristics resemble real-world applications.
arXiv Detail & Related papers (2021-03-12T19:25:28Z)
- A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
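The paper uses a tensor-train format; as a simpler stand-in that conveys the idea of jointly encoding and compressing a cell's weight matrices, here is a sketch that factorizes the stacked [W_ih | W_hh] matrix through a single low-rank product. Sizes and rank are illustrative.

```python
# Sketch: jointly parameterize an RNN cell's input and recurrent weights via a
# low-rank factorization (a simplified stand-in for the paper's tensor-train format).
import torch
import torch.nn as nn

in_dim, hid_dim, rank = 64, 256, 8

class LowRankRNNCell(nn.Module):
    def __init__(self):
        super().__init__()
        # W = [W_ih | W_hh] of shape (hid_dim, in_dim + hid_dim), stored as two factors.
        self.U = nn.Parameter(torch.randn(hid_dim, rank) / rank ** 0.5)
        self.V = nn.Parameter(torch.randn(rank, in_dim + hid_dim) / (in_dim + hid_dim) ** 0.5)
        self.bias = nn.Parameter(torch.zeros(hid_dim))

    def forward(self, x, h):
        z = torch.cat([x, h], dim=-1)               # (batch, in_dim + hid_dim)
        return torch.tanh(z @ self.V.T @ self.U.T + self.bias)

cell = LowRankRNNCell()
dense = hid_dim * (in_dim + hid_dim)                # dense parameter count (no bias)
factored = rank * (hid_dim + in_dim + hid_dim)      # factorized parameter count
print(f"dense: {dense} params, factorized: {factored} params")
```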
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
- SRDCNN: Strongly Regularized Deep Convolution Neural Network Architecture for Time-series Sensor Signal Classification Tasks [4.950427992960756]
We present SRDCNN: a Strongly Regularized Deep Convolution Neural Network (DCNN) based architecture for time-series classification tasks.
The novelty of the proposed approach is that the network weights are regularized by both L1 and L2 norm penalties.
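A minimal sketch of combining L1 and L2 weight penalties on a small 1-D convolutional time-series classifier is shown below; the architecture and penalty strengths are placeholders, not the SRDCNN configuration from the paper.

```python
# Sketch: joint L1 + L2 regularization of all weights in a small 1-D CNN.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(in_channels=3, out_channels=16, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(16, 4),
)
loss_fn = nn.CrossEntropyLoss()
l1_lambda, l2_lambda = 1e-5, 1e-4                   # placeholder penalty strengths

x = torch.randn(8, 3, 128)                          # (batch, channels, time steps)
y = torch.randint(0, 4, (8,))

task_loss = loss_fn(model(x), y)
l1 = sum(p.abs().sum() for p in model.parameters())
l2 = sum((p ** 2).sum() for p in model.parameters())
loss = task_loss + l1_lambda * l1 + l2_lambda * l2  # both norm penalties applied jointly
loss.backward()
```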
arXiv Detail & Related papers (2020-07-14T08:42:39Z)
- Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions [121.10450359856242]
Recurrent neural networks (RNNs) are instrumental in modelling sequential and time-series data.
Existing approaches for uncertainty quantification in RNNs are based predominantly on Bayesian methods.
We develop a frequentist alternative that: (a) does not interfere with model training or compromise its accuracy, (b) applies to any RNN architecture, and (c) provides theoretical coverage guarantees on the estimated uncertainty intervals.
arXiv Detail & Related papers (2020-06-20T22:45:32Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
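To illustrate what tracking the Hessian norm can look like in practice, here is a small sketch that estimates the top Hessian eigenvalue of the training loss by power iteration on Hessian-vector products; the paper's exact estimator and its use for choosing initializations are not reproduced, and the model and data are placeholders.

```python
# Sketch: estimate the Hessian spectral norm via power iteration on HVPs.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 10))
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(64, 20), torch.randint(0, 10, (64,))

params = [p for p in model.parameters() if p.requires_grad]
loss = loss_fn(model(x), y)
grads = torch.autograd.grad(loss, params, create_graph=True)   # differentiable gradients

v = [torch.randn_like(p) for p in params]                      # random start vector
for _ in range(20):                                            # power iteration
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
    v = [h / norm for h in hv]                                 # renormalize H v
print(f"estimated top Hessian eigenvalue (spectral norm): {norm.item():.4f}")
```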
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- Continual Learning with Gated Incremental Memories for sequential data processing [14.657656286730736]
The ability to learn in dynamic, nonstationary environments without forgetting previous knowledge, also known as Continual Learning (CL), is a key enabler for scalable and trustworthy deployments of adaptive solutions.
This work proposes a Recurrent Neural Network (RNN) model for CL that is able to deal with concept drift in the input distribution without forgetting previously acquired knowledge.
arXiv Detail & Related papers (2020-04-08T16:00:20Z)