Continual Learning with Gated Incremental Memories for sequential data
processing
- URL: http://arxiv.org/abs/2004.04077v1
- Date: Wed, 8 Apr 2020 16:00:20 GMT
- Title: Continual Learning with Gated Incremental Memories for sequential data
processing
- Authors: Andrea Cossu, Antonio Carta, Davide Bacciu
- Abstract summary: The ability to learn in dynamic, nonstationary environments without forgetting previous knowledge, also known as Continual Learning (CL), is a key enabler for scalable and trustworthy deployments of adaptive solutions.
This work proposes a Recurrent Neural Network (RNN) model for CL that is able to deal with concept drift in input distribution without forgetting previously acquired knowledge.
- Score: 14.657656286730736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to learn in dynamic, nonstationary environments without
forgetting previous knowledge, also known as Continual Learning (CL), is a key
enabler for scalable and trustworthy deployments of adaptive solutions. While
the importance of continual learning is largely acknowledged in machine vision
and reinforcement learning problems, this is mostly under-documented for
sequence processing tasks. This work proposes a Recurrent Neural Network (RNN)
model for CL that is able to deal with concept drift in input distribution
without forgetting previously acquired knowledge. We also implement and test a
popular CL approach, Elastic Weight Consolidation (EWC), on top of two
different types of RNNs. Finally, we compare the performance of our enhanced
architecture against EWC and RNNs on a set of standard CL benchmarks, adapted
to the sequential data processing scenario. Results show the superior
performance of our architecture and highlight the need for special solutions
designed to address CL in RNNs.
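Since the abstract mentions implementing Elastic Weight Consolidation (EWC) on top of RNNs, the following is a minimal PyTorch sketch of the standard EWC quadratic penalty added to a recurrent classifier's task loss. It is offered only as a point of reference: the model sizes, the diagonal Fisher estimate, and the `lambda_ewc` weight are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): EWC-style penalty on a recurrent model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNClassifier(nn.Module):
    def __init__(self, input_size=28, hidden_size=64, num_classes=10):
        super().__init__()
        self.rnn = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):                     # x: (batch, time, features)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])          # classify from the last time step

def fisher_diagonal(model, loader, device="cpu"):
    """Diagonal Fisher estimate from squared gradients of the task loss."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in loader:
        model.zero_grad()
        loss = F.cross_entropy(model(x.to(device)), y.to(device))
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(loader), 1) for n, f in fisher.items()}

def ewc_penalty(model, old_params, fisher, lambda_ewc=100.0):
    """lambda/2 * sum_i F_i * (theta_i - theta*_i)^2 over all parameters."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lambda_ewc * penalty

# Training on a new task: task loss plus the consolidation penalty, e.g.
#   old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
#   loss = F.cross_entropy(model(x), y) + ewc_penalty(model, old_params, fisher)
```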
Related papers
- Slowing Down Forgetting in Continual Learning [20.57872238271025]
A common challenge in continual learning (CL) is forgetting, where the performance on old tasks drops after new, additional tasks are learned.
We propose a novel framework called ReCL to slow down forgetting in CL.
arXiv Detail & Related papers (2024-11-11T12:19:28Z)
- Benchmarking Sensitivity of Continual Graph Learning for Skeleton-Based Action Recognition [6.14431765787048]
Continual learning (CL) aims to build machine learning models that can accumulate knowledge continuously over different tasks without retraining from scratch.
Previous studies have shown that pre-training graph neural networks (GNNs) may lead to negative transfer after fine-tuning.
We propose the first benchmark for the continual graph learning setting.
arXiv Detail & Related papers (2024-01-31T18:20:42Z)
- Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z)
- Learning Deep Representations via Contrastive Learning for Instance Retrieval [11.736450745549792]
This paper makes the first attempt to tackle the instance retrieval problem using instance-discrimination-based contrastive learning (CL).
In this work, we approach this problem by exploring the capability of deriving discriminative representations from pre-trained and fine-tuned CL models.
arXiv Detail & Related papers (2022-09-28T04:36:34Z)
- Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means (a minimal sketch of this procedure appears after this list).
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
- Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Continual Learning for Recurrent Neural Networks: a Review and Empirical Evaluation [12.27992745065497]
Continual Learning with recurrent neural networks could pave the way to a large number of applications where incoming data is non-stationary.
We organize the literature on CL for sequential data processing by providing a categorization of the contributions and a review of the benchmarks.
We propose two new benchmarks for CL with sequential data based on existing datasets, whose characteristics resemble real-world applications.
arXiv Detail & Related papers (2021-03-12T19:25:28Z)
- Neural Networks Enhancement with Logical Knowledge [83.9217787335878]
We propose an extension of KENN for relational data.
The results show that KENN is capable of increasing the performance of the underlying neural network even in the presence of relational data.
arXiv Detail & Related papers (2020-09-13T21:12:20Z)
- Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
arXiv Detail & Related papers (2020-06-22T10:05:12Z)
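The Cluster Learnability (CL) measure summarized in the list above follows a simple recipe: cluster the representations with K-means, then score a KNN trained to predict the resulting pseudo-labels on held-out points. Below is a minimal scikit-learn sketch under assumed defaults; the cluster count, neighbour count, and split ratio are illustrative choices, not the cited paper's exact protocol.

```python
# Minimal sketch of a cluster-learnability style measurement (assumed defaults).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def cluster_learnability(representations, n_clusters=10, n_neighbors=5, seed=0):
    """K-means pseudo-labels the representations; a KNN fit on one split is
    scored on the other split, so higher accuracy = more learnable features."""
    pseudo_labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(representations)
    X_tr, X_te, y_tr, y_te = train_test_split(
        representations, pseudo_labels, test_size=0.5, random_state=seed)
    knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X_tr, y_tr)
    return knn.score(X_te, y_te)

# Example: score a random-feature baseline.
# print(cluster_learnability(np.random.randn(1000, 128)))
```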