Continual Learning with Dynamic Sparse Training: Exploring Algorithms
for Effective Model Updates
- URL: http://arxiv.org/abs/2308.14831v2
- Date: Mon, 4 Dec 2023 14:52:08 GMT
- Title: Continual Learning with Dynamic Sparse Training: Exploring Algorithms
for Effective Model Updates
- Authors: Murat Onur Yildirim, Elif Ceren Gok Yildirim, Ghada Sokar, Decebal
Constantin Mocanu, Joaquin Vanschoren
- Abstract summary: Continual learning (CL) refers to the ability of an intelligent system to sequentially acquire and retain knowledge from a stream of data with as little computational overhead as possible.
Dynamic Sparse Training (DST) is a prominent way to find these sparse networks and isolate them for each task.
This paper is the first empirical study investigating the effect of different DST components under the CL paradigm.
- Score: 13.983410740333788
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning (CL) refers to the ability of an intelligent system to
sequentially acquire and retain knowledge from a stream of data with as little
computational overhead as possible. To this end, regularization, replay, architecture, and parameter isolation approaches have been introduced in the literature. Parameter isolation with a sparse network allocates distinct parts of the neural network to different tasks, while also allowing parameters to be shared between tasks if they are similar. Dynamic Sparse
Training (DST) is a prominent way to find these sparse networks and isolate
them for each task. This paper is the first empirical study investigating the
effect of different DST components under the CL paradigm to fill a critical
research gap and shed light on the optimal configuration of DST for CL if it
exists. Therefore, we perform a comprehensive study in which we investigate
various DST components to find the best topology per task on well-known
CIFAR100 and miniImageNet benchmarks in a task-incremental CL setup since our
primary focus is to evaluate the performance of various DST criteria, rather
than the process of mask selection. We found that, at a low sparsity level, Erdős-Rényi Kernel (ERK) initialization utilizes the backbone more efficiently and enables effective learning of task increments. At a high sparsity level, unless it is extreme, uniform initialization demonstrates more reliable and robust performance. In terms of growth strategy, performance depends on the chosen initialization strategy and the extent of sparsity. Finally, adaptivity within DST components is a promising direction for better
continual learners.
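
The abstract contrasts uniform and Erdős-Rényi Kernel (ERK) sparse initializations but does not spell out how the per-layer densities are derived. As a minimal illustrative sketch (the function name, the shape format, and the clipping shortcut are assumptions, not taken from the paper's code), the two allocation schemes can be computed roughly as follows:

```python
import numpy as np

def layer_densities(shapes, target_density, method="erk"):
    """Per-layer keep-ratios so the whole network matches `target_density`.
    `shapes` lists weight shapes, e.g. (out, in) for linear layers or
    (out, in, kh, kw) for conv layers."""
    n_params = np.array([np.prod(s) for s in shapes], dtype=float)

    if method == "uniform":
        # Uniform: every layer keeps the same fraction of its weights.
        return [target_density] * len(shapes)

    # ERK: a layer's density scales with sum(dims) / prod(dims), so small
    # or thin layers stay denser than large ones.
    raw = np.array([sum(s) / np.prod(s) for s in shapes], dtype=float)
    # Scale factor so the expected number of kept weights hits the target.
    eps = target_density * n_params.sum() / (raw * n_params).sum()
    # Cap at fully dense; a complete implementation would redistribute
    # the surplus budget over the remaining layers.
    return np.clip(eps * raw, 0.0, 1.0).tolist()

# Example: a small conv backbone followed by a classifier head.
shapes = [(64, 3, 3, 3), (128, 64, 3, 3), (100, 128)]
print(layer_densities(shapes, 0.10, method="erk"))
print(layer_densities(shapes, 0.10, method="uniform"))
```

At the same overall sparsity, ERK assigns relatively higher densities to small or thin layers (here the 3-channel stem and the classifier head), while uniform treats every layer alike; the abstract's comparison concerns how these two budgets behave as tasks accumulate.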
Related papers
- Loop Improvement: An Efficient Approach for Extracting Shared Features from Heterogeneous Data without Central Server [16.249442761713322]
"Loop Improvement" (LI) is a novel method enhancing this separation and feature extraction without necessitating a central server or data interchange among participants.
In personalized federated learning environments, LI consistently outperforms the advanced FedALA algorithm in accuracy across diverse scenarios.
LI's adaptability extends to multi-task learning, streamlining the extraction of common features across tasks and obviating the need for simultaneous training.
arXiv Detail & Related papers (2024-03-21T12:59:24Z)
- Finding Foundation Models for Time Series Classification with a PreText Task [7.197233473373693]
This paper introduces pre-trained domain foundation models for Time Series Classification.
A key aspect of our methodology is a novel pretext task that spans multiple datasets.
Our experiments on the UCR archive demonstrate that this pre-training strategy significantly outperforms the conventional training approach without pre-training.
arXiv Detail & Related papers (2023-11-24T15:03:55Z)
- Fantastic Weights and How to Find Them: Where to Prune in Dynamic Sparse Training [58.47622737624532]
We study the influence of pruning criteria on Dynamic Sparse Training (DST) performance.
We find that most of the studied methods yield similar results.
The best performance is predominantly given by the simplest technique: magnitude-based pruning.
arXiv Detail & Related papers (2023-06-21T12:43:55Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order robustness.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
- Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
- Efficient Feature Transformations for Discriminative and Generative Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
These provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
arXiv Detail & Related papers (2021-03-25T01:48:14Z)
- Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
- Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
arXiv Detail & Related papers (2020-06-22T10:05:12Z)
- Continual Learning with Gated Incremental Memories for sequential data processing [14.657656286730736]
The ability to learn in dynamic, nonstationary environments without forgetting previous knowledge, also known as Continual Learning (CL), is a key enabler for scalable and trustworthy deployments of adaptive solutions.
This work proposes a Recurrent Neural Network (RNN) model for CL that is able to deal with concept drift in input distribution without forgetting previously acquired knowledge.
arXiv Detail & Related papers (2020-04-08T16:00:20Z)
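
Several of the entries above, like the main paper, revolve around the same DST loop for continual learning: each task trains inside its own sparse mask (parameter isolation), and the mask is periodically updated by pruning some active weights and growing new ones. The sketch below is a generic, illustrative single update step only, not the algorithm of any specific paper listed here; it uses magnitude-based pruning (the criterion the "Fantastic Weights" entry reports as strongest) and gradient-magnitude growth, and the function name and arguments are assumptions.

```python
import numpy as np

def dst_prune_and_grow(weights, grads, task_mask, frozen_mask, prune_frac=0.3):
    """One prune-and-grow step for a single layer of the current task.

    weights, grads : dense arrays of the layer's parameters and gradients
    task_mask      : boolean mask of weights owned by the current task
    frozen_mask    : boolean mask of weights owned by previous tasks
                     (parameter isolation: never pruned, never regrown)
    """
    k = int(prune_frac * task_mask.sum())
    if k == 0:
        return task_mask

    # Prune: drop the k smallest-magnitude weights of the current task.
    magnitude = np.where(task_mask, np.abs(weights), np.inf)
    new_mask = task_mask.copy().reshape(-1)
    new_mask[np.argsort(magnitude, axis=None)[:k]] = False

    # Grow: re-activate k currently unused weights with the largest
    # gradient magnitude, skipping weights allocated to earlier tasks.
    free = ~(new_mask.reshape(task_mask.shape) | frozen_mask)
    score = np.where(free, np.abs(grads), -np.inf)
    new_mask[np.argsort(score, axis=None)[-k:]] = True

    return new_mask.reshape(task_mask.shape)

# Toy usage with random tensors standing in for a trained layer.
rng = np.random.default_rng(0)
w, g = rng.normal(size=(128, 64)), rng.normal(size=(128, 64))
prev = rng.random((128, 64)) < 0.10              # weights of earlier tasks
cur = (rng.random((128, 64)) < 0.10) & ~prev     # mask for the current task
cur = dst_prune_and_grow(w, g, cur, prev)
```

Varying the pruning criterion, the growth criterion, and the per-layer budget is exactly the kind of component-level comparison the main paper carries out.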