EsaCL: Efficient Continual Learning of Sparse Models
- URL: http://arxiv.org/abs/2401.05667v1
- Date: Thu, 11 Jan 2024 04:59:44 GMT
- Title: EsaCL: Efficient Continual Learning of Sparse Models
- Authors: Weijieying Ren, Vasant G Honavar
- Abstract summary: A key challenge in the continual learning setting is to efficiently learn a sequence of tasks without forgetting how to perform previously learned tasks.
We propose a new method for efficient continual learning of sparse models (EsaCL) that can automatically prune redundant parameters without adversely impacting the model's predictive power.
- Score: 10.227171407348326
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A key challenge in the continual learning setting is to efficiently learn a
sequence of tasks without forgetting how to perform previously learned tasks.
Many existing approaches to this problem work by either retraining the model on
previous tasks or by expanding the model to accommodate new tasks. However,
these approaches typically suffer from increased storage and computational
requirements, a problem that is worsened in the case of sparse models due to
the need for expensive retraining after sparsification. To address this challenge,
we propose a new method for efficient continual learning of sparse models
(EsaCL) that can automatically prune redundant parameters without adversely
impacting the model's predictive power, and circumvent the need for retraining.
We conduct a theoretical analysis of loss landscapes with parameter pruning,
and design a directional pruning (SDP) strategy that is informed by the
sharpness of the loss function with respect to the model parameters. SDP
ensures a model with minimal loss of predictive accuracy, accelerating the
learning of sparse models at each stage. To accelerate model updates, we
introduce an intelligent data selection (IDS) strategy that can identify
critical instances for estimating the loss landscape, yielding substantially
improved data efficiency. The results of our experiments show that EsaCL
achieves performance that is competitive with the state-of-the-art methods on
three continual learning benchmarks, while using substantially reduced memory
and computational resources.
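To make the link between loss sharpness and pruning concrete, the sketch below uses the classic second-order saliency score (in the style of Optimal Brain Damage), which is not the SDP strategy from the paper. It assumes a toy model near a minimum with a diagonal Hessian; the function name `prune_by_saliency` and the example weights and curvatures are hypothetical, chosen only for illustration.

```python
def prune_by_saliency(weights, curvatures, sparsity):
    """Zero out the fraction `sparsity` of weights with the lowest saliency.

    Saliency is the second-order estimate 0.5 * h * w**2 of the loss increase
    caused by setting a weight to zero, assuming the model sits near a minimum
    and the Hessian is diagonal with entries h (an illustrative assumption).
    """
    saliency = [0.5 * h * w * w for w, h in zip(weights, curvatures)]
    n_prune = int(len(weights) * sparsity)
    # Indices sorted from cheapest to most expensive to remove.
    order = sorted(range(len(weights)), key=lambda i: saliency[i])
    pruned = set(order[:n_prune])
    new_weights = [0.0 if i in pruned else w for i, w in enumerate(weights)]
    est_loss_increase = sum(saliency[i] for i in pruned)
    return new_weights, est_loss_increase


# Hypothetical toy values: large weights in flat directions,
# small weights in sharp directions.
weights = [2.0, 0.5, -1.0, 0.1]
curvatures = [0.01, 4.0, 0.02, 10.0]
pruned_weights, delta = prune_by_saliency(weights, curvatures, sparsity=0.5)
print(pruned_weights, delta)
```

In this toy setting the two largest-magnitude weights are the ones removed, because they lie in flat directions of the loss. Plain magnitude pruning would instead remove the two smallest weights, which sit in the sharpest directions.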
Related papers
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes [72.75421975804132]
Learning Active Learning (LAL) proposes learning the active learning strategy itself, allowing it to adapt to the given setting.
We propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem.
Our approach is based on learning from a myopic oracle, which gives our model the ability to adapt to non-standard objectives.
arXiv Detail & Related papers (2023-09-11T14:16:37Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
- Robustness-preserving Lifelong Learning via Dataset Condensation [11.83450966328136]
'Catastrophic forgetting' refers to a notorious dilemma between improving model accuracy over new data and retaining accuracy over previous data.
We propose a new memory-replay LL strategy that leverages modern bi-level optimization techniques to determine the 'coreset' of the current data.
We term the resulting LL framework 'Data-Efficient Robustness-Preserving LL' (DERPLL).
Experimental results show that DERPLL outperforms the conventional coreset-guided LL baseline.
arXiv Detail & Related papers (2023-03-07T19:09:03Z)
- Learning a model is paramount for sample efficiency in reinforcement learning control of PDEs [5.488334211013093]
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system.
We also show that iteratively updating the model is of major importance to avoid biases in the RL training.
arXiv Detail & Related papers (2023-02-14T16:14:39Z)
- Transfer Learning in Deep Learning Models for Building Load Forecasting: Case of Limited Data [0.0]
This paper proposes a Building-to-Building Transfer Learning framework to overcome the problem and enhance the performance of Deep Learning models.
The proposed approach improved the forecasting accuracy by 56.8% compared to the case of conventional deep learning where training from scratch is used.
arXiv Detail & Related papers (2023-01-25T16:05:47Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)