SparCL: Sparse Continual Learning on the Edge
- URL: http://arxiv.org/abs/2209.09476v1
- Date: Tue, 20 Sep 2022 05:24:48 GMT
- Title: SparCL: Sparse Continual Learning on the Edge
- Authors: Zifeng Wang, Zheng Zhan, Yifan Gong, Geng Yuan, Wei Niu, Tong Jian,
Bin Ren, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy
- Abstract summary: We propose a novel framework called Sparse Continual Learning (SparCL) to enable cost-effective continual learning on edge devices.
SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity.
- Score: 43.51885725281063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing work in continual learning (CL) focuses on mitigating catastrophic
forgetting, i.e., model performance deterioration on past tasks when learning a
new task. However, the training efficiency of a CL system is
under-investigated, which limits the real-world application of CL systems in
resource-limited scenarios. In this work, we propose a novel framework called
Sparse Continual Learning (SparCL), which is the first study that leverages
sparsity to enable cost-effective continual learning on edge devices. SparCL
achieves both training acceleration and accuracy preservation through the
synergy of three aspects: weight sparsity, data efficiency, and gradient
sparsity. Specifically, we propose task-aware dynamic masking (TDM) to learn a
sparse network throughout the entire CL process, dynamic data removal (DDR) to
remove less informative training data, and dynamic gradient masking (DGM) to
sparsify the gradient updates. Each of them not only improves efficiency, but
also further mitigates catastrophic forgetting. SparCL consistently improves
the training efficiency of existing state-of-the-art (SOTA) CL methods,
reducing training FLOPs by up to 23x, and, surprisingly, further improves SOTA
accuracy by up to 1.7%. SparCL also outperforms competitive baselines
obtained from adapting SOTA sparse training methods to the CL setting in both
efficiency and accuracy. We also evaluate the effectiveness of SparCL on a real
mobile phone, further indicating the practical potential of our method.
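For intuition, here is a minimal, self-contained sketch of how the three ingredients named in the abstract could fit into a single training step. It is assembled from the abstract alone and is not the authors' implementation; all function and parameter names (top_k_mask, keep_ratio, grad_sparsity, etc.) are illustrative assumptions, and SparCL's actual task-aware masking, data-removal, and gradient-masking criteria may differ.

```python
# Conceptual sketch only: magnitude-based stand-ins for the TDM / DDR / DGM
# ideas described in the abstract, not the SparCL reference code.
import torch
import torch.nn as nn

def top_k_mask(t: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction of entries."""
    k = max(1, int(t.numel() * (1.0 - sparsity)))
    threshold = t.abs().flatten().topk(k).values.min()
    return (t.abs() >= threshold).float()

def sparse_train_step(model, batch, optimizer,
                      weight_sparsity=0.9, grad_sparsity=0.5, keep_ratio=0.75):
    x, y = batch
    per_example_loss = nn.functional.cross_entropy(model(x), y, reduction="none")

    # DDR-like step: train only on the most informative (here: highest-loss) examples.
    n_keep = max(1, int(keep_ratio * x.size(0)))
    loss = per_example_loss.topk(n_keep).values.mean()

    optimizer.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() < 2:          # leave biases / norm layers dense
                continue
            if p.grad is not None:
                # DGM-like step: sparsify the gradient update.
                p.grad.mul_(top_k_mask(p.grad, grad_sparsity))
    optimizer.step()

    with torch.no_grad():
        for p in model.parameters():
            if p.dim() >= 2:
                # TDM-like step: re-apply a weight mask so the network stays
                # sparse throughout the continual-learning process.
                p.mul_(top_k_mask(p, weight_sparsity))
    return loss.item()
```

In the paper, the masks evolve across tasks (task-aware) and data removal follows the method's own informativeness measure; the simple magnitude- and loss-based criteria above are placeholders for illustration.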
Related papers
- Efficient Continual Learning with Low Memory Footprint For Edge Device [6.818488262543482]
This paper proposes a compact algorithm called LightCL to overcome the forgetting problem of Continual Learning.
We first propose two new metrics of learning plasticity and memory stability to seek generalizability during CL.
In the experimental comparison, LightCL outperforms other SOTA methods in delaying forgetting and reduces memory footprint by up to 6.16x.
arXiv Detail & Related papers (2024-07-15T08:52:20Z) - Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation [123.4883806344334]
We study a realistic Continual Learning setting where learning algorithms are granted a restricted computational budget per time step while training.
We apply this setting to large-scale semi-supervised Continual Learning scenarios with sparse label rates.
Our extensive analysis and ablations demonstrate that DietCL is stable under a full spectrum of label sparsity, computational budget, and various other ablations.
arXiv Detail & Related papers (2024-04-19T10:10:39Z) - LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded
Computing Platforms [17.031135153343502]
Continual Learning (CL) allows applications such as user personalization and household robots to learn on the fly and adapt to context.
LifeLearner is a hardware-aware meta continual learning system that drastically optimizes system resources.
LifeLearner achieves near-optimal CL performance, falling short of an Oracle baseline by only 2.8% in accuracy.
arXiv Detail & Related papers (2023-11-19T20:39:35Z) - A Comprehensive Study of Privacy Risks in Curriculum Learning [25.57099711643689]
Training a machine learning model on data presented in a meaningful order has proven effective in accelerating the training process.
The key enabling technique is curriculum learning (CL), which has seen great success and has been deployed in areas like image and text classification.
Yet, how CL affects the privacy of machine learning is unclear.
arXiv Detail & Related papers (2023-10-16T07:06:38Z) - Does Continual Learning Equally Forget All Parameters? [55.431048995662714]
Distribution shift (e.g., task or domain shift) in continual learning (CL) usually results in catastrophic forgetting of neural networks.
We study which modules in neural networks are more prone to forgetting by investigating their training dynamics during CL.
We propose a simpler and more efficient method that entirely removes the every-step replay and replaces it with FPF (forgetting-prioritized finetuning) triggered only $k$ times periodically during CL.
arXiv Detail & Related papers (2023-04-09T04:36:24Z) - Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with state-of-the-art re-param models, OREPA saves about 70% of the training-time memory cost and accelerates training by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
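To make the re-parameterization idea concrete, the snippet below folds a convolution followed by BatchNorm into a single equivalent convolution. This is a generic, offline conv-BN fold shown only for intuition; OREPA's contribution is performing such squeezing online during training, and the code and names here are our own, not the paper's.

```python
# Generic conv-BN folding sketch (illustration only, not the OREPA code).
import torch
import torch.nn as nn

def fold_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return a single Conv2d equivalent to conv followed by bn (bn in eval mode)."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    with torch.no_grad():
        std = (bn.running_var + bn.eps).sqrt()
        scale = bn.weight / std                       # per-output-channel scale
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
        fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused

conv, bn = nn.Conv2d(8, 16, 3, padding=1), nn.BatchNorm2d(16).eval()
x = torch.randn(2, 8, 32, 32)
with torch.no_grad():
    print(torch.allclose(bn(conv(x)), fold_conv_bn(conv, bn)(x), atol=1e-5))  # True
```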
arXiv Detail & Related papers (2022-04-02T09:50:19Z) - Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline that incurs no additional computation.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z) - MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the
Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices.
The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S).
Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
arXiv Detail & Related papers (2021-10-26T21:15:17Z) - Decoupled Contrastive Learning [23.25775900388382]
We identify a noticeable negative-positive-coupling (NPC) effect in the widely used cross-entropy (InfoNCE) loss.
By properly addressing the NPC effect, we reach a decoupled contrastive learning (DCL) objective function.
Our approach achieves 66.9% ImageNet top-1 accuracy with a batch size of 256 within 200 epochs of pre-training, outperforming its baseline SimCLR by 5.1%.
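For reference, the decoupling described here amounts, in our reading of the abstract, to removing the positive pair from the denominator of the InfoNCE loss. In schematic form (our notation, with $s_{i,j}$ the similarity between the two augmented views of the same image, $\tau$ the temperature, and the sums running over negatives $k$ only):

$\mathcal{L}_{\mathrm{InfoNCE}} = -\log \dfrac{\exp(s_{i,j}/\tau)}{\exp(s_{i,j}/\tau) + \sum_{k} \exp(s_{i,k}/\tau)}$, whereas $\mathcal{L}_{\mathrm{DCL}} = -\dfrac{s_{i,j}}{\tau} + \log \sum_{k} \exp(s_{i,k}/\tau)$.

Dropping the $\exp(s_{i,j}/\tau)$ term from the denominator is what removes the negative-positive coupling.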
arXiv Detail & Related papers (2021-10-13T16:38:43Z)