SparCL: Sparse Continual Learning on the Edge
- URL: http://arxiv.org/abs/2209.09476v1
- Date: Tue, 20 Sep 2022 05:24:48 GMT
- Title: SparCL: Sparse Continual Learning on the Edge
- Authors: Zifeng Wang, Zheng Zhan, Yifan Gong, Geng Yuan, Wei Niu, Tong Jian,
Bin Ren, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy
- Abstract summary: We propose a novel framework called Sparse Continual Learning (SparCL) to enable cost-effective continual learning on edge devices.
SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity.
- Score: 43.51885725281063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing work in continual learning (CL) focuses on mitigating catastrophic
forgetting, i.e., model performance deterioration on past tasks when learning a
new task. However, the training efficiency of a CL system is
under-investigated, which limits the real-world application of CL systems in
resource-limited scenarios. In this work, we propose a novel framework called
Sparse Continual Learning (SparCL), which is the first study that leverages
sparsity to enable cost-effective continual learning on edge devices. SparCL
achieves both training acceleration and accuracy preservation through the
synergy of three aspects: weight sparsity, data efficiency, and gradient
sparsity. Specifically, we propose task-aware dynamic masking (TDM) to learn a
sparse network throughout the entire CL process, dynamic data removal (DDR) to
remove less informative training data, and dynamic gradient masking (DGM) to
sparsify the gradient updates. Each of them not only improves efficiency, but
also further mitigates catastrophic forgetting. SparCL consistently improves
the training efficiency of existing state-of-the-art (SOTA) CL methods,
reducing training FLOPs by up to 23x, and, surprisingly, further improves SOTA
accuracy by up to 1.7%. SparCL also outperforms competitive baselines
obtained from adapting SOTA sparse training methods to the CL setting in both
efficiency and accuracy. We also evaluate the effectiveness of SparCL on a real
mobile phone, further indicating the practical potential of our method.
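For intuition, here is a minimal, self-contained sketch of how the three ingredients named in the abstract could fit into a single training step. It is assembled from the abstract alone and is not the authors' implementation; all function and parameter names (top_k_mask, keep_ratio, grad_sparsity, etc.) are illustrative assumptions, and SparCL's actual task-aware masking, data-removal, and gradient-masking criteria may differ.

```python
# Conceptual sketch only: magnitude-based stand-ins for the TDM / DDR / DGM
# ideas described in the abstract, not the SparCL reference code.
import torch
import torch.nn as nn

def top_k_mask(t: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction of entries."""
    k = max(1, int(t.numel() * (1.0 - sparsity)))
    threshold = t.abs().flatten().topk(k).values.min()
    return (t.abs() >= threshold).float()

def sparse_train_step(model, batch, optimizer,
                      weight_sparsity=0.9, grad_sparsity=0.5, keep_ratio=0.75):
    x, y = batch
    per_example_loss = nn.functional.cross_entropy(model(x), y, reduction="none")

    # DDR-like step: train only on the most informative (here: highest-loss) examples.
    n_keep = max(1, int(keep_ratio * x.size(0)))
    loss = per_example_loss.topk(n_keep).values.mean()

    optimizer.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() < 2:          # leave biases / norm layers dense
                continue
            if p.grad is not None:
                # DGM-like step: sparsify the gradient update.
                p.grad.mul_(top_k_mask(p.grad, grad_sparsity))
    optimizer.step()

    with torch.no_grad():
        for p in model.parameters():
            if p.dim() >= 2:
                # TDM-like step: re-apply a weight mask so the network stays
                # sparse throughout the continual-learning process.
                p.mul_(top_k_mask(p, weight_sparsity))
    return loss.item()
```

In the paper, the masks evolve across tasks (task-aware) and data removal follows the method's own informativeness measure; the simple magnitude- and loss-based criteria above are placeholders for illustration.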
Related papers
- Efficient Continual Learning with Low Memory Footprint For Edge Device [6.818488262543482]
This paper proposes a compact algorithm called LightCL to overcome the forgetting problem of Continual Learning.
We first propose two new metrics of learning plasticity and memory stability to seek generalizability during CL.
In the experimental comparison, LightCL outperforms other SOTA methods in delaying forgetting and reduces memory footprint by up to 6.16x.
arXiv Detail & Related papers (2024-07-15T08:52:20Z) - Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation [123.4883806344334]
We study a realistic Continual Learning setting where learning algorithms are granted a restricted computational budget per time step while training.
We apply this setting to large-scale semi-supervised Continual Learning scenarios with sparse label rates.
Our extensive analysis and ablations demonstrate that DietCL is stable under a full spectrum of label sparsity, computational budget, and various other ablations.
arXiv Detail & Related papers (2024-04-19T10:10:39Z) - LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded
Computing Platforms [17.031135153343502]
Continual Learning (CL) allows applications such as user personalization and household robots to learn on the fly and adapt to context.
LifeLearner is a hardware-aware meta continual learning system that drastically optimizes system resources.
LifeLearner achieves near-optimal CL performance, falling short of an Oracle baseline by only 2.8% in accuracy.
arXiv Detail & Related papers (2023-11-19T20:39:35Z) - A Comprehensive Study of Privacy Risks in Curriculum Learning [25.57099711643689]
Training a machine learning model on data presented in a meaningful order has proven effective in accelerating the training process.
The key enabling technique is curriculum learning (CL), which has seen great success and has been deployed in areas like image and text classification.
Yet, how CL affects the privacy of machine learning is unclear.
arXiv Detail & Related papers (2023-10-16T07:06:38Z) - Does Continual Learning Equally Forget All Parameters? [55.431048995662714]
Distribution shift (e.g., task or domain shift) in continual learning (CL) usually results in catastrophic forgetting of neural networks.
We study which modules in neural networks are more prone to forgetting by investigating their training dynamics during CL.
We propose a simpler and more efficient method that entirely removes the every-step replay and replaces it with FPF (forgetting-prioritized finetuning) triggered only $k$ times periodically during CL.
arXiv Detail & Related papers (2023-04-09T04:36:24Z) - Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with state-of-the-art re-param models, OREPA saves about 70% of the training-time memory cost and accelerates training by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
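To make the re-parameterization idea concrete, the snippet below folds a convolution followed by BatchNorm into a single equivalent convolution. This is a generic, offline conv-BN fold shown only for intuition; OREPA's contribution is performing such squeezing online during training, and the code and names here are our own, not the paper's.

```python
# Generic conv-BN folding sketch (illustration only, not the OREPA code).
import torch
import torch.nn as nn

def fold_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return a single Conv2d equivalent to conv followed by bn (bn in eval mode)."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    with torch.no_grad():
        std = (bn.running_var + bn.eps).sqrt()
        scale = bn.weight / std                       # per-output-channel scale
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
        fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused

conv, bn = nn.Conv2d(8, 16, 3, padding=1), nn.BatchNorm2d(16).eval()
x = torch.randn(2, 8, 32, 32)
with torch.no_grad():
    print(torch.allclose(bn(conv(x)), fold_conv_bn(conv, bn)(x), atol=1e-5))  # True
```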
arXiv Detail & Related papers (2022-04-02T09:50:19Z) - Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline that incurs no additional computation.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z) - MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the
Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices.
The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S).
Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
arXiv Detail & Related papers (2021-10-26T21:15:17Z) - Decoupled Contrastive Learning [23.25775900388382]
We identify a noticeable negative-positive-coupling (NPC) effect in the widely used cross-entropy (InfoNCE) loss.
By properly addressing the NPC effect, we reach a decoupled contrastive learning (DCL) objective function.
Our approach achieves 66.9% ImageNet top-1 accuracy with a batch size of 256 within 200 epochs of pre-training, outperforming its baseline SimCLR by 5.1%.
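For reference, the decoupling described here amounts, in our reading of the abstract, to removing the positive pair from the denominator of the InfoNCE loss. In schematic form (our notation, with $s_{i,j}$ the similarity between the two augmented views of the same image, $\tau$ the temperature, and the sums running over negatives $k$ only):

$\mathcal{L}_{\mathrm{InfoNCE}} = -\log \dfrac{\exp(s_{i,j}/\tau)}{\exp(s_{i,j}/\tau) + \sum_{k} \exp(s_{i,k}/\tau)}$, whereas $\mathcal{L}_{\mathrm{DCL}} = -\dfrac{s_{i,j}}{\tau} + \log \sum_{k} \exp(s_{i,k}/\tau)$.

Dropping the $\exp(s_{i,j}/\tau)$ term from the denominator is what removes the negative-positive coupling.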
arXiv Detail & Related papers (2021-10-13T16:38:43Z)