Efficient Continual Learning with Low Memory Footprint For Edge Device
- URL: http://arxiv.org/abs/2407.10545v2
- Date: Wed, 17 Jul 2024 14:34:43 GMT
- Title: Efficient Continual Learning with Low Memory Footprint For Edge Device
- Authors: Zeqing Wang, Fei Cheng, Kangye Ji, Bohu Huang
- Abstract summary: This paper proposes a compact algorithm called LightCL to overcome the forgetting problem of Continual Learning.
We first propose two new metrics, learning plasticity and memory stability, to seek generalizability during CL.
In the experimental comparison, LightCL outperforms other SOTA methods in delaying forgetting and reduces memory footprint by up to $\textbf{6.16}\times$.
- Score: 6.818488262543482
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning (CL) is a useful technique for acquiring dynamic knowledge continually. Although powerful cloud platforms can fully exert the ability of CL, e.g., in customized recommendation systems, similar personalized requirements for edge devices are almost disregarded. This phenomenon stems from the huge resource overhead involved in training neural networks and overcoming the forgetting problem of CL. This paper focuses on these scenarios and proposes a compact algorithm called LightCL. Different from other CL methods, which incur huge resource consumption to acquire generalizability among all tasks for delaying forgetting, LightCL compresses the resource consumption of already generalized components in neural networks and uses a small amount of extra resources to improve memory in other parts. We first propose two new metrics, learning plasticity and memory stability, to seek generalizability during CL. Based on the discovery that the lower and middle layers have more generalizability while deeper layers show the opposite, we $\textit{Maintain Generalizability}$ by freezing the lower and middle layers. Then, we $\textit{Memorize Feature Patterns}$ to stabilize the feature-extracting patterns of previous tasks and thereby improve generalizability in deeper layers. In the experimental comparison, LightCL outperforms other SOTA methods in delaying forgetting and reduces memory footprint by up to $\textbf{6.16}\times$, proving the excellent efficiency of LightCL. We also evaluate our method on an edge device, the Jetson Nano, which further proves its practical effectiveness.
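To make the two mechanisms concrete, here is a minimal PyTorch sketch of freezing the lower and middle layers and penalizing feature drift in a deeper layer. The ResNet-18 backbone, the choice of which stages to freeze, and the MSE form of the feature-pattern loss are illustrative assumptions, not the paper's exact recipe.

```python
import torch.nn as nn
import torchvision.models as models

model = models.resnet18(num_classes=10)

# Maintain Generalizability: freeze the lower and middle layers, which the
# paper finds are already general across tasks (this stage split is a guess).
for module in (model.conv1, model.bn1, model.layer1, model.layer2):
    for p in module.parameters():
        p.requires_grad = False

# Memorize Feature Patterns: penalize drift of deeper-layer features on a few
# stored exemplars from earlier tasks (the MSE form is an assumption).
def feature_pattern_loss(model, exemplars, stored_feats):
    x = model.maxpool(model.relu(model.bn1(model.conv1(exemplars))))
    x = model.layer3(model.layer2(model.layer1(x)))
    return nn.functional.mse_loss(x, stored_feats)

# Per step: loss = task_loss + lam * feature_pattern_loss(model, ex, feats),
# where lam trades plasticity for stability.
```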
Related papers
- CLoRA: Parameter-Efficient Continual Learning with Low-Rank Adaptation [14.2843647693986]
CLoRA applies Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, to class-incremental semantic segmentation.
CLoRA significantly reduces the hardware requirements for training, making it well-suited for CL in resource-constrained environments after deployment.
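For context, a minimal generic LoRA layer is sketched below; this is standard LoRA, not CLoRA's segmentation-specific variant, and the rank and scaling values are arbitrary.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```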
arXiv Detail & Related papers (2025-07-26T09:36:05Z)
- DeeCLIP: A Robust and Generalizable Transformer-Based Framework for Detecting AI-Generated Images [14.448350657613368]
DeeCLIP is a novel framework for detecting AI-generated images.
It incorporates DeeFuser, a fusion module that combines high-level and low-level features.
Trained exclusively on 4-class ProGAN data, DeeCLIP achieves an average accuracy of 89.90%.
arXiv Detail & Related papers (2025-04-28T15:06:28Z)
- Cost-Efficient Continual Learning with Sufficient Exemplar Memory [55.77835198580209]
Continual learning (CL) research typically assumes highly constrained exemplar memory resources.
In this work, we investigate CL in a novel setting where exemplar memory is ample.
Our method achieves state-of-the-art performance while reducing the computational cost to a quarter or third of existing methods.
arXiv Detail & Related papers (2025-02-11T05:40:52Z)
- Training-free Ultra Small Model for Universal Sparse Reconstruction in Compressed Sensing [39.36305648162564]
This paper proposes ultra-small artificial neural models called coefficients learning (CL).
CL enables training-free and rapid sparse reconstruction while inheriting the generality and interpretability of traditional iterative methods.
Compared to representative iterative methods, CLOMP improves efficiency 100- to 1000-fold for large-scale data.
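For reference, the classical Orthogonal Matching Pursuit loop that such methods build on is sketched below; the summary does not spell out how CLOMP's learned coefficients modify this loop, so only the traditional baseline is shown.

```python
import numpy as np

def omp(Phi, y, k):
    """Recover a k-sparse x from measurements y = Phi @ x (classic OMP)."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ residual)))  # atom most correlated with residual
        support.append(j)
        x_s, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)  # least squares on support
        residual = y - Phi[:, support] @ x_s
    x = np.zeros(Phi.shape[1])
    x[support] = x_s
    return x
```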
arXiv Detail & Related papers (2025-01-20T16:50:59Z)
- Optimal Gradient Checkpointing for Sparse and Recurrent Architectures using Off-Chip Memory [0.8321953606016751]
We introduce memory-efficient gradient checkpointing strategies tailored for the general class of sparse RNNs and Spiking Neural Networks.
We find that Double Checkpointing emerges as the most effective method, optimizing the use of local memory resources while minimizing recomputation overhead.
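As background, gradient checkpointing itself can be sketched in a few lines of PyTorch; the off-chip spilling that distinguishes Double Checkpointing is a systems-level addition not shown here.

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# 16-layer toy model; only segment boundaries keep activations in the forward
# pass, and everything inside a segment is recomputed during backward.
model = torch.nn.Sequential(*[torch.nn.Linear(512, 512) for _ in range(16)])
x = torch.randn(32, 512, requires_grad=True)

out = checkpoint_sequential(model, 4, x, use_reentrant=False)  # 4 checkpointed segments
out.sum().backward()
```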
arXiv Detail & Related papers (2024-12-16T14:23:31Z)
- Search for Efficient Large Language Models [52.98684997131108]
Large Language Models (LLMs) have long held sway in the realms of artificial intelligence research.
Weight pruning, quantization, and distillation have been embraced to compress LLMs, targeting memory reduction and inference acceleration.
Most model compression techniques concentrate on weight optimization, overlooking the exploration of optimal architectures.
arXiv Detail & Related papers (2024-09-25T21:32:12Z)
- FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources [45.40926501138365]
We introduce FastCLIP, a general CLIP training framework built on advanced compositional optimization techniques.
Our framework is equipped with an efficient gradient reduction strategy to reduce communication overhead.
We benchmark the performance of FastCLIP and the state-of-the-art training baseline on different compute scales.
arXiv Detail & Related papers (2024-07-01T16:37:18Z)
- Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation [123.4883806344334]
We study a realistic Continual Learning setting where learning algorithms are granted a restricted computational budget per time step while training.
We apply this setting to large-scale semi-supervised Continual Learning scenarios with sparse label rates.
Our extensive analysis and ablations demonstrate that our method, DietCL, is stable across a full spectrum of label sparsity levels, computational budgets, and other settings.
arXiv Detail & Related papers (2024-04-19T10:10:39Z)
- LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms [17.031135153343502]
Continual Learning (CL) allows applications such as user personalization and household robots to learn on the fly and adapt to context.
LifeLearner is a hardware-aware meta continual learning system that drastically optimizes system resources.
LifeLearner achieves near-optimal CL performance, falling short by only 2.8% on accuracy compared to an Oracle baseline.
arXiv Detail & Related papers (2023-11-19T20:39:35Z)
- Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures [85.76673783330334]
Two different settings of linear weight-sharing layers motivate two flavours of Kronecker-Factored Approximate Curvature (K-FAC).
We show they are exact for deep linear networks with weight-sharing in their respective setting.
We observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer.
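A toy single-layer illustration of the K-FAC idea follows (no weight sharing, so neither of the paper's two flavours): the Fisher block of a linear layer is approximated as a Kronecker product of input and output-gradient second moments. The damping value is an arbitrary choice.

```python
import torch

def kfac_precondition(a, g, dW, damping=1e-3):
    # a: (batch, d_in) layer inputs; g: (batch, d_out) grads of the loss
    # w.r.t. the layer outputs; dW: (d_out, d_in) raw weight gradient.
    A = a.T @ a / a.shape[0] + damping * torch.eye(a.shape[1])  # input covariance
    G = g.T @ g / g.shape[0] + damping * torch.eye(g.shape[1])  # output-grad covariance
    # Natural-gradient step under the Kronecker approximation: G^{-1} dW A^{-1}
    return torch.linalg.solve(G, torch.linalg.solve(A, dW.T).T)
```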
arXiv Detail & Related papers (2023-11-01T16:37:00Z)
- Cost-effective On-device Continual Learning over Memory Hierarchy with Miro [32.93163587457259]
Miro is a novel system runtime that dynamically configures the CL system based on resource states for the best cost-effectiveness.
Miro significantly outperforms baseline systems we build for comparison, consistently achieving higher cost-effectiveness.
arXiv Detail & Related papers (2023-08-11T10:05:53Z)
- Does Continual Learning Equally Forget All Parameters? [55.431048995662714]
Distribution shift (e.g., task or domain shift) in continual learning (CL) usually results in catastrophic forgetting of neural networks.
We study which modules in neural networks are more prone to forgetting by investigating their training dynamics during CL.
We propose a simpler and more efficient method that entirely removes every-step replay and instead triggers FPF (forgetting-prioritized fine-tuning) only $k$ times periodically during CL.
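A hedged sketch of that schedule in PyTorch follows; which modules count as forgetting-prone and how the $k$ trigger points are spaced are assumptions here, not the paper's exact criteria.

```python
import torch
import torch.nn.functional as F

def fpf(model, plastic_modules, buffer_loader, optimizer, steps=100):
    """Briefly fine-tune only the forgetting-prone modules on buffered data."""
    for p in model.parameters():
        p.requires_grad = False   # freeze everything ...
    for m in plastic_modules:
        for p in m.parameters():
            p.requires_grad = True  # ... except the plastic modules
    for _, (x, y) in zip(range(steps), buffer_loader):
        loss = F.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Called only k times over the whole task stream instead of replaying every step.
```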
arXiv Detail & Related papers (2023-04-09T04:36:24Z)
- Computationally Budgeted Continual Learning: What Does Matter? [128.0827987414154]
Continual Learning (CL) aims to sequentially train models on streams of incoming data that vary in distribution by preserving previous knowledge while adapting to new data.
Current CL literature focuses on restricted access to previously seen data, while imposing no constraints on the computational budget for training.
We revisit this problem with a large-scale benchmark and analyze the performance of traditional CL approaches in a compute-constrained setting.
arXiv Detail & Related papers (2023-03-20T14:50:27Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution, to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on the fly by a small amount proportional to their magnitude.
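The shrinkage step itself can be sketched as follows; the pruning percentile and shrink factor are illustrative, and the full ISS-P schedule (how the percentage evolves over iterations) is omitted.

```python
import torch

@torch.no_grad()
def soft_shrink(weight: torch.Tensor, prune_ratio: float = 0.1, shrink: float = 0.5):
    """Scale down (rather than zero out) the smallest-magnitude weights,
    so tentatively pruned weights can still recover in later iterations."""
    k = max(1, int(prune_ratio * weight.numel()))
    threshold = weight.abs().flatten().kthvalue(k).values  # bottom-k magnitude cutoff
    mask = weight.abs() <= threshold
    weight[mask] *= shrink
    return weight
```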
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- SparCL: Sparse Continual Learning on the Edge [43.51885725281063]
We propose a novel framework called Sparse Continual Learning (SparCL) to enable cost-effective continual learning on edge devices.
SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity.
arXiv Detail & Related papers (2022-09-20T05:24:48Z)
- TransCL: Transformer Makes Strong and Flexible Compressive Learning [11.613886854794133]
Compressive learning (CL) is an emerging framework that integrates signal acquisition via compressed sensing (CS) and machine learning for inference tasks directly on a small number of measurements.
Previous attempts at CL are not only limited to a fixed CS ratio but are also restricted to MNIST/CIFAR-like datasets, and they do not scale to complex real-world high-resolution (HR) data or vision tasks.
In this paper, a novel transformer-based compressive learning framework on large-scale images with arbitrary CS ratios, dubbed TransCL, is proposed.
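The compressed-sensing front end that such a pipeline consumes can be sketched as below; the block size, CS ratio, and Gaussian sensing matrix are generic CS choices for illustration, not necessarily TransCL's.

```python
import torch

def block_cs_measure(img, block=32, cs_ratio=0.1):
    """Split an image into blocks and keep only m = cs_ratio * block^2 random
    linear measurements per block; a compressive-learning model then learns
    directly from these measurements instead of pixels."""
    C, H, W = img.shape                          # H, W divisible by `block`
    m = int(cs_ratio * block * block)
    Phi = torch.randn(m, block * block) / block  # random sensing matrix
    patches = img.unfold(1, block, block).unfold(2, block, block)
    patches = patches.reshape(C, -1, block * block)
    return patches @ Phi.T                       # (C, num_blocks, m) measurements
```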
arXiv Detail & Related papers (2022-07-25T08:21:48Z)
- Toward Fast, Flexible, and Robust Low-Light Image Enhancement [87.27326390675155]
We develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening of images in real-world low-light scenarios.
To reduce the computational burden of the cascaded pattern, we construct a self-calibrated module that realizes convergence between the results of each stage.
We comprehensively explore SCI's inherent properties, including operation-insensitive adaptability and model-irrelevant generality.
arXiv Detail & Related papers (2022-04-21T14:40:32Z)
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to activate and select only sparse sets of neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
- Automatic Sparse Connectivity Learning for Neural Networks [4.875787559251317]
Well-designed sparse neural networks have the potential to significantly reduce FLOPs and computational resources.
In this work, we propose a new automatic pruning method, Sparse Connectivity Learning (SCL).
Deep learning models trained by SCL outperform the SOTA human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
arXiv Detail & Related papers (2022-01-13T15:12:48Z)
- Decoupled Contrastive Learning [23.25775900388382]
We identify a noticeable negative-positive-coupling (NPC) effect in the widely used cross-entropy (InfoNCE) loss.
By properly addressing the NPC effect, we reach a decoupled contrastive learning (DCL) objective function.
Our approach achieves $66.9\%$ ImageNet top-1 accuracy using a batch size of 256 within 200 epochs of pre-training, outperforming its baseline SimCLR by $5.1\%$.
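Concretely, the decoupling amounts to dropping the positive pair from the InfoNCE denominator. A one-sided sketch follows (the full loss symmetrizes over both views); the temperature is illustrative.

```python
import torch
import torch.nn.functional as F

def dcl_loss(z1, z2, tau=0.1):
    """Decoupled contrastive loss for two views; row i of z1 and z2 is a positive pair."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.shape[0]
    pos = (z1 * z2).sum(dim=1) / tau            # positive similarities
    sim = z1 @ torch.cat([z1, z2]).T / tau      # similarities to all samples
    idx = torch.arange(n)
    sim[idx, idx] = float('-inf')               # drop self-similarity
    sim[idx, n + idx] = float('-inf')           # drop the positive: the decoupling
    return (-pos + torch.logsumexp(sim, dim=1)).mean()
```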
arXiv Detail & Related papers (2021-10-13T16:38:43Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.