Center-Sensitive Kernel Optimization for Efficient On-Device Incremental Learning
- URL: http://arxiv.org/abs/2406.08830v1
- Date: Thu, 13 Jun 2024 05:49:29 GMT
- Title: Center-Sensitive Kernel Optimization for Efficient On-Device Incremental Learning
- Authors: Dingwen Zhang, Yan Li, De Cheng, Nannan Wang, Junwei Han
- Abstract summary: Current on-device training methods focus only on efficient training, without considering catastrophic forgetting.
This paper proposes a simple but effective edge-friendly incremental learning framework.
Our method achieves an average accuracy boost of 38.08% with even less memory and approximately the same computation.
- Score: 88.78080749909665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To facilitate the evolution of edge intelligence in ever-changing environments, this paper studies on-device incremental learning under limited computational resources. Current on-device training methods focus only on efficient training, without considering catastrophic forgetting, which prevents the model from getting stronger as it continually explores the world. A direct solution to this problem is to incorporate existing incremental learning mechanisms into the on-device training framework. Unfortunately, this approach does not work well, as those mechanisms usually introduce a large additional computational cost to the network optimization process, which would inevitably exceed the memory capacity of edge devices. To address this issue, this paper makes an early effort to propose a simple but effective edge-friendly incremental learning framework. Based on an empirical study of the knowledge intensity of the kernel elements of the neural network, we find that the center kernel element is the key to maximizing the knowledge intensity when learning new data, while freezing the other kernel elements strikes a good balance with the model's capacity to overcome catastrophic forgetting. Based on this finding, we further design a center-sensitive kernel optimization framework that largely reduces the cost of gradient computation and back-propagation. In addition, a dynamic channel element selection strategy is proposed to facilitate a sparse orthogonal gradient projection, further reducing the optimization complexity based on the knowledge extracted from the new task data. Extensive experiments validate that our method is efficient and effective; e.g., it achieves an average accuracy boost of 38.08% with even less memory and approximately the same computation compared to existing on-device training methods, indicating its significant potential for on-device incremental learning.
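To make the center-sensitive idea concrete, below is a minimal PyTorch sketch, assuming a standard CNN, that confines updates to the center element of each odd-sized convolution kernel by masking gradients. The helper name and the hook-based masking are illustrative assumptions; the authors' framework instead avoids computing the frozen gradients altogether, which is where the memory and compute savings come from.

```python
import torch
import torch.nn as nn

def restrict_updates_to_center(model: nn.Module) -> None:
    """Freeze every conv-kernel element except the center one (illustrative helper)."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            kh, kw = module.kernel_size
            if kh % 2 == 1 and kw % 2 == 1 and (kh > 1 or kw > 1):
                mask = torch.zeros_like(module.weight)
                mask[:, :, kh // 2, kw // 2] = 1.0  # keep only the center position trainable
                # Zero out all non-center gradients on every backward pass.
                module.weight.register_hook(lambda grad, m=mask: grad * m)

# Usage sketch: call once before incremental training on a new task.
# restrict_updates_to_center(model)
```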
Related papers
- Kernel Sum of Squares for Data Adapted Kernel Learning of Dynamical Systems from Data: A global optimization approach [0.19999259391104385]
This paper examines the application of the Kernel Sum of Squares (KSOS) method for enhancing kernel learning from data.
Traditional kernel-based methods frequently struggle with selecting optimal base kernels and parameter tuning.
KSOS mitigates these issues by leveraging a global optimization framework with kernel-based surrogate functions.
arXiv Detail & Related papers (2024-08-12T19:32:28Z)
- The Power of Few: Accelerating and Enhancing Data Reweighting with Coreset Selection [18.683805940232485]
We introduce a novel method that employs core subset selection for reweighting.
By focusing on a strategically selected coreset, our approach offers a robust representation of the full dataset.
The re-calibrated weights are then mapped back to and propagated across the entire dataset.
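The summary above suggests a simple overall loop: learn per-sample weights on the coreset, then propagate them to the remaining data. Below is a hedged sketch assuming a nearest-coreset-neighbor propagation rule; the helper name and the rule itself are our assumptions, not the paper's procedure.

```python
import numpy as np

def propagate_coreset_weights(X: np.ndarray, coreset_idx: np.ndarray,
                              coreset_weights: np.ndarray) -> np.ndarray:
    """Give each sample the weight of its nearest coreset member (assumed rule)."""
    # Squared distances from every sample to every coreset point: shape (n, m).
    d2 = ((X[:, None, :] - X[coreset_idx][None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)        # closest coreset point per sample
    return coreset_weights[nearest]    # propagate its re-calibrated weight

# Toy usage: 1000 samples, a random 50-point "coreset", uniform learned weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))
coreset_idx = rng.choice(1000, size=50, replace=False)
full_weights = propagate_coreset_weights(X, coreset_idx, np.ones(50) / 50)
```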
arXiv Detail & Related papers (2024-03-18T18:30:22Z)
- Compute-Efficient Active Learning [0.0]
Active learning aims at reducing labeling costs by selecting the most informative samples from an unlabeled dataset.
The traditional active learning process often demands extensive computational resources, hindering scalability and efficiency.
We present a novel method designed to alleviate the computational burden associated with active learning on massive datasets.
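One standard way to cut this cost, shown here as an illustrative sketch rather than the paper's actual method, is to score uncertainty only on a small random candidate pool instead of the entire unlabeled set. `probs_fn` is a hypothetical callback returning class probabilities.

```python
import numpy as np

def cheap_uncertainty_query(probs_fn, unlabeled_idx: np.ndarray, k: int = 16,
                            pool_size: int = 512, seed: int = 0) -> np.ndarray:
    """Select k samples by predictive entropy, scoring only a random sub-pool."""
    rng = np.random.default_rng(seed)
    pool = rng.choice(unlabeled_idx, size=min(pool_size, unlabeled_idx.size),
                      replace=False)
    p = probs_fn(pool)                               # class probabilities, shape (pool, C)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)   # uncertainty per candidate
    return pool[np.argsort(entropy)[-k:]]            # the k most uncertain candidates
```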
arXiv Detail & Related papers (2024-01-15T12:32:07Z)
- Enhancing Kernel Flexibility via Learning Asymmetric Locally-Adaptive Kernels [35.76925787221499]
This paper introduces the concept of Locally-Adaptive-Bandwidths (LAB) as trainable parameters to enhance the Radial Basis Function (RBF) kernel.
The parameters in LAB RBF kernels are data-dependent, and their number can grow with the size of the dataset.
This paper for the first time establishes an asymmetric kernel ridge regression framework and introduces an iterative kernel learning algorithm.
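The description pins down enough to sketch the kernel itself: an RBF kernel whose trainable bandwidth is attached to each center rather than shared globally, which makes the kernel matrix asymmetric. The exact parameterization below is an assumption, not the paper's formulation.

```python
import numpy as np

def lab_rbf(X: np.ndarray, centers: np.ndarray, bandwidths: np.ndarray) -> np.ndarray:
    """Asymmetric RBF kernel: K[i, j] = exp(-theta_j**2 * ||x_i - c_j||**2).

    Each center c_j carries its own trainable bandwidth theta_j, so in general
    K(x, c) != K(c, x) -- the asymmetry the paper's framework is built around.
    """
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (n, m) squared distances
    return np.exp(-(bandwidths ** 2)[None, :] * d2)
```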
arXiv Detail & Related papers (2023-10-08T17:08:15Z)
- On Efficient Training of Large-Scale Deep Learning Models: A Literature Review [90.87691246153612]
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech.
The use of large-scale models trained on vast amounts of data holds immense promise for practical applications.
Given the increasing demands on computational capacity, a comprehensive summary of acceleration techniques for training deep learning models is still much anticipated.
arXiv Detail & Related papers (2023-04-07T11:13:23Z)
- CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
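As a rough illustration of a contrastive curiosity signal, the per-sample InfoNCE loss between two augmented views of the same observation can serve as a score for prioritizing replay; this is a hedged reading of the framework's ingredients, not its actual algorithm.

```python
import torch
import torch.nn.functional as F

def curiosity_scores(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Per-sample InfoNCE loss between paired embeddings of two augmented views.

    A high loss means the encoder cannot yet match the views, marking the
    sample as "curious" (illustrative score; z1/z2 are (batch, dim) embeddings).
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                            # cosine-similarity logits
    labels = torch.arange(z1.size(0), device=z1.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels, reduction="none")
```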
arXiv Detail & Related papers (2022-05-02T14:42:05Z)
- Privacy-Preserving Serverless Edge Learning with Decentralized Small Data [13.254530176359182]
Distributed training strategies have recently become a promising approach to ensure data privacy when training deep models.
This paper extends conventional serverless platforms with serverless edge learning architectures and provides an efficient distributed training framework from the networking perspective.
arXiv Detail & Related papers (2021-11-29T21:04:49Z)
- Energy-Efficient Multi-Orchestrator Mobile Edge Learning [54.28419430315478]
Mobile Edge Learning (MEL) is a collaborative learning paradigm that features distributed training of Machine Learning (ML) models over edge devices.
In MEL, multiple learning tasks with different datasets may coexist.
We propose lightweight algorithms that can achieve near-optimal performance and facilitate the trade-offs between energy consumption, accuracy, and solution complexity.
arXiv Detail & Related papers (2021-09-02T07:37:10Z)
- Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically.
Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.
To reduce the communication overhead, we further propose a communication-efficient adaptive decentralized training approach.
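A minimal sketch of computing the learning rate from data in a decentralized setting, assuming an AdaGrad-style local step followed by gossip averaging with neighbors; the function and the specific adaptive rule are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def decentralized_adaptive_step(params, grads, accum, neighbor_params,
                                eta: float = 0.1, eps: float = 1e-8):
    """One worker's update: AdaGrad-style local step, then gossip averaging (sketch)."""
    accum = accum + grads ** 2                                # gradient statistics so far
    params = params - eta * grads / (np.sqrt(accum) + eps)    # data-dependent step size
    # Gossip step: average with neighbors instead of a central server.
    params = np.mean([params] + list(neighbor_params), axis=0)
    return params, accum
```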
arXiv Detail & Related papers (2020-08-24T13:23:02Z)
- Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
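CLARS itself is not specified here, but it builds on layer-wise adaptive rate scaling, whose core is a per-layer trust ratio that rescales the global learning rate by the weight-to-gradient norm ratio. The sketch below shows that LARS-style ratio only, not the CLARS algorithm.

```python
import numpy as np

def layerwise_lr(weights: np.ndarray, grads: np.ndarray,
                 base_lr: float, weight_decay: float = 1e-4) -> float:
    """LARS-style trust ratio: rescale the global LR per layer (illustrative)."""
    w_norm, g_norm = np.linalg.norm(weights), np.linalg.norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        return base_lr                  # fall back for freshly initialized layers
    trust = w_norm / (g_norm + weight_decay * w_norm)
    return base_lr * trust
```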
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.