POET: Training Neural Networks on Tiny Devices with Integrated
Rematerialization and Paging
- URL: http://arxiv.org/abs/2207.07697v1
- Date: Fri, 15 Jul 2022 18:36:29 GMT
- Title: POET: Training Neural Networks on Tiny Devices with Integrated
Rematerialization and Paging
- Authors: Shishir G. Patil, Paras Jain, Prabal Dutta, Ion Stoica, Joseph E.
Gonzalez
- Abstract summary: Fine-tuning models on edge devices would enable privacy-preserving personalization over sensitive data.
We present POET, an algorithm to enable training large neural networks on memory-scarce battery-operated edge devices.
- Score: 35.397804171588476
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning models on edge devices like mobile phones would enable
privacy-preserving personalization over sensitive data. However, edge training
has historically been limited to relatively small models with simple
architectures because training is both memory and energy intensive. We present
POET, an algorithm to enable training large neural networks on memory-scarce
battery-operated edge devices. POET jointly optimizes the integrated search
spaces of rematerialization and paging, two algorithms to reduce the
memory consumption of backpropagation. Given a memory budget and a run-time
constraint, we formulate a mixed-integer linear program (MILP) for
energy-optimal training. Our approach enables training significantly larger
models on embedded devices while reducing energy consumption and without
modifying the mathematical correctness of backpropagation. We demonstrate that it
is possible to fine-tune both ResNet-18 and BERT within the memory constraints
of a Cortex-M class embedded device while outperforming current edge training
methods in energy efficiency. POET is an open-source project available at
https://github.com/ShishirPatil/poet
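As a rough illustration of the kind of joint formulation the abstract describes (not POET's actual MILP; the layer costs, memory model, and time model below are illustrative assumptions), the following sketch uses PuLP to choose, per layer, whether an activation is kept in SRAM, paged out to flash, or rematerialized in the backward pass, minimizing energy subject to a memory budget and a runtime constraint.

```python
# Toy sketch of a joint rematerialization/paging MILP in the spirit of POET.
# All coefficients, variable names, and the per-layer memory/time model are
# illustrative assumptions, not the paper's actual formulation.
import pulp

n_layers = 4
mem = [80, 120, 60, 40]        # activation size per layer (KB), assumed
e_remat = [5, 9, 4, 3]         # energy to rematerialize a layer (mJ), assumed
e_page = [2, 2, 2, 2]          # energy to page an activation out and back (mJ), assumed
t_remat = [3, 6, 2, 2]         # extra time to rematerialize (ms), assumed
t_page = [4, 4, 4, 4]          # extra time to page (ms), assumed
mem_budget = 160               # SRAM budget (KB), assumed
time_budget = 12               # allowed runtime overhead (ms), assumed

prob = pulp.LpProblem("joint_remat_paging", pulp.LpMinimize)

# For each layer pick exactly one strategy: keep resident, page, or rematerialize.
keep = [pulp.LpVariable(f"keep_{i}", cat="Binary") for i in range(n_layers)]
page = [pulp.LpVariable(f"page_{i}", cat="Binary") for i in range(n_layers)]
remat = [pulp.LpVariable(f"remat_{i}", cat="Binary") for i in range(n_layers)]
for i in range(n_layers):
    prob += keep[i] + page[i] + remat[i] == 1

# Objective: minimize the extra energy spent on paging and recomputation.
prob += pulp.lpSum(e_page[i] * page[i] + e_remat[i] * remat[i]
                   for i in range(n_layers))

# Memory: only activations kept resident count against the SRAM budget.
prob += pulp.lpSum(mem[i] * keep[i] for i in range(n_layers)) <= mem_budget

# Runtime: paging and recomputation overhead must fit within the deadline.
prob += pulp.lpSum(t_page[i] * page[i] + t_remat[i] * remat[i]
                   for i in range(n_layers)) <= time_budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for i in range(n_layers):
    choice = "keep" if keep[i].value() else ("page" if page[i].value() else "remat")
    print(f"layer {i}: {choice}")
```

POET's actual formulation tracks memory and energy per time step of the forward and backward pass rather than per layer, but the flavor is the same: binary placement decisions, a linear energy objective, and linear memory and runtime constraints.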
Related papers
- Center-Sensitive Kernel Optimization for Efficient On-Device Incremental Learning [88.78080749909665]
Current on-device training methods focus on efficient training without considering catastrophic forgetting.
This paper proposes a simple but effective edge-friendly incremental learning framework.
Our method achieves an average accuracy boost of 38.08% with even less memory and approximate computation.
arXiv Detail & Related papers (2024-06-13T05:49:29Z) - SCoTTi: Save Computation at Training Time with an adaptive framework [7.780766187171572]
On-device training is an emerging approach in machine learning where models are trained on edge devices.
We propose SCoTTi (Save Computation at Training Time), an adaptive framework that addresses the challenge of reducing resource consumption during training.
Our approach outperforms state-of-the-art methods in computational resource savings on various commonly used benchmarks.
arXiv Detail & Related papers (2023-12-19T16:19:33Z) - TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge [27.533985670823945]
TinyTrain is an on-device training approach that drastically reduces training time by selectively updating parts of the model.
TinyTrain outperforms vanilla fine-tuning of the entire network by 3.6-5.0% in accuracy.
It achieves 9.5x faster and 3.5x more energy-efficient training over status-quo approaches.
arXiv Detail & Related papers (2023-07-19T13:49:12Z) - On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point, partition point, and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a scheme can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the
Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting for accurate and fast execution on edge devices.
The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S).
Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
arXiv Detail & Related papers (2021-10-26T21:15:17Z) - RCT: Resource Constrained Training for Edge AI [35.11160947555767]
Existing training methods for compact models are designed to run on powerful servers with abundant memory and energy budget.
We propose Resource Constrained Training (RCT) to mitigate these issues.
RCT keeps only a quantised model throughout training, so that the memory requirement for model parameters during training is reduced.
arXiv Detail & Related papers (2021-03-26T14:33:31Z) - SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and
Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, which forces reliance on external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z) - Low-rank Gradient Approximation For Memory-Efficient On-device Training
of Deep Neural Network [9.753369031264532]
Training machine learning models on mobile devices has the potential of improving both privacy and accuracy of the models.
One of the major obstacles to achieving this goal is the memory limitation of mobile devices.
We propose approximating the gradient matrices of deep neural networks using a low-rank parameterization as an avenue to save training memory.
arXiv Detail & Related papers (2020-01-24T05:12:18Z)
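For the low-rank gradient approximation idea in the last entry above, a minimal sketch (the rank, shapes, and update rule are assumptions for illustration, not the paper's exact parameterization) could look like this:

```python
# Minimal sketch of low-rank gradient approximation for one dense layer:
# keep only rank-r SVD factors of the gradient instead of the full matrix.
import numpy as np

def low_rank_factors(grad, rank):
    """Return rank-truncated SVD factors of a gradient matrix."""
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :]

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128))       # weight matrix of a dense layer (assumed shape)
grad = rng.normal(size=W.shape)       # stand-in for the backprop gradient

rank = 8
u_r, s_r, vt_r = low_rank_factors(grad, rank)   # only these factors need to be stored

lr = 0.01
W -= lr * (u_r * s_r) @ vt_r          # reconstruct the approximate gradient on the fly

# Memory comparison: full gradient vs. the rank-8 U, s, and V factors.
print("full grad floats:", grad.size,
      "| low-rank floats:", rank * (W.shape[0] + W.shape[1] + 1))
```

The memory saving comes from storing rank*(m+n+1) numbers for the factors instead of m*n for the full gradient; in a real training loop the factors would be produced directly during backpropagation rather than by decomposing a full gradient as done here for illustration.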