PRIOT: Pruning-Based Integer-Only Transfer Learning for Embedded Systems
- URL: http://arxiv.org/abs/2503.16860v1
- Date: Fri, 21 Mar 2025 05:07:57 GMT
- Title: PRIOT: Pruning-Based Integer-Only Transfer Learning for Embedded Systems
- Authors: Honoka Anada, Sefutsu Ryu, Masayuki Usui, Tatsuya Kaneko, Shinya Takamaeda-Yamazaki,
- Abstract summary: We propose a new training method named PRIOT, which optimize the network by pruning selected edges rather than updating weights.<n>We implement PRIOT and PRIOT-S on the Raspberry Pi Pico and evaluate their accuracy and computational costs.<n>Our results demonstrate that PRIOT improves accuracy by 8.08 to 33.75 percentage points over existing methods, while PRIOT-S reduces memory footprint with minimal accuracy loss.
- Score: 1.4779899760345436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-device transfer learning is crucial for adapting a common backbone model to the unique environment of each edge device. Tiny microcontrollers, such as the Raspberry Pi Pico, are key targets for on-device learning but often lack floating-point units, necessitating integer-only training. Dynamic computation of quantization scale factors, which is adopted in former studies, incurs high computational costs. Therefore, this study focuses on integer-only training with static scale factors, which is challenging with existing training methods. We propose a new training method named PRIOT, which optimizes the network by pruning selected edges rather than updating weights, allowing effective training with static scale factors. The pruning pattern is determined by the edge-popup algorithm, which trains a parameter named score assigned to each edge instead of the original parameters and prunes the edges with low scores before inference. Additionally, we introduce a memory-efficient variant, PRIOT-S, which only assigns scores to a small fraction of edges. We implement PRIOT and PRIOT-S on the Raspberry Pi Pico and evaluate their accuracy and computational costs using a tiny CNN model on the rotated MNIST dataset and the VGG11 model on the rotated CIFAR-10 dataset. Our results demonstrate that PRIOT improves accuracy by 8.08 to 33.75 percentage points over existing methods, while PRIOT-S reduces memory footprint with minimal accuracy loss.
Related papers
- Stepping Forward on the Last Mile [8.756033984943178]
We propose a series of algorithm enhancements that further reduce the memory footprint, and the accuracy gap compared to backpropagation.
Our results demonstrate that on the last mile of model customization on edge devices, training with fixed-point forward gradients is a feasible and practical approach.
arXiv Detail & Related papers (2024-11-06T16:33:21Z) - Learning effective pruning at initialization from iterative pruning [15.842658282636876]
We present an end-to-end neural network-based PaI method to reduce training costs.
Our approach outperforms existing methods in high-sparsity settings.
As the first neural network-based PaI method, we conduct extensive experiments to validate the factors influencing this approach.
arXiv Detail & Related papers (2024-08-27T03:17:52Z) - Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions via our training procedure, including the gradient and regularizers, limiting flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - Rethinking Few-shot 3D Point Cloud Semantic Segmentation [62.80639841429669]
This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS)
We focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution.
To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built.
arXiv Detail & Related papers (2024-03-01T15:14:47Z) - TinyProp -- Adaptive Sparse Backpropagation for Efficient TinyML
On-device Learning [0.4747685035960513]
Training deep neural networks using backpropagation is very memory and computationally intensive.
This makes it difficult to run on-device learning or fine-tune neural networks on tiny, embedded devices such as low-power micro-controller units (MCUs)
We present TinyProp, the first sparse backpropagation method that dynamically adapts the back-propagation ratio during on-device training.
arXiv Detail & Related papers (2023-08-17T22:32:32Z) - Distributed Pruning Towards Tiny Neural Networks in Federated Learning [12.63559789381064]
FedTiny is a distributed pruning framework for federated learning.
It generates specialized tiny models for memory- and computing-constrained devices.
It achieves an accuracy improvement of 2.61% while significantly reducing the computational cost by 95.91%.
arXiv Detail & Related papers (2022-12-05T01:58:45Z) - Incremental Online Learning Algorithms Comparison for Gesture and Visual
Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z) - On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z) - Parameter-Efficient Sparsity for Large Language Models Fine-Tuning [63.321205487234074]
We propose a.
sparse-efficient Sparse Training (PST) method to reduce the number of trainable parameters during sparse-aware training.
Experiments with diverse networks (i.e., BERT, RoBERTa and GPT-2) demonstrate PST performs on par or better than previous sparsity methods.
arXiv Detail & Related papers (2022-05-23T02:43:45Z) - GradInit: Learning to Initialize Neural Networks for Stable and
Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture method for initializing neural networks.
It is based on a simple agnostic; the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - Enabling On-Device CNN Training by Self-Supervised Instance Filtering
and Error Map Pruning [17.272561332310303]
This work aims to enable on-device training of convolutional neural networks (CNNs) by reducing the computation cost at training time.
CNN models are usually trained on high-performance computers and only the trained models are deployed to edge devices.
arXiv Detail & Related papers (2020-07-07T05:52:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.