TinyProp -- Adaptive Sparse Backpropagation for Efficient TinyML On-device Learning
- URL: http://arxiv.org/abs/2308.09201v1
- Date: Thu, 17 Aug 2023 22:32:32 GMT
- Title: TinyProp -- Adaptive Sparse Backpropagation for Efficient TinyML On-device Learning
- Authors: Marcus Rüb, Daniel Maier, Daniel Mueller-Gritschneder, Axel Sikora
- Abstract summary: Training deep neural networks using backpropagation is very memory and computationally intensive.
This makes it difficult to run on-device learning or fine-tune neural networks on tiny, embedded devices such as low-power micro-controller units (MCUs).
We present TinyProp, the first sparse backpropagation method that dynamically adapts the back-propagation ratio during on-device training.
- Score: 0.4747685035960513
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Training deep neural networks using backpropagation is very memory and
computationally intensive. This makes it difficult to run on-device learning or
fine-tune neural networks on tiny, embedded devices such as low-power
micro-controller units (MCUs). Sparse backpropagation algorithms try to reduce
the computational load of on-device learning by training only a subset of the
weights and biases. Existing approaches use a static number of weights to
train. A poor choice of this so-called backpropagation ratio either limits the
computational gain or leads to severe accuracy losses. In this paper, we
present TinyProp, the first sparse backpropagation method that dynamically
adapts the backpropagation ratio during on-device training for each training
step. TinyProp introduces a small overhead for sorting the elements of the
gradient, which does not significantly reduce the computational gains. TinyProp
works particularly well for fine-tuning pretrained networks on MCUs, which is a
typical use case for embedded applications. On three typical datasets, MNIST,
DCASE2020 and CIFAR10, TinyProp is on average 5 times faster than non-sparse
training, with an average accuracy loss of 1%. On average, TinyProp is 2.9
times faster than existing static sparse backpropagation algorithms, and the
accuracy loss is reduced on average by 6% compared to a typical static setting
of the backpropagation ratio.
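The mechanism described in the abstract, sorting the gradient elements and backpropagating only a per-step fraction of them, can be illustrated with a short NumPy snippet. Everything below is a hypothetical illustration: the single linear layer, the heuristic that grows the backpropagation ratio with the current loss, and all names are assumptions rather than the authors' implementation, and the full gradient is computed here only for readability, whereas a real sparse-backpropagation kernel would skip the masked entries entirely.

```python
import numpy as np

def sparse_backprop_step(W, x, grad_out, loss, lr=0.01, k_min=0.05, k_max=0.5):
    """Update only a loss-dependent fraction of W (illustration only)."""
    # Weight gradient of a linear layer y = W @ x.
    grad_W = np.outer(grad_out, x)

    # Assumed adaptive backpropagation ratio: train more weights when the
    # loss is high, fewer when it is low.
    ratio = float(np.clip(k_min + loss * (k_max - k_min), k_min, k_max))
    k = max(1, int(ratio * grad_W.size))

    # Sort the gradient elements by magnitude and keep only the top k.
    flat = np.abs(grad_W).ravel()
    top_idx = np.argpartition(flat, flat.size - k)[-k:]
    mask = np.zeros(grad_W.shape, dtype=bool)
    mask.ravel()[top_idx] = True

    # Apply the sparse update: only k of the |W| entries are touched.
    W[mask] -= lr * grad_W[mask]
    return W
```

TinyProp's actual per-step adaptation rule is given in the paper and differs from the loss-based toy heuristic above; the snippet only shows the sort-and-select structure that the abstract describes.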
Related papers
- PRIOT: Pruning-Based Integer-Only Transfer Learning for Embedded Systems [1.4779899760345436]
We propose a new training method named PRIOT, which optimizes the network by pruning selected edges rather than updating weights.
We implement PRIOT and PRIOT-S on the Raspberry Pi Pico and evaluate their accuracy and computational costs.
Our results demonstrate that PRIOT improves accuracy by 8.08 to 33.75 percentage points over existing methods, while PRIOT-S reduces memory footprint with minimal accuracy loss.
arXiv Detail & Related papers (2025-03-21T05:07:57Z)
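The PRIOT entry above adapts a network by pruning selected edges instead of updating weights. The snippet below is a rough sketch of that general "train a pruning mask over frozen weights" idea, in the spirit of learned supermasks; the scoring heuristic, the keep ratio, and all names are assumptions, and the integer-only arithmetic and the PRIOT-S variant are not modeled.

```python
import numpy as np

def masked_forward(W, mask, x):
    """Forward pass of a frozen (possibly integer) linear layer under a mask."""
    return (W * mask) @ x

def reprune(W, scores, grad_out, x, lr=0.1, keep_ratio=0.7):
    """Re-score the edges from the gradient signal and prune the weakest ones.

    The weights W are never updated; only the per-edge scores and the
    resulting binary mask change from step to step.
    """
    # Gradient of the loss w.r.t. the effective (masked) weights.
    grad_W = np.outer(grad_out, x)

    # Assumed heuristic: promote edges whose gradient would grow the
    # weight's magnitude, demote edges it would shrink.
    scores -= lr * grad_W * W

    # Keep only the top `keep_ratio` fraction of edges.
    k = max(1, int(keep_ratio * scores.size))
    threshold = np.partition(scores.ravel(), scores.size - k)[scores.size - k]
    mask = (scores >= threshold).astype(W.dtype)
    return scores, mask
```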
- Efficient Neural Network Training via Subset Pretraining [5.352839075466439]
In training neural networks, it is common practice to use partial gradients computed over batches.
The loss minimum of the training set can be expected to be well-approximated by the minima of its subsets.
Experiments have confirmed that results equivalent to conventional training can be reached.
arXiv Detail & Related papers (2024-10-21T21:31:12Z)
- Advancing On-Device Neural Network Training with TinyPropv2: Dynamic, Sparse, and Efficient Backpropagation [0.4747685035960513]
This study introduces TinyPropv2, an innovative algorithm for optimized on-device learning in deep neural networks.
TinyPropv2 refines sparse backpropagation by dynamically adjusting the level of sparsity.
TinyPropv2 achieves near-parity with full training methods, with an average accuracy drop of only around 1 percent in most cases.
arXiv Detail & Related papers (2024-09-11T08:56:13Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable by our training procedure, including its gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
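The HASTE summary above rests on locality-sensitive hashing of latent feature maps. The sketch below shows one assumed way such a step could look: each channel is hashed with random hyperplanes and channels that collide are averaged, so that a following convolution sees fewer input channels. It is a conceptual sketch only, not the HASTE module, and in practice the weights of the following layer would also have to be aggregated per bucket.

```python
import numpy as np

def lsh_merge_channels(fmap, num_bits=4, seed=0):
    """Merge similar channels of a (C, H, W) feature map via hyperplane LSH."""
    c, h, w = fmap.shape
    flat = fmap.reshape(c, h * w)

    # Random hyperplane LSH: the sign pattern of a few random projections
    # forms the bucket id of each channel.
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((h * w, num_bits))
    bits = flat @ planes > 0
    buckets = bits @ (1 << np.arange(num_bits))

    # Channels falling into the same bucket are treated as redundant and
    # replaced by their mean, shrinking the channel dimension.
    merged = np.stack([flat[buckets == b].mean(axis=0)
                       for b in np.unique(buckets)])
    return merged.reshape(-1, h, w)
```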
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
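WTA-CRS above belongs to the column-row sampling (CRS) family of unbiased matrix-product estimators. The snippet below sketches the plain CRS baseline, which shows where the randomness and the unbiasedness come from; the winner-take-all variance reduction that gives the paper its name is not reproduced here, and the function name and sampling distribution are assumptions.

```python
import numpy as np

def crs_matmul(A, B, k, rng=None):
    """Unbiased k-sample column-row estimate of the product A @ B."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]

    # Importance-sampling distribution over the n column/row pairs.
    weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = weights / weights.sum()

    # Each sampled outer product is rescaled by 1 / (k * p_i), so the sum
    # has expectation exactly A @ B.
    idx = rng.choice(n, size=k, replace=True, p=p)
    return sum(np.outer(A[:, i], B[i, :]) / (k * p[i]) for i in idx)
```

In the setting of the summary above, estimators of this kind stand in for exact matrix products when tuning transformers so that less intermediate data needs to be kept, at the cost of the estimator's variance, which is what the paper's construction reduces.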
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
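The RFAD summary above relies on approximating the NNGP kernel with random features. The snippet below sketches the standard random-feature idea for a one-hidden-layer ReLU network, which is only one ingredient of the method; the exact kernel, its variance and bias constants, and the distillation objective used in the paper are not shown, and the feature count and names are assumptions.

```python
import numpy as np

def random_relu_features(X, num_features=4096, seed=0):
    """Random ReLU features whose inner products approximate an NNGP kernel.

    Up to variance and bias constants, the NNGP kernel of a one-hidden-layer
    ReLU network is E_w[relu(w.x) * relu(w.y)] with w ~ N(0, I); drawing
    finitely many random directions gives a Monte Carlo approximation.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], num_features))
    # The 1/sqrt(num_features) scaling makes phi(x) . phi(y) an unbiased
    # estimate of the expectation above.
    return np.maximum(X @ W, 0.0) / np.sqrt(num_features)

# Approximate kernel matrix between two batches of row-vector inputs
# (the same seed reproduces the same random directions for both calls):
# K_approx = random_relu_features(X1) @ random_relu_features(X2).T
```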
- On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
- Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
arXiv Detail & Related papers (2021-01-20T06:18:38Z)
- Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z)
- Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning [17.272561332310303]
This work aims to enable on-device training of convolutional neural networks (CNNs) by reducing the computation cost at training time.
CNN models are usually trained on high-performance computers and only the trained models are deployed to edge devices.
arXiv Detail & Related papers (2020-07-07T05:52:37Z)
- Dithered backprop: A sparse and quantized backpropagation algorithm for more efficient deep neural network training [18.27946970159625]
We propose a method for reducing the computational cost of backprop, which we call dithered backprop.
We show that our method is fully compatible with state-of-the-art training methods that reduce the bit-precision of training down to 8 bits.
arXiv Detail & Related papers (2020-04-09T17:59:26Z)
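The dithered backprop entry above combines sparsity and quantization in the backward pass. The snippet below sketches only the generic dithered-quantization idea that the name points to: adding uniform noise before rounding keeps the quantizer unbiased in expectation and sends most small gradient entries to exactly zero, so their updates can be skipped. The grid choice and where in the backward pass this is applied are assumptions, not the paper's algorithm.

```python
import numpy as np

def dithered_quantize(grad, step=1e-3, rng=None):
    """Quantize a gradient tensor to a uniform grid with additive dither."""
    rng = np.random.default_rng() if rng is None else rng
    # Uniform dither over one quantization bin keeps E[q] equal to grad.
    dither = rng.uniform(-step / 2, step / 2, size=grad.shape)
    q = np.round((grad + dither) / step) * step
    # Entries much smaller than `step` land on zero most of the time,
    # which is what makes the resulting backward pass sparse.
    return q
```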
This list is automatically generated from the titles and abstracts of the papers on this site.