TinyProp -- Adaptive Sparse Backpropagation for Efficient TinyML On-device Learning
- URL: http://arxiv.org/abs/2308.09201v1
- Date: Thu, 17 Aug 2023 22:32:32 GMT
- Title: TinyProp -- Adaptive Sparse Backpropagation for Efficient TinyML On-device Learning
- Authors: Marcus Rüb, Daniel Maier, Daniel Mueller-Gritschneder, Axel Sikora
- Abstract summary: Training deep neural networks using backpropagation is very memory and computationally intensive.
This makes it difficult to run on-device learning or fine-tune neural networks on tiny, embedded devices such as low-power micro-controller units (MCUs).
We present TinyProp, the first sparse backpropagation method that dynamically adapts the back-propagation ratio during on-device training.
- Score: 0.4747685035960513
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Training deep neural networks using backpropagation is very memory and
computationally intensive. This makes it difficult to run on-device learning or
fine-tune neural networks on tiny, embedded devices such as low-power
micro-controller units (MCUs). Sparse backpropagation algorithms try to reduce
the computational load of on-device learning by training only a subset of the
weights and biases. Existing approaches use a static number of weights to
train. A poor choice of this so-called backpropagation ratio either limits the
computational gain or leads to severe accuracy losses. In this paper, we
present TinyProp, the first sparse backpropagation method that dynamically
adapts the backpropagation ratio during on-device training for each training
step. TinyProp introduces a small overhead for sorting the elements of the
gradient, which does not significantly reduce the computational gains. TinyProp
works particularly well for fine-tuning pretrained networks on MCUs, which is a
typical use case for embedded applications. On three typical datasets, MNIST,
DCASE2020 and CIFAR10, TinyProp is on average 5 times faster than non-sparse
training, with an average accuracy loss of 1%. On average, TinyProp is 2.9
times faster than existing static sparse backpropagation algorithms, and the
accuracy loss is reduced on average by 6% compared to a typical static setting
of the backpropagation ratio.
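The mechanism described in the abstract, sorting the gradient elements and backpropagating only a per-step fraction of them, can be illustrated with a short NumPy snippet. Everything below is a hypothetical illustration: the single linear layer, the heuristic that grows the backpropagation ratio with the current loss, and all names are assumptions rather than the authors' implementation, and the full gradient is computed here only for readability, whereas a real sparse-backpropagation kernel would skip the masked entries entirely.

```python
import numpy as np

def sparse_backprop_step(W, x, grad_out, loss, lr=0.01, k_min=0.05, k_max=0.5):
    """Update only a loss-dependent fraction of W (illustration only)."""
    # Weight gradient of a linear layer y = W @ x.
    grad_W = np.outer(grad_out, x)

    # Assumed adaptive backpropagation ratio: train more weights when the
    # loss is high, fewer when it is low.
    ratio = float(np.clip(k_min + loss * (k_max - k_min), k_min, k_max))
    k = max(1, int(ratio * grad_W.size))

    # Sort the gradient elements by magnitude and keep only the top k.
    flat = np.abs(grad_W).ravel()
    top_idx = np.argpartition(flat, flat.size - k)[-k:]
    mask = np.zeros(grad_W.shape, dtype=bool)
    mask.ravel()[top_idx] = True

    # Apply the sparse update: only k of the |W| entries are touched.
    W[mask] -= lr * grad_W[mask]
    return W
```

TinyProp's actual per-step adaptation rule is given in the paper and differs from the loss-based toy heuristic above; the snippet only shows the sort-and-select structure that the abstract describes.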
Related papers
- PRIOT: Pruning-Based Integer-Only Transfer Learning for Embedded Systems [1.4779899760345436]
We propose a new training method named PRIOT, which optimizes the network by pruning selected edges rather than updating weights.
We implement PRIOT and PRIOT-S on the Raspberry Pi Pico and evaluate their accuracy and computational costs.
Our results demonstrate that PRIOT improves accuracy by 8.08 to 33.75 percentage points over existing methods, while PRIOT-S reduces memory footprint with minimal accuracy loss.
arXiv Detail & Related papers (2025-03-21T05:07:57Z)
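The PRIOT entry above adapts a network by pruning selected edges instead of updating weights. The snippet below is a rough sketch of that general "train a pruning mask over frozen weights" idea, in the spirit of learned supermasks; the scoring heuristic, the keep ratio, and all names are assumptions, and the integer-only arithmetic and the PRIOT-S variant are not modeled.

```python
import numpy as np

def masked_forward(W, mask, x):
    """Forward pass of a frozen (possibly integer) linear layer under a mask."""
    return (W * mask) @ x

def reprune(W, scores, grad_out, x, lr=0.1, keep_ratio=0.7):
    """Re-score the edges from the gradient signal and prune the weakest ones.

    The weights W are never updated; only the per-edge scores and the
    resulting binary mask change from step to step.
    """
    # Gradient of the loss w.r.t. the effective (masked) weights.
    grad_W = np.outer(grad_out, x)

    # Assumed heuristic: promote edges whose gradient would grow the
    # weight's magnitude, demote edges it would shrink.
    scores -= lr * grad_W * W

    # Keep only the top `keep_ratio` fraction of edges.
    k = max(1, int(keep_ratio * scores.size))
    threshold = np.partition(scores.ravel(), scores.size - k)[scores.size - k]
    mask = (scores >= threshold).astype(W.dtype)
    return scores, mask
```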
- Efficient Neural Network Training via Subset Pretraining [5.352839075466439]
In training neural networks, it is common practice to use partial gradients computed over batches.
The loss minimum of the training set can be expected to be well-approximated by the minima of its subsets.
Experiments have confirmed that results equivalent to conventional training can be reached.
arXiv Detail & Related papers (2024-10-21T21:31:12Z)
- Advancing On-Device Neural Network Training with TinyPropv2: Dynamic, Sparse, and Efficient Backpropagation [0.4747685035960513]
This study introduces TinyPropv2, an innovative algorithm for optimized on-device learning in deep neural networks.
TinyPropv2 refines sparse backpropagation by dynamically adjusting the level of sparsity.
TinyPropv2 achieves near-parity with full training methods, with an average accuracy drop of only around 1 percent in most cases.
arXiv Detail & Related papers (2024-09-11T08:56:13Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable by our training procedure, including its gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
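The HASTE summary above rests on locality-sensitive hashing of latent feature maps. The sketch below shows one assumed way such a step could look: each channel is hashed with random hyperplanes and channels that collide are averaged, so that a following convolution sees fewer input channels. It is a conceptual sketch only, not the HASTE module, and in practice the weights of the following layer would also have to be aggregated per bucket.

```python
import numpy as np

def lsh_merge_channels(fmap, num_bits=4, seed=0):
    """Merge similar channels of a (C, H, W) feature map via hyperplane LSH."""
    c, h, w = fmap.shape
    flat = fmap.reshape(c, h * w)

    # Random hyperplane LSH: the sign pattern of a few random projections
    # forms the bucket id of each channel.
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((h * w, num_bits))
    bits = flat @ planes > 0
    buckets = bits @ (1 << np.arange(num_bits))

    # Channels falling into the same bucket are treated as redundant and
    # replaced by their mean, shrinking the channel dimension.
    merged = np.stack([flat[buckets == b].mean(axis=0)
                       for b in np.unique(buckets)])
    return merged.reshape(-1, h, w)
```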
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
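WTA-CRS above belongs to the column-row sampling (CRS) family of unbiased matrix-product estimators. The snippet below sketches the plain CRS baseline, which shows where the randomness and the unbiasedness come from; the winner-take-all variance reduction that gives the paper its name is not reproduced here, and the function name and sampling distribution are assumptions.

```python
import numpy as np

def crs_matmul(A, B, k, rng=None):
    """Unbiased k-sample column-row estimate of the product A @ B."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]

    # Importance-sampling distribution over the n column/row pairs.
    weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = weights / weights.sum()

    # Each sampled outer product is rescaled by 1 / (k * p_i), so the sum
    # has expectation exactly A @ B.
    idx = rng.choice(n, size=k, replace=True, p=p)
    return sum(np.outer(A[:, i], B[i, :]) / (k * p[i]) for i in idx)
```

In the setting of the summary above, estimators of this kind stand in for exact matrix products when tuning transformers so that less intermediate data needs to be kept, at the cost of the estimator's variance, which is what the paper's construction reduces.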
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
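The RFAD summary above relies on approximating the NNGP kernel with random features. The snippet below sketches the standard random-feature idea for a one-hidden-layer ReLU network, which is only one ingredient of the method; the exact kernel, its variance and bias constants, and the distillation objective used in the paper are not shown, and the feature count and names are assumptions.

```python
import numpy as np

def random_relu_features(X, num_features=4096, seed=0):
    """Random ReLU features whose inner products approximate an NNGP kernel.

    Up to variance and bias constants, the NNGP kernel of a one-hidden-layer
    ReLU network is E_w[relu(w.x) * relu(w.y)] with w ~ N(0, I); drawing
    finitely many random directions gives a Monte Carlo approximation.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], num_features))
    # The 1/sqrt(num_features) scaling makes phi(x) . phi(y) an unbiased
    # estimate of the expectation above.
    return np.maximum(X @ W, 0.0) / np.sqrt(num_features)

# Approximate kernel matrix between two batches of row-vector inputs
# (the same seed reproduces the same random directions for both calls):
# K_approx = random_relu_features(X1) @ random_relu_features(X2).T
```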
- On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
- Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
arXiv Detail & Related papers (2021-01-20T06:18:38Z)
- Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z)
- Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning [17.272561332310303]
This work aims to enable on-device training of convolutional neural networks (CNNs) by reducing the computation cost at training time.
CNN models are usually trained on high-performance computers and only the trained models are deployed to edge devices.
arXiv Detail & Related papers (2020-07-07T05:52:37Z)
- Dithered backprop: A sparse and quantized backpropagation algorithm for more efficient deep neural network training [18.27946970159625]
We propose a method for reducing the computational cost of backprop, which we call dithered backprop.
We show that our method is fully compatible with state-of-the-art training methods that reduce the bit-precision of training down to 8 bits.
arXiv Detail & Related papers (2020-04-09T17:59:26Z)
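The dithered backprop entry above combines sparsity and quantization in the backward pass. The snippet below sketches only the generic dithered-quantization idea that the name points to: adding uniform noise before rounding keeps the quantizer unbiased in expectation and sends most small gradient entries to exactly zero, so their updates can be skipped. The grid choice and where in the backward pass this is applied are assumptions, not the paper's algorithm.

```python
import numpy as np

def dithered_quantize(grad, step=1e-3, rng=None):
    """Quantize a gradient tensor to a uniform grid with additive dither."""
    rng = np.random.default_rng() if rng is None else rng
    # Uniform dither over one quantization bin keeps E[q] equal to grad.
    dither = rng.uniform(-step / 2, step / 2, size=grad.shape)
    q = np.round((grad + dither) / step) * step
    # Entries much smaller than `step` land on zero most of the time,
    # which is what makes the resulting backward pass sparse.
    return q
```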
This list is automatically generated from the titles and abstracts of the papers on this site.