Related papers: Advancing On-Device Neural Network Training with TinyPropv2: Dynamic, Sparse, and Efficient Backpropagation

Advancing On-Device Neural Network Training with TinyPropv2: Dynamic, Sparse, and Efficient Backpropagation

URL: http://arxiv.org/abs/2409.07109v1
Date: Wed, 11 Sep 2024 08:56:13 GMT
Title: Advancing On-Device Neural Network Training with TinyPropv2: Dynamic, Sparse, and Efficient Backpropagation
Authors: Marcus Rüb, Axel Sikora, Daniel Mueller-Gritschneder,
Abstract summary: This study introduces TinyPropv2, an innovative algorithm for optimized on-device learning in deep neural networks. TinyPropv2 refines sparse backpropagation by dynamically adjusting the level of sparsity. TinyPropv2 achieves near-parity with full training methods, with an average accuracy drop of only around 1 percent in most cases.
Score: 0.4747685035960513
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This study introduces TinyPropv2, an innovative algorithm optimized for on-device learning in deep neural networks, specifically designed for low-power microcontroller units. TinyPropv2 refines sparse backpropagation by dynamically adjusting the level of sparsity, including the ability to selectively skip training steps. This feature significantly lowers computational effort without substantially compromising accuracy. Our comprehensive evaluation across diverse datasets CIFAR 10, CIFAR100, Flower, Food, Speech Command, MNIST, HAR, and DCASE2020 reveals that TinyPropv2 achieves near-parity with full training methods, with an average accuracy drop of only around 1 percent in most cases. For instance, against full training, TinyPropv2's accuracy drop is minimal, for example, only 0.82 percent on CIFAR 10 and 1.07 percent on CIFAR100. In terms of computational effort, TinyPropv2 shows a marked reduction, requiring as little as 10 percent of the computational effort needed for full training in some scenarios, and consistently outperforms other sparse training methodologies. These findings underscore TinyPropv2's capacity to efficiently manage computational resources while maintaining high accuracy, positioning it as an advantageous solution for advanced embedded device applications in the IoT ecosystem.

Related papers

Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters. In practice, however, we only find solutions via our training procedure, including the gradient and regularizers, limiting flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module. We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH) In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
TinyProp -- Adaptive Sparse Backpropagation for Efficient TinyML On-device Learning [0.4747685035960513]
Training deep neural networks using backpropagation is very memory and computationally intensive. This makes it difficult to run on-device learning or fine-tune neural networks on tiny, embedded devices such as low-power micro-controller units (MCUs) We present TinyProp, the first sparse backpropagation method that dynamically adapts the back-propagation ratio during on-device training.
arXiv Detail & Related papers (2023-08-17T22:32:32Z)
AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks [2.6742343015805083]
We propose Gradient Annealing (GA) to explore the non-uniform distribution of sparsity inherent within neural networks. GA provides an elegant trade-off between sparsity and accuracy without the need for additional sparsity-inducing regularization. We integrate GA with the latest learnable pruning methods to create an automated sparse training algorithm called AutoSparse.
arXiv Detail & Related papers (2023-04-14T06:19:07Z)
Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off [19.230329532065635]
Sparse training could significantly mitigate the training costs by reducing the model size. Existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies. In this work, we consider the dynamic sparse training as a sparse connectivity search problem. Experimental results show that sparse models (up to 98% sparsity) obtained by our proposed method outperform the SOTA sparse training methods.
arXiv Detail & Related papers (2022-11-30T01:22:25Z)
On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory. Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better [88.28293442298015]
Federated learning (FL) enables distribution of machine learning workloads from the cloud to resource-limited edge devices. We develop, implement, and experimentally validate a novel FL framework termed Federated Dynamic Sparse Training (FedDST) FedDST is a dynamic process that extracts and trains sparse sub-networks from the target full network.
arXiv Detail & Related papers (2021-12-18T02:26:38Z)
Structured Directional Pruning via Perturbation Orthogonal Projection [13.704348351073147]
A more reasonable approach is to find a sparse minimizer along the flat minimum valley found byNIST. We propose the structured directional pruning based on projecting the perturbations onto the flat minimum valley. Experiments show that our method obtains the state-of-the-art pruned accuracy (i.e. 93.97% on VGG16, CIFAR-10 task) without retraining.
arXiv Detail & Related papers (2021-07-12T11:35:47Z)
FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [62.932299614630985]
We propose FracTrain that integrates progressive fractional quantization which gradually increases the precision of activations, weights, and gradients. FracTrain reduces computational cost and hardware-quantified energy/latency of DNN training while achieving a comparable or better (-0.12%+1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning [17.272561332310303]
This work aims to enable on-device training of convolutional neural networks (CNNs) by reducing the computation cost at training time. CNN models are usually trained on high-performance computers and only the trained models are deployed to edge devices.
arXiv Detail & Related papers (2020-07-07T05:52:37Z)
FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking. We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints. FBNetV3 makes up a family of state-of-the-art compact neural networks that outperform both automatically and manually-designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.