Efficient On-device Training via Gradient Filtering
- URL: http://arxiv.org/abs/2301.00330v2
- Date: Sat, 25 Mar 2023 02:12:09 GMT
- Title: Efficient On-device Training via Gradient Filtering
- Authors: Yuedong Yang, Guihong Li, Radu Marculescu
- Abstract summary: We propose a new gradient filtering approach which enables on-device CNN model training.
Our approach creates a special structure with fewer unique elements in the gradient map.
Our approach opens up a new direction of research with a huge potential for on-device training.
- Score: 14.484604762427717
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite its importance for federated learning, continual learning and many
other applications, on-device training remains an open problem for EdgeAI. The
problem stems from the large number of operations (e.g., floating point
multiplications and additions) and memory consumption required during training
by the back-propagation algorithm. Consequently, in this paper, we propose a
new gradient filtering approach which enables on-device CNN model training.
More precisely, our approach creates a special structure with fewer unique
elements in the gradient map, thus significantly reducing the computational
complexity and memory consumption of back-propagation during training.
Extensive experiments on image classification and semantic segmentation with
multiple CNN models (e.g., MobileNet, DeepLabV3, UPerNet) and devices (e.g.,
Raspberry Pi and Jetson Nano) demonstrate the effectiveness and wide
applicability of our approach. For example, compared to SOTA, we achieve up to
19$\times$ speedup and 77.1% memory savings on ImageNet classification with
only 0.1% accuracy loss. Finally, our method is easy to implement and deploy;
over 20$\times$ speedup and 90% energy savings have been observed compared to
highly optimized baselines in MKLDNN and CUDNN on NVIDIA Jetson Nano.
Consequently, our approach opens up a new direction of research with a huge
potential for on-device training.
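To make the idea above concrete, below is a minimal PyTorch-style sketch of the kind of gradient filtering the abstract describes: the gradient map entering a convolution layer's backward pass is patch-averaged so that each r x r patch shares a single value, leaving far fewer unique elements to process. The function name, the `patch_size` parameter, and the pool-then-broadcast formulation are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def filter_gradient(grad_output: torch.Tensor, patch_size: int = 4) -> torch.Tensor:
    """Approximate a gradient map so each patch_size x patch_size spatial patch
    shares one value (an illustrative reading of "fewer unique elements").

    grad_output: gradient w.r.t. a conv layer's output, shape (N, C, H, W),
                 with H and W assumed divisible by patch_size.
    """
    _, _, h, w = grad_output.shape
    # Average-pool the gradient map down to one value per patch ...
    pooled = F.avg_pool2d(grad_output, kernel_size=patch_size, stride=patch_size)
    # ... then broadcast each patch mean back to the original resolution.
    # The result keeps the original shape but has only (H/r) * (W/r) unique
    # values per channel.
    return F.interpolate(pooled, size=(h, w), mode="nearest")
```

Because neighboring gradient elements become identical, the weight- and input-gradient convolutions can in principle be evaluated on the small pooled map and rescaled, which is where the reported speedups and memory savings would come from under this reading.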
Related papers
- Center-Sensitive Kernel Optimization for Efficient On-Device Incremental Learning [88.78080749909665]
Current on-device training methods focus only on efficient training, without considering catastrophic forgetting.
This paper proposes a simple but effective edge-friendly incremental learning framework.
Our method achieves an average accuracy boost of 38.08% with even less memory and approximate computation.
arXiv Detail & Related papers (2024-06-13T05:49:29Z) - Fast-NTK: Parameter-Efficient Unlearning for Large-Scale Models [17.34908967455907]
"Machine unlearning" proposes the selective removal of unwanted data without the need for retraining from scratch.
Fast-NTK is a novel NTK-based unlearning algorithm that significantly reduces the computational complexity.
arXiv Detail & Related papers (2023-12-22T18:55:45Z) - Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - FastHebb: Scaling Hebbian Training of Deep Neural Networks to ImageNet Level [7.410940271545853]
We present FastHebb, an efficient and scalable solution for Hebbian learning.
FastHebb outperforms previous solutions by up to 50 times in terms of training speed.
For the first time, we are able to bring Hebbian algorithms to ImageNet scale.
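The summary does not spell out the update rule, but the plain Hebbian step that such approaches accelerate can be written as a single batched matrix product. The sketch below is a generic mini-batch Hebbian update, not FastHebb's actual implementation; the learning rate, shapes, and function name are illustrative.

```python
import torch

def hebbian_step(weights: torch.Tensor, x: torch.Tensor, lr: float = 1e-3) -> torch.Tensor:
    """Generic mini-batch Hebbian update: delta_w[i, j] ~ mean_b(y[b, i] * x[b, j]).

    weights: (out_features, in_features); x: (batch, in_features).
    Expressing the update as one matmul over the whole batch is what makes
    it amenable to GPU acceleration.
    """
    y = x @ weights.t()                 # responses, shape (batch, out_features)
    delta = y.t() @ x / x.shape[0]      # batched outer products, averaged
    return weights + lr * delta
```

Practical Hebbian variants typically add normalization or competition (e.g., Oja's rule or winner-take-all) on top of this basic rule to keep the weights bounded.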
arXiv Detail & Related papers (2022-07-07T09:04:55Z) - On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z) - Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline that aims to reduce the huge training overhead by squeezing the complex training-time block into a single convolution (see the folding sketch below).
Compared with state-of-the-art re-param models, OREPA saves about 70% of the training-time memory cost and accelerates training by around 2$\times$.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
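The "squeeze into a single convolution" step relies on the linearity of convolution: parallel branches can be folded into one kernel. The sketch below shows only this generic folding identity for a 3x3 branch plus a 1x1 branch, not OREPA's full two-stage pipeline; the helper name and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def merge_parallel_convs(w3x3: torch.Tensor, w1x1: torch.Tensor) -> torch.Tensor:
    """Fold parallel 3x3 and 1x1 conv branches into a single 3x3 kernel.

    Since convolution is linear, conv(x, w3x3) + conv(x, w1x1) equals
    conv(x, w3x3 + center_pad(w1x1)), so the two branches collapse into one.
    """
    # Zero-pad the 1x1 kernel to 3x3 (value kept at the center), then add.
    return w3x3 + F.pad(w1x1, [1, 1, 1, 1])

# Sanity check: the merged kernel reproduces the two-branch output.
x = torch.randn(2, 8, 16, 16)
w3, w1 = torch.randn(16, 8, 3, 3), torch.randn(16, 8, 1, 1)
two_branch = F.conv2d(x, w3, padding=1) + F.conv2d(x, w1)
merged = F.conv2d(x, merge_parallel_convs(w3, w1), padding=1)
assert torch.allclose(two_branch, merged, atol=1e-4)
```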
arXiv Detail & Related papers (2022-04-02T09:50:19Z) - perf4sight: A toolflow to model CNN training performance on Edge GPUs [16.61258138725983]
This work proposes perf4sight, an automated methodology for developing accurate models that predict CNN training memory footprint and latency.
With PyTorch as the framework and NVIDIA Jetson TX2 as the target device, the developed models predict training memory footprint and latency with 95% and 91% accuracy respectively.
arXiv Detail & Related papers (2021-08-12T07:55:37Z) - FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients during training (illustrated below).
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better accuracy (-0.12% to +1.87%).
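As a rough illustration of "progressive fractional quantization", the sketch below pairs a bit-width schedule that grows over training with a generic uniform fake-quantizer. The bit-widths, the fixed schedule, and the quantizer are stand-ins; the paper's own precision-switching criterion and quantization scheme are not reproduced here.

```python
import torch

# Illustrative bit-widths; the actual values and switching rule differ.
BIT_STAGES = (3, 4, 6, 8)

def scheduled_bits(epoch: int, total_epochs: int) -> int:
    """Return a bit-width that grows as training progresses
    (fixed placeholder schedule)."""
    stage = min(len(BIT_STAGES) * epoch // total_epochs, len(BIT_STAGES) - 1)
    return BIT_STAGES[stage]

def fake_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake quantization of a tensor to `bits` bits,
    a generic stand-in for quantizing activations, weights, or gradients."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return (x / scale).round().clamp(-qmax, qmax) * scale
```

Applying `fake_quantize` with `scheduled_bits(epoch, total_epochs)` during the forward and backward passes gives the "cheap early, precise late" behavior the summary describes.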
arXiv Detail & Related papers (2020-12-24T05:24:10Z) - Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning [17.272561332310303]
This work aims to enable on-device training of convolutional neural networks (CNNs) by reducing the computation cost at training time.
CNN models are usually trained on high-performance computers and only the trained models are deployed to edge devices.
arXiv Detail & Related papers (2020-07-07T05:52:37Z) - Multi-Precision Policy Enforced Training (MuPPET): A precision-switching strategy for quantised fixed-point training of CNNs [13.83645579871775]
Large-scale convolutional neural networks (CNNs) suffer from very long training times, spanning from hours to weeks.
This work pushes the boundary of quantised training by employing a multilevel approach that utilises multiple precisions.
MuPPET achieves the same accuracy as standard full-precision training, with a training-time speedup of up to 1.84$\times$ and an average speedup of 1.58$\times$ across the networks.
arXiv Detail & Related papers (2020-06-16T10:14:36Z)