On-Device Training of Fully Quantized Deep Neural Networks on Cortex-M Microcontrollers
- URL: http://arxiv.org/abs/2407.10734v2
- Date: Wed, 28 Aug 2024 15:36:08 GMT
- Title: On-Device Training of Fully Quantized Deep Neural Networks on Cortex-M Microcontrollers
- Authors: Mark Deutel, Frank Hannig, Christopher Mutschler, Jürgen Teich
- Abstract summary: We present a method that enables efficient training of DNNs completely in place on the MCU using fully quantized training (FQT) and dynamic partial gradient updates.
We demonstrate the feasibility of our approach on multiple vision and time-series datasets and provide insights into the tradeoff between training accuracy, memory overhead, energy, and latency on real hardware.
- Score: 4.370731001036268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: On-device training of DNNs allows models to adapt and fine-tune to newly collected data or changing domains while deployed on microcontroller units (MCUs). However, DNN training is a resource-intensive task, making the implementation and execution of DNN training algorithms on MCUs challenging due to low processor speeds, constrained throughput, limited floating-point support, and memory constraints. In this work, we explore on-device training of DNNs for Cortex-M MCUs. We present a method that enables efficient training of DNNs completely in place on the MCU using fully quantized training (FQT) and dynamic partial gradient updates. We demonstrate the feasibility of our approach on multiple vision and time-series datasets and provide insights into the tradeoff between training accuracy, memory overhead, energy, and latency on real hardware.
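To make the two ingredients concrete, here is a minimal C sketch of how a fully quantized forward and backward pass with a partial gradient update could look for a single dense layer. The shift-based integer-only requantization, the per-row grad_mask, and all identifiers are illustrative assumptions, not the authors' implementation.

```c
/*
 * Minimal sketch: one fully quantized training (FQT) step with a dynamic
 * partial gradient update, for a single dense layer. All names and the
 * power-of-two scaling scheme are assumptions for illustration only.
 */
#include <stdint.h>

#define IN_DIM  16
#define OUT_DIM 8

/* Saturate a 32-bit accumulator back into the int8 range. */
static int8_t clamp_i8(int32_t v)
{
    if (v > 127)  return 127;
    if (v < -128) return -128;
    return (int8_t)v;
}

typedef struct {
    int8_t  w[OUT_DIM][IN_DIM];  /* int8 weights                          */
    uint8_t grad_mask[OUT_DIM];  /* 1 = row selected for update this step */
} dense_t;

/* Forward pass: int8 x int8 -> int32 accumulate, then requantize to int8
 * via a power-of-two scale (an arithmetic shift), avoiding floating point. */
static void dense_forward(const dense_t *l, const int8_t x[IN_DIM],
                          int8_t y[OUT_DIM], int out_shift)
{
    for (int o = 0; o < OUT_DIM; o++) {
        int32_t acc = 0;
        for (int i = 0; i < IN_DIM; i++)
            acc += (int32_t)l->w[o][i] * (int32_t)x[i];
        y[o] = clamp_i8(acc >> out_shift);
    }
}

/* Backward/update: the weight gradient dW[o][i] = dy[o] * x[i] is computed
 * and applied entirely in integer arithmetic, and only for rows flagged in
 * grad_mask -- the "partial" part that skips both the compute and the
 * gradient memory for frozen rows. */
static void dense_backward_update(dense_t *l, const int8_t x[IN_DIM],
                                  const int8_t dy[OUT_DIM], int lr_shift)
{
    for (int o = 0; o < OUT_DIM; o++) {
        if (!l->grad_mask[o])
            continue;  /* row frozen this step: no gradient work at all */
        for (int i = 0; i < IN_DIM; i++) {
            int32_t g = (int32_t)dy[o] * (int32_t)x[i];
            /* SGD step with a power-of-two learning rate (g >> lr_shift). */
            l->w[o][i] = clamp_i8((int32_t)l->w[o][i] - (g >> lr_shift));
        }
    }
}
```

A dynamic policy would refresh grad_mask between steps, for example from recent error magnitudes; training "in place" means the updated weights directly overwrite the deployed inference weights rather than occupying a second copy in memory.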
Related papers
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Accelerator-Aware Training for Transducer-Based Speech Recognition [16.959329474794092]
In this work, we replicate the NNA operators during the training phase, accounting in back-propagation for the degradation caused by low-precision inference on the NNA. Our proposed method efficiently emulates NNA operations, thus foregoing the need to transfer quantization error-prone data to the CPU. We train and evaluate models on 270K hours of English data and show a 5-7% improvement in engine latency while avoiding up to 10% relative degradation in WER. (A generic fake-quantization sketch illustrating this style of emulation appears after this list.)
arXiv Detail & Related papers (2023-05-12T21:49:51Z)
- SPIDE: A Purely Spike-based Method for Training Feedback Spiking Neural Networks [56.35403810762512]
Spiking neural networks (SNNs) with event-based computation are promising brain-inspired models for energy-efficient applications on neuromorphic hardware.
We study spike-based implicit differentiation on the equilibrium state (SPIDE), which extends a recently proposed training method.
arXiv Detail & Related papers (2023-02-01T04:22:59Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Training Spiking Neural Networks with Local Tandem Learning [96.32026780517097]
Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors.
In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL).
We demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity.
arXiv Detail & Related papers (2022-10-10T10:05:00Z)
- Designing and Training of Lightweight Neural Networks on Edge Devices using Early Halting in Knowledge Distillation [16.74710649245842]
This paper presents a novel approach for designing and training lightweight Deep Neural Networks (DNNs) on edge devices.
The approach considers the available storage, processing speed, and allowable maximum processing time.
We introduce a novel early halting technique, which preserves network resources.
arXiv Detail & Related papers (2022-09-30T16:18:24Z)
- BFTrainer: Low-Cost Training of Neural Networks on Unfillable Supercomputer Nodes [0.8201100713224002]
On supercomputers, first-come-first-served (FCFS) scheduling policies result in many transient idle nodes.
We show how to realize a novel use for these otherwise wasted resources, namely, deep neural network (DNN) training.
arXiv Detail & Related papers (2021-06-22T22:53:19Z)
- Enabling Incremental Training with Forward Pass for Edge Devices [0.0]
We introduce a method using an evolutionary strategy (ES) that can partially retrain the network, enabling it to adapt to changes and recover after an error has occurred.
This technique enables training on inference-only hardware, without the need for backpropagation and with minimal resource overhead.
arXiv Detail & Related papers (2021-03-25T17:43:04Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
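On the accelerator-aware training entry above: fake quantization is the standard building block that this style of method relies on. The sketch below is a generic quantize-dequantize helper with a straight-through estimator, assumed here for illustration; it is not the paper's NNA operator emulation, and the function name and per-tensor scale parameter are hypothetical.

```c
#include <math.h>

/* Generic fake-quantization helper: round a float weight or activation to
 * an int8 grid and dequantize it again, so training "sees" the rounding
 * error of low-precision inference. In back-propagation the op is treated
 * as identity (straight-through estimator), letting gradients pass through
 * unchanged. */
static float fake_quant_i8(float v, float scale)
{
    long q = lroundf(v / scale);   /* round to the nearest int8 level */
    if (q > 127)  q = 127;         /* saturate to the int8 range      */
    if (q < -128) q = -128;
    return (float)q * scale;       /* dequantize back to float        */
}
```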
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.