Pex: Memory-efficient Microcontroller Deep Learning through Partial Execution
- URL: http://arxiv.org/abs/2211.17246v1
- Date: Wed, 30 Nov 2022 18:47:30 GMT
- Title: Pex: Memory-efficient Microcontroller Deep Learning through Partial Execution
- Authors: Edgar Liberis, Nicholas D. Lane
- Abstract summary: We discuss a novel execution paradigm for microcontroller deep learning.
It modifies the execution of neural networks to avoid materialising full buffers in memory.
This is achieved by exploiting the properties of operators, which can consume/produce a fraction of their input/output at a time.
- Score: 11.336229510791481
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embedded and IoT devices, largely powered by microcontroller units (MCUs),
could be made more intelligent by leveraging on-device deep learning. One of
the main challenges of neural network inference on an MCU is the extremely
limited amount of read-write on-chip memory (SRAM, < 512 kB). SRAM is consumed
by the neural network layer (operator) input and output buffers, which,
traditionally, must be in memory (materialised) for an operator to execute. We
discuss a novel execution paradigm for microcontroller deep learning, which
modifies the execution of neural networks to avoid materialising full buffers
in memory, drastically reducing SRAM usage with no computation overhead. This
is achieved by exploiting the properties of operators, which can
consume/produce a fraction of their input/output at a time. We describe a
partial execution compiler, Pex, which produces memory-efficient execution
schedules automatically by identifying subgraphs of operators whose execution
can be split along the feature ("channel") dimension. Memory usage is reduced
further by targeting memory bottlenecks with structured pruning, leading to the
co-design of the network architecture and its execution schedule. Our
evaluation of image and audio classification models: (a) establishes
state-of-the-art performance in low SRAM usage regimes for considered tasks
with up to +2.9% accuracy increase; (b) finds that a 4x memory reduction is
possible by applying partial execution alone, or up to 10.5x when using the
compiler-pruning co-design, while maintaining the classification accuracy
compared to prior work; (c) uses the recovered SRAM to process higher
resolution inputs instead, increasing accuracy by up to +3.9% on Visual Wake
Words.
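The channel-split idea described in the abstract can be illustrated with a minimal NumPy sketch. This is our own illustration, not the Pex compiler: the function names and the group size are hypothetical. A pointwise (1x1) convolution followed by a ReLU is evaluated one output-channel group at a time, so only a group-sized slice of the intermediate buffer is materialised at once.

```python
# Illustrative sketch of channel-wise partial execution (not the Pex
# implementation): a 1x1 convolution + ReLU subgraph is split along the
# output-channel ("feature") dimension.
import numpy as np

def full_execution(x, w):
    """Baseline: materialise the full output buffer at once."""
    y = np.einsum("hwc,co->hwo", x, w)   # 1x1 convolution
    return np.maximum(y, 0.0)            # ReLU

def partial_execution(x, w, group=4):
    """Split the subgraph along the output-channel dimension.
    Only one group-sized slice is live at a time; the result is
    reassembled here purely to check correctness (an MCU would stream
    each slice onward and free it)."""
    out_channels = w.shape[1]
    slices = []
    for c in range(0, out_channels, group):
        y_slice = np.einsum("hwc,co->hwo", x, w[:, c:c + group])
        slices.append(np.maximum(y_slice, 0.0))
    return np.concatenate(slices, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16)).astype(np.float32)
w = rng.standard_normal((16, 32)).astype(np.float32)
assert np.allclose(full_execution(x, w), partial_execution(x, w), atol=1e-5)
```

In this toy setting the peak intermediate buffer shrinks by roughly a factor of `out_channels / group` for the split subgraph, with no redundant computation, which is the property the abstract's "no computation overhead" claim rests on.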
Related papers
- Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory [66.88278207591294]
We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data.
PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities.
arXiv Detail & Related papers (2024-04-18T03:03:46Z)
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory [76.02294791513552]
We propose a hardware-algorithm co-optimization method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory.
Experimental results demonstrate that our MCUFormer achieves 73.62% top-1 accuracy on ImageNet for image classification with 320KB memory.
arXiv Detail & Related papers (2023-10-25T18:00:26Z)
- Keyword Spotting System and Evaluation of Pruning and Quantization Methods on Low-power Edge Microcontrollers [7.570300579676175]
Keyword spotting (KWS) is beneficial for voice-based user interaction with low-power devices at the edge.
This paper presents our small-footprint KWS system running on an STM32F7 microcontroller with a Cortex-M7 core at 216 MHz and 512 KB of static RAM.
arXiv Detail & Related papers (2022-08-04T16:49:45Z)
- NumS: Scalable Array Programming for the Cloud [82.827921577004]
We present NumS, an array programming library which optimizes NumPy-like expressions on task-based distributed systems.
This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS)
We show that LSHS enhances performance on Ray by decreasing network load by a factor of 2x, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem.
arXiv Detail & Related papers (2022-06-28T20:13:40Z)
- Mesa: A Memory-saving Training Framework for Transformers [58.78933015299703]
We present Mesa, a memory-saving training framework for Transformers.
Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training.
Experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can halve the memory footprint during training.
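The activation-compression idea summarised above can be sketched roughly as follows. This is our own simplified illustration, not Mesa's implementation (function names and the symmetric int8 scheme are assumptions): the exact activation feeds the forward computation, while only a per-tensor int8-quantised copy is retained for the backward pass.

```python
# Simplified sketch (assumed, not Mesa's code): store a compact int8
# copy of an activation instead of the full float32 tensor.
import numpy as np

def quantize_int8(a):
    """Per-tensor symmetric quantisation of a float32 tensor to int8."""
    scale = float(np.max(np.abs(a)))
    scale = scale / 127.0 if scale > 0 else 1.0
    q = np.clip(np.round(a / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 activation for the backward pass."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(2)
act = rng.standard_normal((64, 64)).astype(np.float32)  # exact forward activation
q, s = quantize_int8(act)            # only this copy is kept for backward
approx = dequantize(q, s)
assert q.nbytes == act.nbytes // 4   # stored copy is 4x smaller
```

The stored int8 copy is a quarter the size of the float32 activation; the per-element error is bounded by half the quantisation step, which is why a low-precision copy can stand in for the exact activation during the backward pass.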
arXiv Detail & Related papers (2021-11-22T11:23:01Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
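Patch-by-patch inference as summarised above can be sketched as follows. This is an assumed, simplified illustration rather than the MCUNetV2 scheduler: a valid 3x3 convolution is evaluated over horizontal output bands, so only a small overlapping input slab and one output band are resident at a time instead of the full feature maps.

```python
# Illustrative sketch (not the MCUNetV2 code) of patch-based inference:
# a valid 3x3 convolution computed band by band over the output rows.
import numpy as np

def conv3x3_valid(x, k):
    """Reference: valid 3x3 convolution over the whole input."""
    H, W = x.shape
    out = np.zeros((H - 2, W - 2), dtype=x.dtype)
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

def conv3x3_by_bands(x, k, band=4):
    """Compute `band` output rows at a time from an overlapping input
    slab (the 2-row "halo" needed by the 3x3 receptive field)."""
    H, W = x.shape
    bands = []
    for r in range(0, H - 2, band):
        rows = min(band, H - 2 - r)
        slab = x[r:r + rows + 2, :]          # input slab with halo
        bands.append(conv3x3_valid(slab, k)) # one output band
    return np.concatenate(bands, axis=0)

rng = np.random.default_rng(1)
x = rng.standard_normal((16, 16)).astype(np.float32)
k = rng.standard_normal((3, 3)).astype(np.float32)
assert np.allclose(conv3x3_valid(x, k), conv3x3_by_bands(x, k))
```

The halo rows are recomputed-free here because bands only overlap on the input side; schedulers in this space trade such overlap (and any recomputation it induces for deeper fused stacks) against the peak-memory savings.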
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- MAFAT: Memory-Aware Fusing and Tiling of Neural Networks for Accelerated Edge Inference [1.7894377200944507]
Machine learning networks can easily exceed available memory, increasing latency due to excessive OS swapping.
We propose a memory usage predictor coupled with a search algorithm to provide optimized fusing and tiling configurations.
Results show that our approach can run in less than half the memory, with a speedup of up to 2.78x under severe memory constraints.
arXiv Detail & Related papers (2021-07-14T19:45:49Z)
- Robust High-dimensional Memory-augmented Neural Networks [13.82206983716435]
Memory-augmented neural networks enhance conventional neural networks with an explicit memory.
Access to this explicit memory occurs via soft read and write operations involving every individual memory entry.
We propose a robust architecture that employs a computational memory unit as the explicit memory performing analog in-memory computation on high-dimensional (HD) vectors.
arXiv Detail & Related papers (2020-10-05T12:01:56Z)
- Efficient Neural Network Deployment for Microcontroller [0.0]
This paper explores and generalizes convolutional neural network deployment for microcontrollers.
The memory savings and performance will be compared with CMSIS-NN framework developed for ARM Cortex-M CPUs.
The final goal is a tool that consumes a PyTorch model with trained network weights and turns it into an optimized C/C++ inference engine for microcontrollers with low (kilobyte-level) memory and limited computing capability.
arXiv Detail & Related papers (2020-05-19T15:36:39Z)
- In-memory Implementation of On-chip Trainable and Scalable ANN for AI/ML Applications [0.0]
This paper presents an in-memory computing architecture for ANN enabling artificial intelligence (AI) and machine learning (ML) applications.
Our novel on-chip training and inference in-memory architecture reduces energy cost and enhances throughput by simultaneously accessing the multiple rows of array per precharge cycle.
The proposed architecture was trained and tested on the IRIS dataset and is $46\times$ more energy efficient per MAC (multiply-and-accumulate) operation compared to earlier classifiers.
arXiv Detail & Related papers (2020-05-19T15:36:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.