Diagonal Memory Optimisation for Machine Learning on Micro-controllers
- URL: http://arxiv.org/abs/2010.01668v2
- Date: Mon, 16 Nov 2020 21:40:43 GMT
- Title: Diagonal Memory Optimisation for Machine Learning on Micro-controllers
- Authors: Peter Blacker, Christopher Paul Bridges, Simon Hadfield
- Abstract summary: Micro-controllers and low-power CPUs are increasingly being used to perform inference with machine learning models.
The small amount of RAM available on these targets limits the size of the models which can be executed.
A diagonal memory optimisation technique is described and shown to achieve memory savings of up to 34.5% when applied to eleven common models.
- Score: 21.222568055417717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As machine learning spreads into more and more application areas,
micro-controllers and low-power CPUs are increasingly being used to perform inference
with machine learning models. The capability to deploy onto these limited
hardware targets is enabling machine learning models to be used across a
diverse range of new domains. Optimising the inference process on these targets
poses different challenges from either desktop CPU or GPU implementations,
because the small amount of RAM available on these targets limits the size of
the models which can be executed. An analysis of the memory use patterns of eleven
machine learning models was performed. Memory load and store patterns were
observed using a modified version of the Valgrind debugging tool, identifying
memory areas holding values necessary for the calculation as inference
progressed. These analyses identified opportunities to optimise the memory use of
these models by overlapping the input and output buffers of individual tensor
operations. Three methods are presented which can calculate the safe overlap of
input and output buffers for tensor operations. Ranging from a computationally
expensive approach with the ability to operate on compiled layer operations, to
a versatile analytical solution which requires access to the original source
code of the layer. The diagonal memory optimisation technique is described and
shown to achieve memory savings of up to 34.5% when applied to eleven common
models. Micro-controller targets are identified where it is only possible to
deploy some models if diagonal memory optimisation is used.
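The core idea can be illustrated with a small sketch. Assuming an empirical, trace-based variant of the first method above (the trace format and names here are illustrative, not the paper's Valgrind-based tooling), the largest safe overlap of an output buffer onto an input buffer is found by recording when each input element is last loaded and checking that every aliased store happens after that point:
```python
# Illustrative sketch of a trace-based safe-overlap check. Addresses are
# element-granular; the input occupies elements [0, n_in) and the output
# buffer is placed at offset n_in - overlap.

def access_trace_elementwise(n):
    """Trace for an elementwise op processed left-to-right:
    step t loads input[t] and then stores output[t]."""
    trace = []
    for t in range(n):
        trace.append(("load", t))
        trace.append(("store", t))
    return trace

def max_safe_overlap(trace, n_in, n_out):
    """Largest overlap (in elements) of the output buffer onto the end of
    the input buffer that never clobbers a value still waiting to be read."""
    last_load = {}
    for step, (kind, idx) in enumerate(trace):
        if kind == "load":
            last_load[idx] = step
    for overlap in range(min(n_in, n_out), -1, -1):
        shift = n_in - overlap                 # output base address
        safe = True
        for step, (kind, idx) in enumerate(trace):
            if kind != "store":
                continue
            aliased = shift + idx              # input element at this address
            if aliased < n_in and last_load.get(aliased, -1) > step:
                safe = False                   # store precedes a pending load
                break
        if safe:
            return overlap
    return 0

n = 8
overlap = max_safe_overlap(access_trace_elementwise(n), n, n)
print(f"safe overlap: {overlap} of {n} elements")   # full in-place reuse
```
For an in-place-safe elementwise operation this recovers full overlap (the output can reuse the input buffer entirely); for operations such as convolutions the same check yields a partial, diagonal overlap of the two buffers.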
Related papers
- AI and Memory Wall [81.06494558184049]
We show how memory bandwidth can become the dominant bottleneck for decoder models.
We argue for a redesign in model architecture, training, and deployment strategies to overcome this memory limitation.
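A back-of-envelope calculation makes the argument concrete (the hardware numbers below are illustrative assumptions, not figures from the paper): in single-batch decoding every weight must be streamed from memory once per generated token, while only about two FLOPs are performed per weight.
```python
# Illustrative arithmetic: single-batch decoder inference is bandwidth-bound.
params = 7e9           # hypothetical 7B-parameter decoder
bytes_per_param = 2    # fp16 weights
bandwidth = 1e12       # assumed 1 TB/s memory bandwidth
peak_flops = 100e12    # assumed 100 TFLOP/s fp16 compute

t_mem = params * bytes_per_param / bandwidth   # time to stream the weights
t_cmp = 2 * params / peak_flops                # time to do the arithmetic
print(f"memory: {t_mem*1e3:.2f} ms/token, compute: {t_cmp*1e3:.2f} ms/token")
# memory: 14.00 ms/token, compute: 0.14 ms/token -> ~100x gap
```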
arXiv Detail & Related papers (2024-03-21T04:31:59Z)
- Stochastic Configuration Machines: FPGA Implementation [4.57421617811378]
Stochastic configuration networks (SCNs) are a prime choice in industrial applications due to their merits and feasibility for data modelling.
This paper aims to implement SCM models on a field programmable gate array (FPGA) and introduce binary-coded inputs to improve learning performance.
arXiv Detail & Related papers (2023-10-30T02:04:20Z)
- SqueezeLLM: Dense-and-Sparse Quantization [80.32162537942138]
The main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, for single-batch inference.
We introduce SqueezeLLM, a post-training quantization framework that enables lossless compression to ultra-low precisions of up to 3-bit.
Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format.
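A minimal sketch of the Dense-and-Sparse idea (illustrative only, not the SqueezeLLM code; a quantile codebook stands in for the paper's sensitivity-weighted k-means):
```python
import numpy as np

def dense_and_sparse(W, bits=3, outlier_frac=0.005):
    """Split outliers into a sparse COO matrix, quantize the dense rest."""
    cutoff = np.quantile(np.abs(W), 1.0 - outlier_frac)
    mask = np.abs(W) > cutoff
    rows, cols = np.nonzero(mask)
    outliers = W[mask]                         # kept in full precision
    dense = np.where(mask, 0.0, W)
    qs = (np.arange(2**bits) + 0.5) / 2**bits  # 8 centroids for 3-bit codes
    codebook = np.quantile(dense, qs)          # non-uniform quantization levels
    codes = np.abs(dense[..., None] - codebook).argmin(-1).astype(np.uint8)
    return codes, codebook, (rows, cols, outliers)

def reconstruct(codes, codebook, sparse):
    W = codebook[codes]
    rows, cols, vals = sparse
    W[rows, cols] = vals                       # restore exact outliers
    return W

W = np.random.randn(256, 256).astype(np.float32)
codes, cb, sp = dense_and_sparse(W)
print(f"mean abs error: {np.abs(reconstruct(codes, cb, sp) - W).mean():.4f}")
```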
arXiv Detail & Related papers (2023-06-13T08:57:54Z)
- Optimizing Memory Mapping Using Deep Reinforcement Learning [29.48627805378257]
This paper focuses on the memory mapping problem that occurs during compilation of machine learning programs.
We introduce an approach that frames the memory mapping problem as a game and solves it using Reinforcement Learning.
We also introduce a Reinforcement Learning agent, mallocMuZero, and show that it is capable of playing this game to discover new and improved memory mapping solutions.
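As a toy illustration of the underlying placement game (a greedy heuristic standing in for the learned mallocMuZero policy; all names here are hypothetical): each tensor has a size and a live interval, and the player decides which tensors get scarce fast memory.
```python
def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

def greedy_map(tensors, fast_capacity):
    """tensors: list of (size, (start, end)) live ranges. Place largest-first
    into fast memory; the capacity check (sum over all overlapping residents)
    is conservative but keeps the toy simple."""
    placed, dram_bytes = [], 0
    for size, live in sorted(tensors, reverse=True):
        used = sum(s for s, l in placed if overlaps(l, live))
        if used + size <= fast_capacity:
            placed.append((size, live))
        else:
            dram_bytes += size                 # spilled to slow memory
    return placed, dram_bytes

tensors = [(64, (0, 3)), (32, (1, 4)), (48, (2, 5)), (16, (4, 6))]
fast, spilled = greedy_map(tensors, fast_capacity=96)
print(f"{len(fast)} tensors in fast memory, {spilled} bytes spilled")
```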
arXiv Detail & Related papers (2023-05-11T11:55:16Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximises data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Pex: Memory-efficient Microcontroller Deep Learning through Partial Execution [11.336229510791481]
We discuss a novel execution paradigm for microcontroller deep learning.
It modifies the execution of neural networks to avoid materialising full buffers in memory.
This is achieved by exploiting the properties of operators, which can consume/produce a fraction of their input/output at a time.
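A minimal sketch of the partial-execution idea (illustrative, not the Pex implementation): an elementwise chain is evaluated tile by tile, so only a tile-sized scratch buffer is live instead of full intermediate tensors.
```python
import numpy as np

def chain_full(x):
    a = np.maximum(x, 0.0)        # ReLU materialises a full buffer
    b = a * 2.0                   # ...and another one
    return b + 1.0

def chain_tiled(x, tile=64):
    out = np.empty_like(x)
    for start in range(0, x.size, tile):
        t = x[start:start + tile]            # consume a fraction of the input
        t = np.maximum(t, 0.0) * 2.0 + 1.0   # tile-sized intermediates only
        out[start:start + tile] = t          # produce a fraction of the output
    return out

x = np.random.randn(1000).astype(np.float32)
assert np.allclose(chain_full(x), chain_tiled(x))
```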
arXiv Detail & Related papers (2022-11-30T18:47:30Z)
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
- A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model that keeps learning new classes within a limited memory size.
We show that when counting the model size into the total budget and comparing methods with aligned memory size, saving models does not consistently work.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
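The arithmetic behind the title is roughly as follows (the parameter count below is approximate): at equal byte cost, saving one extra backbone is equivalent to storing some six hundred raw CIFAR exemplars.
```python
# Illustrative exchange rate between model memory and exemplar memory.
resnet32_params = 463_000            # approx. ResNet-32 parameter count
model_bytes = resnet32_params * 4    # fp32 weights
exemplar_bytes = 32 * 32 * 3         # one raw CIFAR image
print(f"1 model ~= {model_bytes / exemplar_bytes:.0f} exemplars")  # ~603
```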
arXiv Detail & Related papers (2022-05-26T08:24:01Z)
- Experimentally realized memristive memory augmented neural network [0.0]
Lifelong on-device learning is a key challenge for machine intelligence.
Memory-augmented neural networks have been proposed to achieve this goal, but the memory module has to be stored in off-chip memory.
We implement the entire memory augmented neural network architecture in a fully integrated memristive crossbar platform.
arXiv Detail & Related papers (2022-04-15T11:52:30Z)
- Mesa: A Memory-saving Training Framework for Transformers [58.78933015299703]
We present Mesa, a memory-saving training framework for Transformers.
Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training.
Experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can halve the memory footprint during training.
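A minimal numpy sketch of the idea (illustrative, not the Mesa framework): the forward pass computes with exact activations but saves only an 8-bit copy for the backward pass.
```python
import numpy as np

def quantize(x):
    scale = np.abs(x).max() / 127.0 + 1e-12
    return np.round(x / scale).astype(np.int8), scale

def linear_forward(x, W):
    y = x @ W                      # exact activations flow forward
    saved = quantize(x)            # low-precision copy saved for backward
    return y, saved

def linear_backward(grad_y, saved, W):
    x_q, scale = saved
    x_approx = x_q.astype(np.float32) * scale   # dequantize on demand
    grad_W = x_approx.T @ grad_y                # approximate weight gradient
    grad_x = grad_y @ W.T                       # exact input gradient
    return grad_x, grad_W

x = np.random.randn(32, 128).astype(np.float32)
W = np.random.randn(128, 64).astype(np.float32)
y, saved = linear_forward(x, W)
gx, gW = linear_backward(np.ones_like(y), saved, W)
print(gW.shape, gx.shape)
```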
arXiv Detail & Related papers (2021-11-22T11:23:01Z)
- Improving compute efficacy frontiers with SliceOut [31.864949424541344]
We introduce SliceOut -- a dropout-inspired scheme to train deep learning models faster without impacting final test accuracy.
At test time, turning off SliceOut performs an implicit ensembling across a linear number of architectures that preserves test accuracy.
This leads to faster processing of large computational workloads overall, and significantly reduces the resulting energy consumption and CO2 emissions.
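A minimal sketch of the scheme (illustrative, not the authors' code): unlike dropout, which zeroes scattered units, SliceOut drops a contiguous slice of hidden units so the matrix multiplies genuinely shrink.
```python
import numpy as np

def sliceout_layer(x, W1, W2, keep_frac=0.75, train=True):
    h = W1.shape[1]
    if train:
        k = int(h * keep_frac)
        start = np.random.randint(0, h - k + 1)
        sl = slice(start, start + k)
        # Smaller matmuls: only the kept slice of the weights is touched.
        hidden = np.maximum(x @ W1[:, sl], 0.0)
        return (hidden @ W2[sl, :]) / keep_frac   # rescale like dropout
    # Test time: SliceOut off, full-width network (implicit ensemble).
    return np.maximum(x @ W1, 0.0) @ W2

x = np.random.randn(8, 32).astype(np.float32)
W1 = np.random.randn(32, 64).astype(np.float32)
W2 = np.random.randn(64, 16).astype(np.float32)
print(sliceout_layer(x, W1, W2).shape)   # (8, 16)
```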
arXiv Detail & Related papers (2020-07-21T15:59:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.