Measuring what Really Matters: Optimizing Neural Networks for TinyML
- URL: http://arxiv.org/abs/2104.10645v1
- Date: Wed, 21 Apr 2021 17:14:06 GMT
- Title: Measuring what Really Matters: Optimizing Neural Networks for TinyML
- Authors: Lennart Heim, Andreas Biri, Zhongnan Qu, Lothar Thiele
- Abstract summary: Neural networks (NNs) have experienced an unprecedented growth in architectural and computational complexity. Introducing NNs to resource-constrained devices enables cost-efficient deployments, widespread availability, and the preservation of sensitive data. This work addresses the challenges of bringing Machine Learning to MCUs, where we focus on the ubiquitous ARM Cortex-M architecture.
- Score: 7.455546102930911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the surge of inexpensive computational and memory resources, neural
networks (NNs) have experienced an unprecedented growth in architectural and
computational complexity. Introducing NNs to resource-constrained devices
enables cost-efficient deployments, widespread availability, and the
preservation of sensitive data. This work addresses the challenges of bringing
Machine Learning to MCUs, where we focus on the ubiquitous ARM Cortex-M
architecture. The detailed effects and trade-offs that optimization methods,
software frameworks, and MCU hardware architecture have on key performance
metrics such as inference latency and energy consumption have not been
previously studied in depth for state-of-the-art frameworks such as TensorFlow
Lite Micro. We find that empirical investigations which measure the perceptible
metrics - performance as experienced by the user - are indispensable, as the
impact of specialized instructions and layer types can be subtle. To this end,
we propose an implementation-aware design as a cost-effective method for
verification and benchmarking. Employing our developed toolchain, we
demonstrate how existing NN deployments on resource-constrained devices can be
improved by systematically optimizing NNs to their targeted application
scenario.
Related papers
- Empowering Malware Detection Efficiency within Processing-in-Memory Architecture [0.7910057416898179]
Malware detection techniques leveraging Machine Learning have gained popularity.
One major drawback of neural network architectures is their substantial computational resource requirements.
We propose a Processing-in-Memory (PIM)-based architecture to mitigate memory access latency.
arXiv Detail & Related papers (2024-04-12T21:28:43Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- Combining Multi-Objective Bayesian Optimization with Reinforcement Learning for TinyML [4.2019872499238256]
We propose a novel strategy for deploying Deep Neural Networks on microcontrollers (TinyML) based on Multi-Objective Bayesian optimization (MOBOpt).
Our methodology aims at efficiently finding tradeoffs between a DNN's predictive accuracy, memory consumption on a given target system, and computational complexity.
arXiv Detail & Related papers (2023-05-23T14:31:52Z)
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"Smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
- MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z)
- Energy-efficient Deployment of Deep Learning Applications on Cortex-M based Microcontrollers using Deep Compression [1.4050836886292872]
This paper investigates the efficient deployment of deep learning models on resource-constrained microcontrollers.
We present a methodology for the systematic exploration of different DNN pruning, quantization, and deployment strategies.
We show that we can compress them to below 10% of their original parameter count before their predictive quality decreases.
arXiv Detail & Related papers (2022-05-20T10:55:42Z)
- Resistive Neural Hardware Accelerators [0.46198289193451136]
The shift towards ReRAM-based in-memory computing has great potential in the implementation of area- and power-efficient inference.
In this survey, we review the state-of-the-art ReRAM-based many-core accelerators for Deep Neural Networks (DNNs).
arXiv Detail & Related papers (2021-09-08T21:11:48Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource-limited devices.
Previous unstructured or structured weight pruning methods can hardly accelerate inference in practice.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles [52.79089414630366]
We develop a novel deep cooperative NOMA scheme, drawing upon the recent advances in deep learning (DL).
We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.
arXiv Detail & Related papers (2020-07-27T12:38:37Z)
- HCM: Hardware-Aware Complexity Metric for Neural Network Architectures [6.556553154231475]
This paper introduces a hardware-aware complexity metric that aims to assist system designers of neural network architectures.
We demonstrate how the proposed metric can help evaluate different design alternatives of neural network models on resource-restricted devices.
arXiv Detail & Related papers (2020-04-19T16:42:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.