Measuring what Really Matters: Optimizing Neural Networks for TinyML
- URL: http://arxiv.org/abs/2104.10645v1
- Date: Wed, 21 Apr 2021 17:14:06 GMT
- Title: Measuring what Really Matters: Optimizing Neural Networks for TinyML
- Authors: Lennart Heim, Andreas Biri, Zhongnan Qu, Lothar Thiele
- Abstract summary: Neural networks (NNs) have experienced an unprecedented growth in architectural and computational complexity. Introducing NNs to resource-constrained devices enables cost-efficient deployments, widespread availability, and the preservation of sensitive data.
This work addresses the challenges of bringing Machine Learning to MCUs, where we focus on the ubiquitous ARM Cortex-M architecture.
- Score: 7.455546102930911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the surge of inexpensive computational and memory resources, neural
networks (NNs) have experienced an unprecedented growth in architectural and
computational complexity. Introducing NNs to resource-constrained devices
enables cost-efficient deployments, widespread availability, and the
preservation of sensitive data. This work addresses the challenges of bringing
Machine Learning to MCUs, where we focus on the ubiquitous ARM Cortex-M
architecture. The detailed effects and trade-offs that optimization methods,
software frameworks, and MCU hardware architecture have on key performance
metrics such as inference latency and energy consumption have not been
previously studied in depth for state-of-the-art frameworks such as TensorFlow
Lite Micro. We find that empirical investigations which measure the perceptible
metrics - performance as experienced by the user - are indispensable, as the
impact of specialized instructions and layer types can be subtle. To this end,
we propose an implementation-aware design as a cost-effective method for
verification and benchmarking. Employing our developed toolchain, we
demonstrate how existing NN deployments on resource-constrained devices can be
improved by systematically optimizing NNs to their targeted application
scenario.
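Measuring the perceptible metrics in practice means timing inference end to end on the target core rather than estimating it from operation counts. As a minimal sketch of such an implementation-aware measurement (assuming a Cortex-M3/M4/M7-class core, whose ARMv7-M DWT cycle counter is absent on Cortex-M0/M0+; run_inference() is a hypothetical placeholder for the deployed model's invoke call, not any specific framework API, and this is not the paper's actual toolchain):

```cpp
#include <cstdint>

// Standard ARMv7-M memory-mapped debug registers.
static volatile uint32_t* const DEMCR      = (volatile uint32_t*)0xE000EDFCu; // Debug Exception and Monitor Control
static volatile uint32_t* const DWT_CTRL   = (volatile uint32_t*)0xE0001000u; // DWT control register
static volatile uint32_t* const DWT_CYCCNT = (volatile uint32_t*)0xE0001004u; // free-running core-cycle counter

extern void run_inference();  // hypothetical: the NN invoke call under test

uint32_t measure_inference_cycles() {
    *DEMCR      |= (1u << 24);  // TRCENA: enable the trace/debug blocks, incl. DWT
    *DWT_CYCCNT  = 0;           // reset the counter
    *DWT_CTRL   |= 1u;          // CYCCNTENA: start counting core cycles
    const uint32_t start = *DWT_CYCCNT;
    run_inference();            // the perceptible metric: end-to-end latency
    const uint32_t end = *DWT_CYCCNT;
    return end - start;         // unsigned wrap-around keeps this correct below 2^32 cycles
}
```

Dividing the returned cycle count by the core clock frequency yields latency in seconds; averaging over repeated invocations smooths out interrupt jitter.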
Related papers
- Energy-Aware FPGA Implementation of Spiking Neural Network with LIF Neurons [0.5243460995467893]
Spiking Neural Networks (SNNs) stand out as a cutting-edge solution for TinyML.
This paper presents a novel SNN architecture based on the 1st Order Leaky Integrate-and-Fire (LIF) neuron model.
A hardware-friendly LIF design is also proposed and implemented on a Xilinx Artix-7 FPGA; the first-order LIF update it builds on is sketched below.
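A minimal sketch of the first-order LIF dynamics referenced above; the leak, threshold, and reset values are illustrative choices, not the paper's FPGA parameters:

```cpp
struct LIFNeuron {
    float v = 0.0f;            // membrane potential
    float leak = 0.9f;         // per-step leak factor, 0 < leak < 1
    float v_threshold = 1.0f;  // firing threshold
    float v_reset = 0.0f;      // potential after a spike

    // One discrete time step: leak, integrate the input current, fire on threshold.
    bool step(float input_current) {
        v = leak * v + input_current;
        if (v >= v_threshold) {
            v = v_reset;       // hard reset after firing
            return true;       // spike emitted
        }
        return false;          // no spike this step
    }
};
```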
arXiv Detail & Related papers (2024-11-03T16:42:10Z)
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses resource constraints by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework, Read-ME, that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z)
- Empowering Malware Detection Efficiency within Processing-in-Memory Architecture [0.7910057416898179]
Malware detection techniques leveraging Machine Learning have gained popularity.
One major drawback of neural network architectures is their substantial computational resource requirements.
We propose a Processing-in-Memory (PIM)-based architecture to mitigate memory access latency.
arXiv Detail & Related papers (2024-04-12T21:28:43Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- Combining Multi-Objective Bayesian Optimization with Reinforcement Learning for TinyML [4.2019872499238256]
We propose a novel strategy for deploying Deep Neural Networks on microcontrollers (TinyML) based on Multi-Objective Bayesian optimization (MOBOpt).
Our methodology aims at efficiently finding trade-offs between a DNN's predictive accuracy, memory consumption on a given target system, and computational complexity; the Pareto-dominance notion behind such trade-offs is sketched below.
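As a hedged illustration of what "finding trade-offs" means here, the following exhaustive Pareto-dominance filter keeps the non-dominated (error, memory, complexity) candidates. It shows only the objective structure; it is a plain filter, not the MOBOpt search itself, and the field names are illustrative:

```cpp
#include <vector>

struct Candidate {
    double error;   // 1 - predictive accuracy (lower is better)
    double memory;  // bytes on the target system (lower is better)
    double macs;    // computational complexity (lower is better)
};

// a dominates b if a is no worse in every objective and strictly better in one.
static bool dominates(const Candidate& a, const Candidate& b) {
    bool no_worse = a.error <= b.error && a.memory <= b.memory && a.macs <= b.macs;
    bool better   = a.error <  b.error || a.memory <  b.memory || a.macs <  b.macs;
    return no_worse && better;
}

// Keep only candidates not dominated by any other: the Pareto front of trade-offs.
std::vector<Candidate> pareto_front(const std::vector<Candidate>& cs) {
    std::vector<Candidate> front;
    for (const auto& c : cs) {
        bool dominated = false;
        for (const auto& other : cs)
            if (dominates(other, c)) { dominated = true; break; }
        if (!dominated) front.push_back(c);
    }
    return front;
}
```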
arXiv Detail & Related papers (2023-05-23T14:31:52Z)
- MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X, which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency; a toy prior-informed predictor is sketched below.
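A toy version of a prior-informed latency predictor, purely for intuition: per-layer costs measured on a reference device are summed and scaled by a speed ratio for the target device. All names here are hypothetical; MAPLE-X's actual predictor is learned and considerably more sophisticated:

```cpp
#include <map>
#include <string>
#include <vector>

// Measured per-layer latencies (ms) on a reference microprocessor.
using LatencyTable = std::map<std::string, double>;

double predict_latency(const std::vector<std::string>& layers,
                       const LatencyTable& reference,
                       double device_prior /* target/reference speed ratio */) {
    double total = 0.0;
    for (const auto& layer : layers) {
        auto it = reference.find(layer);
        if (it != reference.end()) total += it->second;  // sum known layer costs
    }
    return total * device_prior;  // adjust by explicit hardware prior knowledge
}
```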
arXiv Detail & Related papers (2022-05-25T11:08:20Z)
- Energy-efficient Deployment of Deep Learning Applications on Cortex-M based Microcontrollers using Deep Compression [1.4050836886292872]
This paper investigates the efficient deployment of deep learning models on resource-constrained microcontrollers.
We present a methodology for the systematic exploration of different DNN pruning, quantization, and deployment strategies.
We show that such models can be compressed to below 10% of their original parameter count before their predictive quality decreases; a magnitude-pruning sketch of this idea follows below.
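A minimal sketch of global magnitude pruning to a target sparsity, the kind of parameter-count compression described above. The threshold selection via nth_element is an illustrative choice, not the paper's exact pipeline:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Zero out the smallest-magnitude weights so that roughly `sparsity` of them
// are removed (e.g. sparsity = 0.9 keeps fewer than 10% of the parameters).
void magnitude_prune(std::vector<float>& weights, double sparsity) {
    std::vector<float> mags(weights.size());
    for (size_t i = 0; i < weights.size(); ++i) mags[i] = std::fabs(weights[i]);

    size_t k = static_cast<size_t>(sparsity * weights.size());
    if (k == 0 || k >= weights.size()) return;  // nothing to prune, or degenerate target

    // Find the k-th smallest magnitude; it becomes the pruning threshold.
    std::nth_element(mags.begin(), mags.begin() + k, mags.end());
    float threshold = mags[k];

    for (float& w : weights)
        if (std::fabs(w) < threshold) w = 0.0f;  // pruned connection (ties make sparsity approximate)
}
```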
arXiv Detail & Related papers (2022-05-20T10:55:42Z)
- Resistive Neural Hardware Accelerators [0.46198289193451136]
The shift towards ReRAM-based in-memory computing has great potential for the implementation of area- and power-efficient inference.
In this survey, we review state-of-the-art ReRAM-based many-core accelerators for Deep Neural Networks (DNNs).
arXiv Detail & Related papers (2021-09-08T21:11:48Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods can hardly achieve real inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach to reduce search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)