MicroNets: Neural Network Architectures for Deploying TinyML
Applications on Commodity Microcontrollers
- URL: http://arxiv.org/abs/2010.11267v6
- Date: Mon, 12 Apr 2021 19:59:59 GMT
- Title: MicroNets: Neural Network Architectures for Deploying TinyML
Applications on Commodity Microcontrollers
- Authors: Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas Navarro, Urmish
Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, Paul N. Whatmough
- Abstract summary: Machine learning on resource-constrained microcontrollers (MCUs) promises to drastically expand the application space of the Internet of Things (IoT).
However, TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget.
Neural architecture search (NAS) promises to help design accurate ML models that meet the tight MCU memory, latency and energy constraints.
- Score: 18.662026553041937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Executing machine learning workloads locally on resource constrained
microcontrollers (MCUs) promises to drastically expand the application space of
IoT. However, so-called TinyML presents severe technical challenges, as deep
neural network inference demands a large compute and memory budget. To address
this challenge, neural architecture search (NAS) promises to help design
accurate ML models that meet the tight MCU memory, latency and energy
constraints. A key component of NAS algorithms is their latency/energy model,
i.e., the mapping from a given neural network architecture to its inference
latency/energy on an MCU. In this paper, we observe an intriguing property of
NAS search spaces for MCU model design: on average, model latency varies
linearly with model operation (op) count under a uniform prior over models in
the search space. Exploiting this insight, we employ differentiable NAS (DNAS)
to search for models with low memory usage and low op count, where op count is
treated as a viable proxy to latency. Experimental results validate our
methodology, yielding our MicroNet models, which we deploy on MCUs using
Tensorflow Lite Micro, a standard open-source NN inference runtime widely used
in the TinyML community. MicroNets demonstrate state-of-the-art results for all
three TinyMLperf industry-standard benchmark tasks: visual wake words, audio
keyword spotting, and anomaly detection. Models and training scripts can be
found at github.com/ARM-software/ML-zoo.
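To make the op-count-as-latency-proxy idea concrete, the sketch below fits the linear latency model described in the abstract and folds expected op count and memory into a differentiable search objective. This is a minimal illustration under assumed function names and trade-off weights, not the released MicroNets training code at github.com/ARM-software/ML-zoo.

```python
# Minimal sketch (illustrative only, not the MicroNets code) of two ideas from
# the abstract: (1) over models sampled uniformly from the search space, mean
# latency varies roughly linearly with op count, so a least-squares line gives
# a usable latency model; (2) DNAS can then penalize expected op count (as a
# latency proxy) and expected memory to respect MCU constraints.
import numpy as np

def fit_latency_vs_ops(op_counts, measured_latency_ms):
    """Fit latency_ms ~= a * ops + b from a set of models profiled on the MCU."""
    a, b = np.polyfit(op_counts, measured_latency_ms, deg=1)
    return a, b

def constrained_dnas_objective(task_loss, expected_ops, expected_mem_bytes,
                               lambda_ops=1e-9, lambda_mem=1e-7):
    """Task loss plus soft penalties on expected op count and expected memory.

    The penalty weights here are hypothetical; in practice they would be tuned
    so that the resulting models fit the target MCU's memory and latency budget.
    """
    return task_loss + lambda_ops * expected_ops + lambda_mem * expected_mem_bytes
```

With a fitted line of this form, a searched model's op count can be mapped back to an estimated on-device latency without profiling every candidate architecture during the search.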
Related papers
- DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions [121.05720140641189]
We develop a family of models using distilling neural architecture (DNA) techniques.
Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub-search space using heuristic algorithms.
Our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively.
arXiv Detail & Related papers (2024-03-02T22:16:47Z) - FL-NAS: Towards Fairness of NAS for Resource Constrained Devices via
Large Language Models [24.990028167518226]
This paper conducts further exploration in this direction by considering three important design metrics simultaneously.
We propose a novel LLM-based NAS framework, FL-NAS, in this paper.
We show experimentally that FL-NAS can indeed find high-performing DNNs, beating state-of-the-art DNN models by orders of magnitude across almost all design considerations.
arXiv Detail & Related papers (2024-02-09T00:49:03Z) - Efficient Neural Networks for Tiny Machine Learning: A Comprehensive
Review [1.049712834719005]
This review provides an in-depth analysis of the advancements in efficient neural networks and the deployment of deep learning models on ultra-low power microcontrollers.
The core of the review centres on efficient neural networks for TinyML.
It covers techniques such as model compression, quantization, and low-rank factorization, which optimize neural network architectures for minimal resource utilization.
The paper then delves into the deployment of deep learning models on ultra-low power MCUs, addressing challenges such as limited computational capabilities and memory resources.
arXiv Detail & Related papers (2023-11-20T16:20:13Z) - MicroNAS: Memory and Latency Constrained Hardware-Aware Neural
Architecture Search for Time Series Classification on Microcontrollers [3.0723404270319685]
We adapt the concept of differentiable neural architecture search (DNAS) to solve the time-series classification problem on resource-constrained microcontrollers (MCUs).
We introduce MicroNAS, a domain-specific HW-NAS system that integrates DNAS, lookup tables, dynamic convolutions and a novel search space specifically designed for time-series classification on MCUs.
Our studies on different MCUs and standard benchmark datasets demonstrate that MicroNAS finds MCU-tailored architectures whose performance (F1 score) is close to that of state-of-the-art desktop models.
arXiv Detail & Related papers (2023-10-27T06:55:15Z) - Enhancing Neural Architecture Search with Multiple Hardware Constraints
for Deep Learning Model Deployment on Tiny IoT Devices [17.919425885740793]
We propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods.
We show that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively.
arXiv Detail & Related papers (2023-10-11T06:09:14Z) - DeepPicarMicro: Applying TinyML to Autonomous Cyber Physical Systems [2.2667044691227636]
We present DeepPicarMicro, a small self-driving RC car testbed, which runs a convolutional neural network (CNN) on a Raspberry Pi Pico MCU.
We apply state-of-the-art DNN optimization techniques to successfully fit the well-known PilotNet CNN architecture onto the MCU.
We observe an interesting relationship between the accuracy, latency, and control performance of a system.
arXiv Detail & Related papers (2022-08-23T21:58:53Z) - MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z) - MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory (see the sketch after this list).
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z) - ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked
Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z) - MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z) - MCUNet: Tiny Deep Learning on IoT Devices [62.752899523628066]
We propose a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine).
TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints, then specializes the network architecture within the optimized search space.
TinyEngine adapts the memory scheduling according to the overall network topology rather than layer-wise optimization, reducing the memory usage by 4.8x.
arXiv Detail & Related papers (2020-07-20T17:59:01Z)
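The patch-by-patch inference idea summarized above for MCUNetV2 can be illustrated with a small peak-memory calculation. The layer sizes, patch size, and halo below are hypothetical assumptions, and this is not the MCUNetV2 implementation; it only shows why tiling the high-resolution early layers shrinks the peak activation footprint.

```python
# Illustrative peak-activation-memory comparison (int8 activations; sizes and
# tiling parameters are hypothetical, not taken from MCUNetV2): running an
# early high-resolution layer over the whole feature map must hold the full
# input and output at once, while patch-by-patch execution only needs buffers
# for one (overlapping) tile at a time.

def full_map_peak_bytes(h, w, c_in, c_out):
    """Peak activation bytes when the layer processes the entire feature map."""
    return h * w * c_in + h * w * c_out

def patch_peak_bytes(patch, halo, c_in, c_out):
    """Peak activation bytes for one tile, including the overlapping halo
    pixels the convolution kernel needs around the tile."""
    in_tile = (patch + 2 * halo) ** 2 * c_in
    out_tile = patch ** 2 * c_out
    return in_tile + out_tile

# Example: a 96x96 layer with 16 input and 16 output channels, tiled into
# 24x24 patches with a 1-pixel halo.
print(full_map_peak_bytes(96, 96, 16, 16))   # 294912 bytes (~288 KiB)
print(patch_peak_bytes(24, 1, 16, 16))       # 20032 bytes (~20 KiB)
```

The overlapping halos mean some work is recomputed across neighboring patches, which is why MCUNetV2 jointly optimizes the neural architecture and the inference schedule rather than tiling naively.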
This list is automatically generated from the titles and abstracts of the papers in this site.