$\mu$NAS: Constrained Neural Architecture Search for Microcontrollers
- URL: http://arxiv.org/abs/2010.14246v3
- Date: Tue, 8 Dec 2020 17:02:50 GMT
- Title: $\mu$NAS: Constrained Neural Architecture Search for Microcontrollers
- Authors: Edgar Liberis, Łukasz Dudziak, Nicholas D. Lane
- Abstract summary: IoT devices are powered by microcontroller units (MCUs) which are extremely resource-scarce.
We build a neural architecture search (NAS) system, called $\mu$NAS, to automate the design of such small-yet-powerful MCU-level networks.
$\mu$NAS is able to (a) improve top-1 classification accuracy by up to 4.8%, (b) reduce memory footprint by 4--13x, or (c) reduce the number of multiply-accumulate operations by at least 2x.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: IoT devices are powered by microcontroller units (MCUs) which are extremely
resource-scarce: a typical MCU may have an underpowered processor and around 64
KB of memory and persistent storage, which is orders of magnitude less than
the computational resources typically required for deep learning. Designing
neural networks for such a platform requires an intricate balance between
keeping high predictive performance (accuracy) while achieving low memory and
storage usage and inference latency. This is extremely challenging to achieve
manually, so in this work, we build a neural architecture search (NAS) system,
called $\mu$NAS, to automate the design of such small-yet-powerful MCU-level
networks. $\mu$NAS explicitly targets the three primary aspects of resource
scarcity of MCUs: the size of RAM, persistent storage and processor speed.
$\mu$NAS represents a significant advance in resource-efficient models,
especially for "mid-tier" MCUs with memory requirements ranging from 0.5 KB to
64 KB. We show that on a variety of image classification datasets $\mu$NAS is
able to (a) improve top-1 classification accuracy by up to 4.8%, or (b) reduce
memory footprint by 4--13x, or (c) reduce the number of multiply-accumulate
operations by at least 2x, compared to existing MCU specialist literature and
resource-efficient models.
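
The three constraints named in the abstract (RAM, persistent storage, processor speed) translate into checks that any MCU-targeted NAS must run on every candidate. The sketch below is a minimal feasibility filter, not $\mu$NAS's actual implementation: for a toy sequential CNN it estimates peak RAM as the largest pair of simultaneously live activation buffers, storage as total int8 weight bytes, and compute as a MAC count. All names, the layer format, and the cost model are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class ConvLayer:
    """A conv layer in a simple sequential CNN (hypothetical model format)."""
    in_ch: int
    out_ch: int
    kernel: int
    in_hw: int   # input height == width
    stride: int = 1

    @property
    def out_hw(self) -> int:
        return self.in_hw // self.stride

    def weight_bytes(self, bytes_per_param: int = 1) -> int:
        # int8 weights assumed; biases ignored for brevity
        return self.in_ch * self.out_ch * self.kernel ** 2 * bytes_per_param

    def activation_bytes(self, bytes_per_act: int = 1) -> tuple[int, int]:
        inp = self.in_ch * self.in_hw ** 2 * bytes_per_act
        out = self.out_ch * self.out_hw ** 2 * bytes_per_act
        return inp, out

    def macs(self) -> int:
        return self.out_ch * self.out_hw ** 2 * self.in_ch * self.kernel ** 2


def fits_mcu(layers: list[ConvLayer], ram_budget: int, flash_budget: int,
             mac_budget: int) -> bool:
    """Reject a candidate if any resource estimate exceeds its budget."""
    # Peak RAM: with layer-by-layer execution, the current layer's input
    # and output buffers must be live at the same time.
    peak_ram = max(sum(l.activation_bytes()) for l in layers)
    flash = sum(l.weight_bytes() for l in layers)
    macs = sum(l.macs() for l in layers)
    return peak_ram <= ram_budget and flash <= flash_budget and macs <= mac_budget


# Example: a tiny 3-layer CNN against a 64 KB RAM / 64 KB flash MCU budget.
candidate = [
    ConvLayer(in_ch=3,  out_ch=8,  kernel=3, in_hw=32, stride=2),
    ConvLayer(in_ch=8,  out_ch=16, kernel=3, in_hw=16, stride=2),
    ConvLayer(in_ch=16, out_ch=32, kernel=3, in_hw=8,  stride=2),
]
print(fits_mcu(candidate, ram_budget=64 * 1024, flash_budget=64 * 1024,
               mac_budget=2_000_000))
```

A real search would couple such a filter with the accuracy objective, discarding infeasible candidates before they are ever trained.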
Related papers
- DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions
We develop a family of models with the distilling neural architecture (DNA) techniques.
Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub-search space using heuristic algorithms.
Our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively.
arXiv Detail & Related papers (2024-03-02T22:16:47Z)
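
The DNA entry above rates every candidate via block-wise distillation from a teacher network. A minimal sketch of that rating idea, assuming PyTorch and invented blocks and training budgets (the actual DNA procedure is considerably more elaborate), might look like:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a pretrained "teacher" block and two candidate
# student blocks with matching input/output shapes (architectures invented).
teacher = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 16, 3, padding=1))
candidates = {
    "single-conv": nn.Conv2d(16, 16, 3, padding=1),
    "bottleneck": nn.Sequential(nn.Conv2d(16, 8, 1), nn.ReLU(),
                                nn.Conv2d(8, 16, 3, padding=1)),
}

def rate(block: nn.Module, steps: int = 20) -> float:
    """Train the candidate block briefly to mimic the teacher block's
    features; its final mimicking (distillation) loss is its rating."""
    opt = torch.optim.Adam(block.parameters(), lr=1e-3)
    mse = nn.MSELoss()
    for _ in range(steps):
        x = torch.randn(4, 16, 8, 8)
        with torch.no_grad():
            target = teacher(x)
        loss = mse(block(x), target)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        x = torch.randn(16, 16, 8, 8)
        return mse(block(x), teacher(x)).item()

ratings = {name: rate(b) for name, b in candidates.items()}
print(min(ratings, key=ratings.get), ratings)  # lower loss = better block
```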
- MicroNAS: Zero-Shot Neural Architecture Search for MCUs
Neural Architecture Search (NAS) effectively discovers new Convolutional Neural Network (CNN) architectures.
We propose MicroNAS, a hardware-aware zero-shot NAS framework for microcontroller units (MCUs) in edge computing.
Compared to previous works, MicroNAS achieves up to 1104x improvement in search efficiency and discovers models with over 3.23x faster MCU inference.
arXiv Detail & Related papers (2024-01-17T06:17:42Z)
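
The MicroNAS abstract does not spell out its zero-shot scoring functions, so the sketch below illustrates the general flavour with a different, well-known proxy: a NASWOT-style score (Mellor et al.) that rates an untrained network by how distinctly it separates a mini-batch into binary ReLU activation patterns. The candidate networks and shapes are invented; MicroNAS's own hardware-aware proxies differ.

```python
import numpy as np
import torch
import torch.nn as nn

def naswot_score(net: nn.Module, x: torch.Tensor) -> float:
    """Rate an *untrained* network by how distinctly it separates a batch
    into binary ReLU on/off patterns (higher = more expressive). No
    training or labels are needed, hence "zero-shot"."""
    patterns, hooks = [], []

    def record(_module, _inputs, output):
        patterns.append((output.detach().flatten(1) > 0).float())

    for m in net.modules():
        if isinstance(m, nn.ReLU):
            hooks.append(m.register_forward_hook(record))
    with torch.no_grad():
        net(x)
    for h in hooks:
        h.remove()
    codes = torch.cat(patterns, dim=1)                  # (batch, #ReLU units)
    k = codes @ codes.T + (1 - codes) @ (1 - codes).T   # pattern-overlap kernel
    _, logdet = np.linalg.slogdet(k.numpy())
    return float(logdet)

# Two hypothetical candidates scored straight from random initialization.
torch.manual_seed(0)
batch = torch.randn(16, 3, 16, 16)
cands = {
    "narrow": nn.Sequential(nn.Conv2d(3, 4, 3), nn.ReLU(),
                            nn.Flatten(), nn.LazyLinear(10)),
    "wide": nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(),
                          nn.Flatten(), nn.LazyLinear(10)),
}
scores = {name: naswot_score(m, batch) for name, m in cands.items()}
print(max(scores, key=scores.get), scores)
```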
- MicroNAS: Memory and Latency Constrained Hardware-Aware Neural Architecture Search for Time Series Classification on Microcontrollers
We adapt the concept of differentiable neural architecture search (DNAS) to solve the time-series classification problem on resource-constrained microcontrollers (MCUs).
We introduce MicroNAS, a domain-specific HW-NAS system that integrates DNAS, lookup tables, dynamic convolutions, and a novel search space specifically designed for time-series classification on MCUs.
Our studies on different MCUs and standard benchmark datasets demonstrate that MicroNAS finds MCU-tailored architectures whose performance (F1-score) comes close to that of state-of-the-art desktop models.
arXiv Detail & Related papers (2023-10-27T06:55:15Z)
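
The key mechanism in this entry, combining differentiable NAS with latency lookup tables, can be shown compactly: per-operator latencies measured once on the target MCU enter the loss as a softmax-weighted expectation, so gradient descent can trade accuracy against latency. The LUT values, budget, and penalty weight below are invented; a real system would add the task loss from actual training data.

```python
import torch
import torch.nn.functional as F

# Hypothetical per-op latencies (microseconds), as would be measured once
# on the target MCU and stored in a lookup table.
LATENCY_LUT = {"conv3x3": 420.0, "conv5x5": 930.0,
               "dwconv3x3": 160.0, "skip": 1.0}
ops = list(LATENCY_LUT)
lut = torch.tensor([LATENCY_LUT[o] for o in ops])

# One architecture-parameter vector per searchable layer (2 layers here).
alphas = [torch.zeros(len(ops), requires_grad=True) for _ in range(2)]

def expected_latency() -> torch.Tensor:
    """Differentiable latency estimate: softmax-weighted sum of LUT
    entries per layer, so gradients reach the architecture parameters."""
    return sum(F.softmax(a, dim=0) @ lut for a in alphas)

opt = torch.optim.Adam(alphas, lr=0.1)
TARGET_US = 400.0                  # hypothetical per-inference budget
for _ in range(200):
    task_loss = torch.tensor(0.0)  # stand-in for the real training loss
    overrun = F.relu(expected_latency() - TARGET_US)
    loss = task_loss + 0.01 * overrun   # penalize exceeding the budget
    opt.zero_grad(); loss.backward(); opt.step()

# After search, each layer keeps its highest-weighted operator.
print([ops[int(a.argmax())] for a in alphas],
      f"{float(expected_latency()):.1f} us")
```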
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
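
The peak-memory argument behind patch-based inference is easy to reproduce numerically. The sketch below compares whole-layer execution (full input and output live at once) against patch-by-patch execution for a high-resolution first stage, ignoring the receptive-field halo that the real scheduler must handle with recompute or extra buffering. Numbers are invented.

```python
# Peak activation memory of an initial high-resolution stage, computed
# whole-layer vs. patch-by-patch (the idea behind MCUNetV2; the no-halo
# assumption and all numbers are illustrative).

def act_bytes(ch: int, hw: int, bytes_per_act: int = 1) -> int:
    return ch * hw * hw * bytes_per_act

def layer_peak(ch_in: int, ch_out: int, hw: int) -> int:
    """Whole-layer execution: full input + full output live at once."""
    return act_bytes(ch_in, hw) + act_bytes(ch_out, hw)

def patch_peak(ch_in: int, ch_out: int, hw: int, patches_per_side: int) -> int:
    """Patch-by-patch: only one spatial patch of input/output is live."""
    p = hw // patches_per_side
    return act_bytes(ch_in, p) + act_bytes(ch_out, p)

hw, ch_in, ch_out = 160, 3, 16       # a typical high-resolution first stage
full = layer_peak(ch_in, ch_out, hw)
patched = patch_peak(ch_in, ch_out, hw, patches_per_side=4)
print(f"whole-layer peak: {full / 1024:.0f} KiB, "
      f"4x4-patch peak: {patched / 1024:.0f} KiB ({full / patched:.0f}x less)")
```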
- HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers
High-resolution representations (HR) are essential for dense prediction tasks such as segmentation, detection, and pose estimation.
This work proposes a novel NAS method, called HR-NAS, which is able to find efficient and accurate networks for different tasks.
HR-NAS is capable of achieving state-of-the-art trade-offs between performance and FLOPs for three dense prediction tasks and an image classification task.
arXiv Detail & Related papers (2021-06-11T18:11:36Z)
- MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers
Machine learning on resource-constrained microcontrollers (MCUs) promises to drastically expand the application space of the Internet of Things (IoT).
TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget.
Neural architecture search (NAS) promises to help design accurate ML models that meet the tight MCU memory, latency, and energy constraints.
arXiv Detail & Related papers (2020-10-21T19:39:39Z)
- Binarized Neural Architecture Search for Efficient Object Recognition
Binarized neural architecture search (BNAS) produces extremely compressed models to reduce the huge computational cost of edge computing on embedded devices.
An accuracy of 96.53% vs. 97.22% is achieved on the CIFAR-10 dataset, but with a significantly compressed model, and a 40% faster search than the state-of-the-art PC-DARTS.
arXiv Detail & Related papers (2020-09-08T15:51:23Z)
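
BNAS searches over binarized networks; the basic trick that makes 1-bit weights trainable at all is sign-binarization with a straight-through estimator (STE). The sketch below shows that generic mechanism, not BNAS's specific search: weights are binarized in the forward pass, while gradients pass through, clipped, in the backward pass.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign-binarize weights in the forward pass; pass gradients straight
    through (clipped to |w| <= 1) in the backward pass. A generic sketch
    of 1-bit training, not BNAS's exact scheme."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        return grad_out * (w.abs() <= 1).float()  # clipped STE

w = torch.randn(8, requires_grad=True)
y = BinarizeSTE.apply(w).sum()   # forward sees only +/-1 weights
y.backward()                     # w still receives useful gradients
print(torch.sign(w).tolist(), w.grad.tolist())
```

Each binarized weight costs one bit instead of 8 or 32, which is where the "extremely compressed models" in the entry above come from.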
- Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers
This paper presents an automated mixed-precision quantization flow based on the HAQ framework but tailored for the memory and computational characteristics of MCU devices.
Specifically, a Reinforcement Learning agent searches for the best uniform quantization levels, among 2, 4, 8 bits, of individual weight and activation tensors.
Given an MCU-class memory bound of 2 MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as the state-of-the-art solutions.
arXiv Detail & Related papers (2020-08-12T06:09:58Z)
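
The search problem in this entry, per-tensor bitwidths from {2, 4, 8} under a 2 MB weight budget, is small enough to caricature. The sketch below uses plain random search as a stand-in for the paper's Reinforcement Learning agent, with an invented bit-volume proxy in place of measured accuracy; tensor names and sizes are hypothetical.

```python
import random

random.seed(0)

# Per-tensor weight counts of a hypothetical network (values invented).
tensors = {"stem": 200_000, "block1": 800_000,
           "block2": 1_600_000, "head": 1_800_000}
BITS = (2, 4, 8)                    # the paper's candidate bitwidths
BUDGET = 2 * 1024 * 1024            # the paper's 2 MB weight-storage bound

def model_bytes(assign: dict) -> int:
    return sum(count * assign[name] // 8 for name, count in tensors.items())

def proxy_reward(assign: dict) -> int:
    """Stand-in for the RL agent's accuracy reward: keeping more bits of
    weight information is assumed to preserve accuracy (illustrative only)."""
    return sum(count * assign[name] for name, count in tensors.items())

best = None
for _ in range(2000):               # random search stands in for the agent
    assign = {name: random.choice(BITS) for name in tensors}
    if model_bytes(assign) > BUDGET:
        continue                    # infeasible under the MCU memory bound
    if best is None or proxy_reward(assign) > proxy_reward(best):
        best = assign

print(best, model_bytes(best), "bytes")
```

Since an all-8-bit assignment here would need about 4.4 MB, the budget forces a genuinely mixed solution, which is the point of the per-tensor search.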
- MCUNet: Tiny Deep Learning on IoT Devices
We propose a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine).
TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints, then specializes the network architecture within the optimized search space.
TinyEngine adapts the memory scheduling according to the overall network topology rather than layer-wise optimization, reducing the memory usage by 4.8x.
arXiv Detail & Related papers (2020-07-20T17:59:01Z)
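
TinyNAS's first stage, optimizing the search space itself before searching within it, can be caricatured as follows: sample random models from each candidate (width multiplier, input resolution) space and prefer the space whose samples fit the storage budget while retaining the most computation. The cost model, budget, and numbers below are invented; the real system uses measured constraints and a FLOPs-distribution heuristic.

```python
import random

random.seed(0)
FLASH_BUDGET = 1_000_000   # hypothetical weight-storage budget in bytes

def sample_model(width: float, res: int) -> tuple[int, int]:
    """Return (flops, weight_bytes) for a random model from the space;
    a crude closed-form stand-in for a real sampled architecture."""
    depth = random.randint(8, 16)
    ch = int(128 * width)
    flops = depth * ch * ch * res * res // 4
    weight_bytes = depth * ch * ch * 9          # 3x3 kernels, int8
    return flops, weight_bytes

def space_quality(width: float, res: int, n: int = 500) -> float:
    """Expected FLOPs of a feasible draw: spaces whose feasible models
    are computation-rich are preferred (TinyNAS's FLOPs heuristic)."""
    sampled = [sample_model(width, res) for _ in range(n)]
    return sum(f for f, b in sampled if b <= FLASH_BUDGET) / n

spaces = [(w, r) for w in (0.25, 0.5, 0.75, 1.0) for r in (48, 64, 96, 128)]
best = max(spaces, key=lambda s: space_quality(*s))
print("chosen search space (width, resolution):", best)
# Stage 2 would then run a conventional NAS *within* the chosen space.
```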