MCUNet: Tiny Deep Learning on IoT Devices
- URL: http://arxiv.org/abs/2007.10319v2
- Date: Thu, 19 Nov 2020 17:29:28 GMT
- Title: MCUNet: Tiny Deep Learning on IoT Devices
- Authors: Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han
- Abstract summary: We propose a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine)
TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints, then specializes the network architecture within the optimized search space.
TinyEngine schedules memory according to the overall network topology rather than layer by layer, reducing memory usage by 4.8x.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning on tiny IoT devices based on microcontroller units (MCU) is
appealing but challenging: the memory of microcontrollers is 2-3 orders of
magnitude smaller even than mobile phones. We propose MCUNet, a framework that
jointly designs the efficient neural architecture (TinyNAS) and the lightweight
inference engine (TinyEngine), enabling ImageNet-scale inference on
microcontrollers. TinyNAS adopts a two-stage neural architecture search
approach that first optimizes the search space to fit the resource constraints,
then specializes the network architecture in the optimized search space.
TinyNAS can automatically handle diverse constraints (i.e., device, latency,
energy, memory) at low search cost. TinyNAS is co-designed with TinyEngine,
a memory-efficient inference library to expand the search space and fit a
larger model. TinyEngine schedules memory according to the overall network
topology rather than layer by layer, reducing memory usage by 4.8x and
accelerating inference by 1.7-3.3x compared to TF-Lite Micro
and CMSIS-NN. MCUNet is the first to achieve >70% ImageNet top-1 accuracy on
an off-the-shelf commercial microcontroller, using 3.5x less SRAM and 5.7x
less Flash compared to quantized MobileNetV2 and ResNet-18. On visual and
audio wake-word tasks, MCUNet achieves state-of-the-art accuracy and runs
2.4-3.4x faster
than MobileNetV2 and ProxylessNAS-based solutions with 3.7-4.1x smaller peak
SRAM. Our study suggests that the era of always-on tiny machine learning on IoT
devices has arrived. Code and models can be found here: https://tinyml.mit.edu.
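To make the two-stage idea concrete, below is a minimal, hypothetical Python sketch; the (width multiplier, resolution) knobs, the memory model, and the quality proxy are all placeholder assumptions, not code from the MCUNet release. Stage 1 scores each candidate sub-space by the quality of its memory-feasible samples and keeps the best one; stage 2 then searches within it. The peak-SRAM estimate is topology-aware in the TinyEngine spirit: it tracks the largest live input+output pair over the whole network rather than a uniform per-layer worst case.

```python
import random

SRAM_KB = 320     # assumed SRAM budget for activations
FLASH_KB = 1024   # assumed Flash budget for int8 weights

def sample_arch(width_mult, resolution):
    """Draw a toy architecture from a (width, resolution) sub-space:
    per-layer (input, output) activation sizes in KB plus total weight KB."""
    n_layers = random.randint(8, 16)
    scale = width_mult * (resolution / 224) ** 2
    io = [(random.uniform(20, 120) * scale, random.uniform(20, 120) * scale)
          for _ in range(n_layers)]
    params = sum(random.uniform(10, 60) * width_mult for _ in range(n_layers))
    return {"io_kb": io, "param_kb": params, "w": width_mult, "r": resolution}

def peak_sram_kb(arch):
    # Topology-aware view: peak activation memory is the largest live
    # input+output pair across the whole network.
    return max(i + o for i, o in arch["io_kb"])

def fits(arch):
    return peak_sram_kb(arch) <= SRAM_KB and arch["param_kb"] <= FLASH_KB

def proxy_quality(arch):  # stand-in for FLOPs / supernet accuracy
    return arch["r"] * arch["w"] * len(arch["io_kb"])

# Stage 1: score each (width, resolution) sub-space by the quality of the
# models that fit the budgets, and keep the best sub-space.
spaces = [(w, r) for w in (0.3, 0.5, 0.75, 1.0) for r in (96, 128, 160, 224)]
def space_score(space):
    fit = [a for a in (sample_arch(*space) for _ in range(200)) if fits(a)]
    return sum(map(proxy_quality, fit)) / len(fit) if fit else 0.0
best_space = max(spaces, key=space_score)

# Stage 2: specialize within that sub-space (evolutionary search in the
# paper; plain random search here), keeping only models that fit.
cands = [a for a in (sample_arch(*best_space) for _ in range(500)) if fits(a)]
best = max(cands, key=proxy_quality)
print(best_space, round(peak_sram_kb(best), 1), "KB peak SRAM")
```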
Related papers
- TinyTNAS: GPU-Free, Time-Bound, Hardware-Aware Neural Architecture Search for TinyML Time Series Classification [6.9604565273682955]
We present TinyTNAS, a novel hardware-aware multi-objective Neural Architecture Search (NAS) tool specifically designed for TinyML time series classification.
Unlike traditional NAS methods that rely on GPU capabilities, TinyTNAS operates efficiently on CPUs, making it accessible for a broader range of applications.
TinyTNAS demonstrates state-of-the-art accuracy with significant reductions in RAM, FLASH, MAC usage, and latency.
arXiv Detail & Related papers (2024-08-29T13:50:08Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
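As a back-of-envelope illustration of the imbalance the authors point at, the sketch below compares the peak activation memory of running an early conv layer on the whole image versus patch by patch; all sizes and the 4x4 patch grid are assumptions, not MCUNetV2 numbers.

```python
# Toy comparison of peak activation memory (int8), whole-image vs. patch-based.
def act_kb(h, w, c):
    return h * w * c / 1024

H = W = 176            # assumed input resolution
C_IN, C_OUT = 3, 16    # assumed channels of an early conv layer

# Whole-image execution holds the full input and output maps at once.
whole = act_kb(H, W, C_IN) + act_kb(H, W, C_OUT)

# Patch-by-patch execution keeps only one spatial patch's input/output live
# (halo overlap for receptive fields is ignored here for simplicity).
P = 4                  # 4x4 patch grid
patch = act_kb(H // P, W // P, C_IN) + act_kb(H // P, W // P, C_OUT)

print(f"whole-image peak: {whole:.0f} KB, per-patch peak: {patch:.0f} KB")
```

Since only the first few blocks carry such large feature maps, patching just the initial stage already lowers the network-wide peak, which is the imbalanced memory distribution the paper identifies.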
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection [82.04372532783931]
Recently, neural architecture search (NAS) has been exploited to design feature pyramid networks (FPNs).
We propose a novel One-Shot Path Aggregation Network Architecture Search (OPANAS) algorithm, which significantly improves both searching efficiency and detection accuracy.
arXiv Detail & Related papers (2021-03-08T01:48:53Z)
- FP-NAS: Fast Probabilistic Neural Architecture Search [49.21560787752714]
Probabilistic NAS, such as PARSEC, learns a distribution over high-performing architectures, and uses only as much memory as needed to train a single model.
We propose a sampling method adaptive to the distribution entropy, drawing more samples to encourage explorations at the beginning, and reducing samples as learning proceeds.
We show that Fast Probabilistic NAS (FP-NAS) can sample 64% fewer architectures and search 2.1x faster than PARSEC.
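A minimal sketch of that schedule, assuming a simple proportionality rule; the constant lam and the exact form are placeholders, not FP-NAS's rule.

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def n_samples(p, lam=2.0, n_min=1):
    # assumed rule: sample count proportional to the current entropy
    return max(n_min, round(lam * entropy(p)))

early = [0.25, 0.25, 0.25, 0.25]   # uniform over 4 ops -> explore more
late = [0.94, 0.02, 0.02, 0.02]    # concentrated -> exploit, sample less
print(n_samples(early), n_samples(late))  # e.g. 3 and 1
```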
arXiv Detail & Related papers (2020-11-22T06:10:05Z)
- Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets [65.28292822614418]
The giant formula for simultaneously enlarging resolution, depth, and width gives us a Rubik's cube for neural networks.
This paper aims to explore the twisting rules for obtaining deep neural networks with minimum model sizes and computational costs.
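As a rough illustration, the sketch below enumerates resolution/depth/width multipliers under a fixed cost budget using a crude FLOPs proxy; the base cost, budget, and scaling law are assumptions, not the paper's cost model. Many distinct "twists" hit the same budget, and the paper's question is which corner of this cube is best at tiny scales.

```python
import itertools

BASE_FLOPS = 400e6   # assumed cost of the base network
BUDGET = 100e6       # assumed tiny-model budget

def flops(r, d, w):
    # FLOPs grow roughly linearly in depth and quadratically in width
    # and resolution (a standard approximation, not the paper's model).
    return BASE_FLOPS * d * (w ** 2) * (r ** 2)

grid = [x / 10 for x in range(3, 11)]   # multipliers 0.3 .. 1.0
feasible = [(r, d, w) for r, d, w in itertools.product(grid, repeat=3)
            if flops(r, d, w) <= BUDGET]
print(len(feasible), "feasible (r, d, w) twists, e.g.", feasible[:3])
```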
arXiv Detail & Related papers (2020-10-28T08:49:45Z)
- $\mu$NAS: Constrained Neural Architecture Search for Microcontrollers [15.517404770022633]
IoT devices are powered by microcontroller units (MCUs), which are extremely resource-scarce.
We build a neural architecture search (NAS) system, called $\mu$NAS, to automate the design of such small-yet-powerful MCU-level networks.
$\mu$NAS is able to (a) improve top-1 classification accuracy by up to 4.8%, (b) reduce memory footprint by 4-13x, or (c) reduce the number of multiply-accumulate operations by at least 2x.
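One way to picture such a multi-objective search is a Pareto filter over (accuracy, memory, MACs); the sketch below uses invented numbers and is not the $\mu$NAS algorithm.

```python
def dominates(a, b):
    """a dominates b: no worse on every axis (higher acc; lower memory, MACs)
    and strictly better on at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better

models = [  # (top-1 %, peak memory KB, MACs in millions), invented values
    (94.1, 180, 12.0), (93.0, 40, 5.5), (94.5, 610, 30.0), (91.2, 42, 6.0),
]
front = [m for m in models if not any(dominates(o, m) for o in models if o != m)]
print(front)  # (91.2, 42, 6.0) drops out: dominated by (93.0, 40, 5.5)
```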
arXiv Detail & Related papers (2020-10-27T12:42:53Z)
- MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers [18.662026553041937]
Machine learning on resource-constrained microcontrollers (MCUs) promises to drastically expand the application space of the Internet of Things (IoT).
TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget.
Neural architecture search (NAS) promises to help design accurate ML models that meet the tight MCU memory, latency, and energy constraints.
arXiv Detail & Related papers (2020-10-21T19:39:39Z)
- Efficient Neural Network Deployment for Microcontroller [0.0]
This paper explores and generalizes convolutional neural network deployment for microcontrollers.
The memory savings and performance will be compared with CMSIS-NN framework developed for ARM Cortex-M CPUs.
The final goal is a tool that consumes a PyTorch model with trained weights and turns it into an optimized C/C++ inference engine for microcontrollers with kilobyte-level memory and limited compute.
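A hypothetical sketch of the export step such a tool needs: dumping trained PyTorch weights as int8 arrays in a C header that a bare-metal engine can compile in. The per-tensor symmetric quantization and all names are illustrative assumptions, not the paper's scheme.

```python
import torch
import torch.nn as nn

# A stand-in trained model (in practice, loaded from a checkpoint).
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 16, 3))

def to_c_header(model, path="weights.h"):
    """Write every tensor as a per-tensor symmetrically quantized int8 array
    plus its dequantization scale, as C definitions. (Real engines typically
    keep biases as int32 with a combined input*weight scale.)"""
    lines = ["#pragma once", "#include <stdint.h>"]
    for name, p in model.state_dict().items():
        scale = p.abs().max().item() / 127 or 1.0
        q = torch.clamp((p / scale).round(), -128, 127).to(torch.int8)
        ident = "w_" + name.replace(".", "_")    # e.g. w_0_weight
        vals = ", ".join(str(int(v)) for v in q.flatten().tolist())
        lines.append(f"static const float {ident}_scale = {scale:.8f}f;")
        lines.append(f"static const int8_t {ident}[{q.numel()}] = {{{vals}}};")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

to_c_header(model)
```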
arXiv Detail & Related papers (2020-07-02T19:21:05Z)
- FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions [70.59851564292828]
Differentiable Neural Architecture Search (DNAS) has demonstrated great success in designing state-of-the-art, efficient neural networks.
We propose a memory and computationally efficient DNAS variant: DMaskingNAS.
This algorithm expands the search space by up to $10^{14}\times$ over conventional DNAS.
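A simplified sketch of the channel-masking idea (not the FBNetV2 implementation): one max-width convolution is shared by every width option, and search happens over a soft combination of binary channel masks, so the cost stays close to a single conv. The Gumbel-softmax relaxation used in DNAS is reduced here to a plain softmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

MAX_C, OPTIONS = 32, [8, 16, 24, 32]    # candidate output widths (assumed)

conv = nn.Conv2d(16, MAX_C, 3, padding=1)        # weights shared by all options
alpha = nn.Parameter(torch.zeros(len(OPTIONS)))  # architecture parameters

masks = torch.stack([                    # mask k keeps the first c_k channels
    torch.cat([torch.ones(c), torch.zeros(MAX_C - c)]) for c in OPTIONS
])

def forward(x):
    probs = F.softmax(alpha, dim=0)              # Gumbel-softmax in the paper
    mask = (probs[:, None] * masks).sum(0)       # expected channel mask
    return conv(x) * mask.view(1, -1, 1, 1)      # cost ~ one max-width conv

y = forward(torch.randn(2, 16, 8, 8))
print(y.shape)  # torch.Size([2, 32, 8, 8]); later channels softly zeroed
```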
arXiv Detail & Related papers (2020-04-12T08:52:15Z)