Enhancing Neural Architecture Search with Multiple Hardware Constraints
for Deep Learning Model Deployment on Tiny IoT Devices
- URL: http://arxiv.org/abs/2310.07217v1
- Date: Wed, 11 Oct 2023 06:09:14 GMT
- Title: Enhancing Neural Architecture Search with Multiple Hardware Constraints
for Deep Learning Model Deployment on Tiny IoT Devices
- Authors: Alessio Burrello, Matteo Risso, Beatrice Alessandra Motetti, Enrico
Macii, Luca Benini, Daniele Jahier Pagliari
- Abstract summary: We propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods.
We show that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively.
- Score: 17.919425885740793
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid proliferation of computing domains relying on Internet of Things
(IoT) devices has created a pressing need for efficient and accurate
deep-learning (DL) models that can run on low-power devices. However,
traditional DL models tend to be too complex and computationally intensive for
typical IoT end-nodes. To address this challenge, Neural Architecture Search
(NAS) has emerged as a popular design automation technique for co-optimizing
the accuracy and complexity of deep neural networks. Nevertheless, existing NAS
techniques require many iterations to produce a network that adheres to
specific hardware constraints, such as the maximum memory available on the
hardware or the maximum latency allowed by the target application. In this
work, we propose a novel approach to incorporate multiple constraints into
so-called Differentiable NAS optimization methods, which allows the generation,
in a single shot, of a model that respects user-defined constraints on both
memory and latency in a time comparable to a single standard training. The
proposed approach is evaluated on five IoT-relevant benchmarks, including the
MLPerf Tiny suite and Tiny ImageNet, demonstrating that, with a single search,
it is possible to reduce memory and latency by 87.4% and 54.2%, respectively
(as defined by our targets), while ensuring non-inferior accuracy with respect
to state-of-the-art hand-tuned deep neural networks for TinyML.
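To make the idea concrete, below is a minimal sketch, assuming a PyTorch-based DNAS setup, of how multiple hardware constraints can be folded into a single differentiable objective: hinge-style penalties on the memory and latency estimates of the candidate architecture are added to the task loss, so each penalty is active only while its constraint is violated. The function and argument names are illustrative assumptions, not the authors' released code.

```python
import torch

def multi_constraint_loss(task_loss: torch.Tensor,
                          est_memory: torch.Tensor,
                          est_latency: torch.Tensor,
                          mem_target: float,
                          lat_target: float,
                          mem_strength: float = 1.0,
                          lat_strength: float = 1.0) -> torch.Tensor:
    """Task loss plus hinge penalties that vanish once the differentiable
    memory/latency estimates of the sampled architecture meet their targets."""
    mem_penalty = torch.relu(est_memory - mem_target) / mem_target
    lat_penalty = torch.relu(est_latency - lat_target) / lat_target
    return task_loss + mem_strength * mem_penalty + lat_strength * lat_penalty
```

In a real search, est_memory and est_latency would be differentiable functions of the supernet's architecture parameters, so the penalties steer the selected architecture toward the user-defined targets during a single training run.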
Related papers
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- Quantization-aware Neural Architectural Search for Intrusion Detection [5.010685611319813]
We present a design methodology that automatically trains and evolves quantized neural network (NN) models that are a thousand times smaller than state-of-the-art NNs.
The number of LUTs utilized by this network when deployed to an FPGA is between 2.3x and 8.5x smaller with performance comparable to prior work.
arXiv Detail & Related papers (2023-11-07T18:35:29Z)
- Combining Multi-Objective Bayesian Optimization with Reinforcement Learning for TinyML [4.2019872499238256]
We propose a novel strategy for deploying Deep Neural Networks on microcontrollers (TinyML) based on Multi-Objective Bayesian optimization (MOBOpt).
Our methodology aims at efficiently finding tradeoffs between a DNN's predictive accuracy, memory consumption on a given target system, and computational complexity.
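As an illustration of the multi-objective view taken by approaches such as MOBOpt, the sketch below filters candidate networks down to a Pareto front over accuracy, memory, and computational complexity; the data class, field names, and objectives are assumptions for illustration, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    accuracy: float    # higher is better
    memory_kb: float   # lower is better
    macs: float        # lower is better (proxy for computational complexity)

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly better in at least one."""
    no_worse = a.accuracy >= b.accuracy and a.memory_kb <= b.memory_kb and a.macs <= b.macs
    better = a.accuracy > b.accuracy or a.memory_kb < b.memory_kb or a.macs < b.macs
    return no_worse and better

def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]
```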
arXiv Detail & Related papers (2023-05-23T14:31:52Z)
- Lightweight Neural Architecture Search for Temporal Convolutional Networks at the Edge [21.72253397805102]
This work focuses in particular on Temporal Convolutional Networks (TCNs), a convolutional model for time-series processing.
We propose the first NAS tool that explicitly targets the optimization of the most peculiar architectural parameters of TCNs.
We test the proposed NAS on four real-world, edge-relevant tasks, involving audio and bio-signals.
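For readers unfamiliar with TCNs, the sketch below shows a generic causal dilated 1D convolution, the building block whose kernel size and dilation are the TCN-specific parameters such a NAS would explore. It is a textbook construction under assumed PyTorch usage, not the paper's tool.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedConv1d(nn.Module):
    """One TCN layer: a 1D convolution made causal by left-padding, with the
    kernel size and dilation exposed as the searchable hyper-parameters."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 2):
        super().__init__()
        # Pad only on the left so the output at time t depends on inputs up to t.
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.left_pad, 0)))
```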
arXiv Detail & Related papers (2023-01-24T19:47:40Z)
- Multi-Complexity-Loss DNAS for Energy-Efficient and Memory-Constrained Deep Neural Networks [22.40937602825472]
Energy and memory are rarely considered simultaneously, in particular by low-search-cost Differentiable NAS (DNAS) solutions.
We propose the first DNAS that directly addresses the most realistic scenario from a designer's perspective.
Our networks span a range of 2.18x in energy consumption and 4.04% in accuracy for the same memory constraint, and reduce energy by up to 2.2x with negligible accuracy drop with respect to the baseline.
arXiv Detail & Related papers (2022-06-01T08:04:50Z)
- MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
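As a rough illustration of latency prediction with explicit hardware priors (not MAPLE-X itself), the sketch below fits a regressor on features that concatenate architecture descriptors with a device descriptor. The feature choices, the regressor, and every number are invented purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Columns: architecture descriptors (layers, MACs, params, peak activation KB)
# followed by a hardware descriptor (clock MHz, cores, SRAM KB).
X = np.array([
    [12, 5.0e6, 1.2e5,  96,  480, 1,  512],
    [20, 2.1e7, 9.8e5, 240,  480, 1,  512],
    [12, 5.0e6, 1.2e5,  96, 1000, 4, 2048],
])
y = np.array([18.5, 71.0, 4.2])  # measured latency in milliseconds (illustrative)

predictor = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(predictor.predict([[16, 1.0e7, 4.0e5, 128, 480, 1, 512]]))
```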
arXiv Detail & Related papers (2022-05-25T11:08:20Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy compared to its software, full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Computational Intelligence and Deep Learning for Next-Generation Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain of up to 32.7% in industrial IoT networks with severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- Lightweight Residual Densely Connected Convolutional Neural Network [18.310331378001397]
The lightweight residual densely connected blocks are proposed to guarantee the deep supervision, efficient gradient flow, and feature reuse abilities of the convolutional neural network.
The proposed method decreases the cost of training and inference processes without using any special hardware-software equipment.
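A hedged, generic sketch of a residual densely connected block is shown below: feature maps are densely concatenated inside the block and a residual shortcut wraps around it, which is what provides the gradient-flow and feature-reuse properties mentioned above. The layer count, growth rate, and 1x1 transition are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    def __init__(self, channels: int, growth: int = 16, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True),
            ))
            in_ch += growth  # dense connectivity: each layer sees all previous features
        self.transition = nn.Conv2d(in_ch, channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return x + self.transition(torch.cat(features, dim=1))  # residual shortcut
```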
arXiv Detail & Related papers (2020-01-02T17:15:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.