Multi-Component Optimization and Efficient Deployment of Neural-Networks
on Resource-Constrained IoT Hardware
- URL: http://arxiv.org/abs/2204.10183v1
- Date: Wed, 20 Apr 2022 13:30:04 GMT
- Title: Multi-Component Optimization and Efficient Deployment of Neural-Networks
on Resource-Constrained IoT Hardware
- Authors: Bharath Sudharsan, Dineshkumar Sundaram, Pankesh Patel, John G.
Breslin, Muhammad Intizar Ali, Schahram Dustdar, Albert Zomaya, Rajiv Ranjan
- Abstract summary: We present an end-to-end multi-component model optimization sequence and open-source its implementation.
Our optimization components can produce models that are: (i) 12.06x compressed; (ii) 0.13% to 0.27% more accurate; (iii) orders of magnitude faster, with unit inference at 0.06 ms.
- Score: 4.6095200019189475
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The majority of IoT devices like smartwatches, smart plugs, HVAC controllers,
etc., are powered by hardware with a constrained specification (low memory,
clock speed and processor) which is insufficient to accommodate and execute
large, high-quality models. On such resource-constrained devices, manufacturers
still manage to provide attractive functionalities (to boost sales) by
following the traditional approach of programming IoT devices/products to
collect and transmit data (image, audio, sensor readings, etc.) to their
cloud-based ML analytics platforms. For decades, this online approach has been
facing issues such as compromised data streams, non-real-time analytics due to
latency, bandwidth constraints, costly subscriptions, recent privacy issues
raised by users and the GDPR guidelines, etc. In this paper, to enable
ultra-fast and accurate AI-based offline analytics on resource-constrained IoT
devices, we present an end-to-end multi-component model optimization sequence
and open-source its implementation. Researchers and developers can use our
optimization sequence to optimize high-memory, computation-demanding models in
multiple aspects in order to produce small, low-latency, low-power
models that can comfortably fit and execute on resource-constrained
hardware. The experimental results show that our optimization components can
produce models that are: (i) 12.06x compressed; (ii) 0.13% to 0.27% more
accurate; (iii) orders of magnitude faster, with unit inference at 0.06 ms. Our
optimization sequence is generic and can be applied to any state-of-the-art
models trained for anomaly detection, predictive maintenance, robotics, voice
recognition, and machine vision.
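The two headline metrics in the abstract, compression ratio and unit inference time, can be measured as in the minimal sketch below. This is not the authors' benchmarking code: the function names `compression_ratio` and `unit_inference_ms`, the byte sizes, and the stand-in `predict` callable are all hypothetical and chosen only for illustration.

```python
import time


def compression_ratio(original_bytes: int, optimized_bytes: int) -> float:
    """Return how many times smaller the optimized model is than the original."""
    if optimized_bytes <= 0:
        raise ValueError("optimized_bytes must be positive")
    return original_bytes / optimized_bytes


def unit_inference_ms(predict, sample, runs: int = 1000) -> float:
    """Average wall-clock time of a single predict(sample) call, in milliseconds."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    predict(sample)  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        predict(sample)
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000.0 / runs


if __name__ == "__main__":
    # Hypothetical sizes, picked only so that the ratio matches the 12.06x figure.
    print(compression_ratio(1_206_000, 100_000))

    # Hypothetical stand-in for a model's inference function.
    def predict(x):
        return sum(v * 0.5 for v in x)

    print(unit_inference_ms(predict, [1.0, 2.0, 3.0]))
```

Averaging over many runs after a warm-up call reduces timer noise, which matters when the per-inference time is well below a millisecond.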
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation [2.3636539018632616]
This work empirically investigates the optimization of complex deep learning models to analyze their functionality on an embedded device.
It evaluates the effectiveness of the optimized models in terms of their inference speed for image classification and video action detection.
arXiv Detail & Related papers (2024-06-25T17:34:52Z)
- Enhancing Neural Architecture Search with Multiple Hardware Constraints for Deep Learning Model Deployment on Tiny IoT Devices [17.919425885740793]
We propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods.
We show that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively.
arXiv Detail & Related papers (2023-10-11T06:09:14Z)
- Dynamic Early Exiting Predictive Coding Neural Networks [3.542013483233133]
With the urge for smaller and more accurate devices, Deep Learning models became too heavy to deploy.
We propose a shallow bidirectional network based on predictive coding theory and dynamic early exiting for halting further computations.
We achieve comparable accuracy to VGG-16 in image classification on CIFAR-10 with fewer parameters and less computational complexity.
arXiv Detail & Related papers (2023-09-05T08:00:01Z)
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"Smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
- MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z)
- YONO: Modeling Multiple Heterogeneous Neural Networks on Microcontrollers [10.420617367363047]
YONO is a product quantization (PQ) based approach that compresses multiple heterogeneous models and enables in-memory model execution and switching.
YONO shows remarkable performance as it can compress multiple heterogeneous models with negligible or no loss of accuracy up to 12.37$\times$.
arXiv Detail & Related papers (2022-03-08T01:24:36Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on the accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point, exit point, and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Computational Intelligence and Deep Learning for Next-Generation Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain of up to 32.7% in industrial IoT networks with severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z)
- Improving IoT Analytics through Selective Edge Execution [0.0]
We propose to improve the performance of analytics by leveraging edge infrastructure.
We devise an algorithm that enables the IoT devices to execute their routines locally.
We then outsource them to cloudlet servers, only if they predict they will gain a significant performance improvement.
arXiv Detail & Related papers (2020-03-07T15:02:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.