Communication-Efficient Separable Neural Network for Distributed
Inference on Edge Devices
- URL: http://arxiv.org/abs/2111.02489v1
- Date: Wed, 3 Nov 2021 19:30:28 GMT
- Title: Communication-Efficient Separable Neural Network for Distributed
Inference on Edge Devices
- Authors: Jun-Liang Lin and Sheng-De Wang
- Abstract summary: We propose a novel method of exploiting model parallelism to separate a neural network for distributed inference.
Under proper specifications of devices and configurations of models, our experiments show that the inference of large neural networks on edge clusters can be distributed and accelerated.
- Score: 2.28438857884398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural network inference on edge devices is usually restricted by the
available resources (e.g., computing power, memory, bandwidth). In addition to
improving hardware design and deploying efficient models, it is possible to
aggregate the computing power of many devices to run machine learning
models. In this paper, we propose a novel method of exploiting model
parallelism to separate a neural network for distributed inference. To achieve
a better balance between communication latency, computation latency, and
performance, we adopt neural architecture search (NAS) to search for the best
transmission policy and reduce the amount of communication. The best model we
found reduces the amount of data transmission by 86.6% compared to the
baseline without much impact on performance. Under proper specifications of
devices and configurations of models, our experiments show that the inference
of large neural networks on edge clusters can be distributed and accelerated,
which provides a new solution for the deployment of intelligent applications in
the Internet of Things (IoT).
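The trade-off the abstract describes, between communication latency and computation latency under a per-layer transmission policy, can be illustrated with a small sketch. This is not the authors' implementation: the `Layer` fields, the boolean policy encoding, the additive latency model, and all numbers are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Layer:
    """Hypothetical per-layer costs for one device in an edge cluster."""
    compute_ms: float      # time to run this layer's local partition
    activation_bytes: int  # bytes sent if devices exchange this layer's output


def transmitted_bytes(layers: Sequence[Layer], policy: Sequence[bool]) -> int:
    """Total bytes exchanged when policy[i] says whether layer i transmits."""
    if len(layers) != len(policy):
        raise ValueError("policy needs exactly one entry per layer")
    return sum(layer.activation_bytes
               for layer, send in zip(layers, policy) if send)


def latency_ms(layers: Sequence[Layer], policy: Sequence[bool],
               bandwidth_bytes_per_ms: float) -> float:
    """Assumed model: computation latency plus communication latency."""
    compute = sum(layer.compute_ms for layer in layers)
    return compute + transmitted_bytes(layers, policy) / bandwidth_bytes_per_ms


def transmission_reduction(layers: Sequence[Layer],
                           policy: Sequence[bool]) -> float:
    """Fraction of traffic saved versus a baseline that transmits at every layer."""
    baseline = transmitted_bytes(layers, [True] * len(layers))
    return 1.0 - transmitted_bytes(layers, policy) / baseline


# Made-up example: three layers, only the middle one exchanges activations.
layers = [Layer(2.0, 4000), Layer(3.0, 2000), Layer(1.0, 2000)]
policy = [False, True, False]
print(transmission_reduction(layers, policy))                  # fraction saved
print(latency_ms(layers, policy, bandwidth_bytes_per_ms=1000.0))
```

A search procedure such as NAS would explore many candidate policies and score each one on latency and task performance; the sketch only covers the communication and latency bookkeeping, not the search or the accuracy evaluation.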
Related papers
- Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity [39.483346492111515]
Linear recurrent neural networks enable powerful long-range sequence modeling with constant memory usage and time-per-token during inference.
Unstructured sparsity offers a compelling solution, enabling substantial reductions in compute and memory requirements when accelerated by compatible hardware platforms.
We find that highly sparse linear RNNs consistently achieve better efficiency-performance trade-offs than dense baselines.
(2025-02-03T13:09:21Z)
- Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
Neuromorphic computing uses spiking neural networks (SNNs) to perform inference tasks.
Embedding a small payload within each spike exchanged between spiking neurons can enhance inference accuracy without increasing energy consumption.
Split computing, where an SNN is partitioned across two devices, is a promising solution.
This paper presents the first comprehensive study of a neuromorphic wireless split computing architecture that employs multi-level SNNs.
(2024-11-07T14:08:35Z)
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
(2024-10-29T19:02:54Z)
- The Robustness of Spiking Neural Networks in Communication and its Application towards Network Efficiency in Federated Learning [6.9569682335746235]
Spiking Neural Networks (SNNs) have recently gained significant interest in on-chip learning in embedded devices.
In this paper, we explore the inherent robustness of SNNs under noisy communication in Federated Learning.
We propose a novel Federated Learning with TopK Sparsification algorithm to reduce the bandwidth usage for FL training.
(2024-09-19T13:37:18Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
(2023-06-28T18:14:22Z)
- Neural Architecture Search for Improving Latency-Accuracy Trade-off in Split Computing [5.516431145236317]
Split computing is an emerging machine-learning inference technique that addresses the privacy and latency challenges of deploying deep learning in IoT systems.
In split computing, neural network models are separated and cooperatively processed using edge servers and IoT devices via networks.
This paper proposes a neural architecture search (NAS) method for split computing.
(2022-08-30T03:15:43Z)
- Computational Intelligence and Deep Learning for Next-Generation Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain of up to 32.7% in industrial IoT networks with severely limited resources.
(2021-10-28T08:14:57Z)
- Efficient Low-Latency Dynamic Licensing for Deep Neural Network Deployment on Edge Devices [0.0]
We propose an architecture for deploying and processing deep neural networks on edge devices.
Adopting this architecture allows low-latency model updates on devices.
(2021-02-24T09:36:39Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
(2020-09-29T11:56:01Z)
- Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
(2020-02-22T14:38:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.