Communication-Efficient Separable Neural Network for Distributed
Inference on Edge Devices
- URL: http://arxiv.org/abs/2111.02489v1
- Date: Wed, 3 Nov 2021 19:30:28 GMT
- Title: Communication-Efficient Separable Neural Network for Distributed
Inference on Edge Devices
- Authors: Jun-Liang Lin and Sheng-De Wang
- Abstract summary: We propose a novel method of exploiting model parallelism to separate a neural network for distributed inferences.
Under proper specifications of devices and configurations of models, our experiments show that the inference of large neural networks on edge clusters can be distributed and accelerated.
- Score: 2.28438857884398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The inference of Neural Networks is usually restricted by the resources
(e.g., computing power, memory, bandwidth) on edge devices. In addition to
improving the hardware design and deploying efficient models, it is possible to
aggregate the computing power of many devices to enable the machine learning
models. In this paper, we proposed a novel method of exploiting model
parallelism to separate a neural network for distributed inferences. To achieve
a better balance between communication latency, computation latency, and
performance, we adopt neural architecture search (NAS) to search for the best
transmission policy and reduce the amount of communication. The best model we
found decreases by 86.6% of the amount of data transmission compared to the
baseline and does not impact performance much. Under proper specifications of
devices and configurations of models, our experiments show that the inference
of large neural networks on edge clusters can be distributed and accelerated,
which provides a new solution for the deployment of intelligent applications in
the internet of things (IoT).
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - RNC: Efficient RRAM-aware NAS and Compilation for DNNs on Resource-Constrained Edge Devices [0.30458577208819987]
We aim to develop edge-friendly deep neural networks (DNNs) for accelerators based on resistive random-access memory (RRAM)
We propose an edge compilation and resource-constrained RRAM-aware neural architecture search (NAS) framework to search for optimized neural networks meeting specific hardware constraints.
The resulting model from NAS optimized for speed achieved 5x-30x speedup.
arXiv Detail & Related papers (2024-09-27T15:35:36Z) - The Robustness of Spiking Neural Networks in Communication and its Application towards Network Efficiency in Federated Learning [6.9569682335746235]
Spiking Neural Networks (SNNs) have recently gained significant interest in on-chip learning in embedded devices.
In this paper, we explore the inherent robustness of SNNs under noisy communication in Federated Learning.
We propose a novel Federated Learning with TopK Sparsification algorithm to reduce the bandwidth usage for FL training.
arXiv Detail & Related papers (2024-09-19T13:37:18Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST)
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - Neural Architecture Search for Improving Latency-Accuracy Trade-off in
Split Computing [5.516431145236317]
Split computing is an emerging machine-learning inference technique that addresses the privacy and latency challenges of deploying deep learning in IoT systems.
In split computing, neural network models are separated and cooperatively processed using edge servers and IoT devices via networks.
This paper proposes a neural architecture search (NAS) method for split computing.
arXiv Detail & Related papers (2022-08-30T03:15:43Z) - Computational Intelligence and Deep Learning for Next-Generation
Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain up to 32.7% in the industrial IoT networks with the severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z) - SignalNet: A Low Resolution Sinusoid Decomposition and Estimation
Network [79.04274563889548]
We propose SignalNet, a neural network architecture that detects the number of sinusoids and estimates their parameters from quantized in-phase and quadrature samples.
We introduce a worst-case learning threshold for comparing the results of our network relative to the underlying data distributions.
In simulation, we find that our algorithm is always able to surpass the threshold for three-bit data but often cannot exceed the threshold for one-bit data.
arXiv Detail & Related papers (2021-06-10T04:21:20Z) - Efficient Low-Latency Dynamic Licensing for Deep Neural Network
Deployment on Edge Devices [0.0]
We propose an architecture to solve deploying and processing deep neural networks on edge-devices.
Adopting this architecture allows low-latency model updates on devices.
arXiv Detail & Related papers (2021-02-24T09:36:39Z) - MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS)
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z) - Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G
Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.