SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud
- URL: http://arxiv.org/abs/2008.06402v2
- Date: Mon, 24 Aug 2020 10:24:41 GMT
- Title: SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud
- Authors: Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, Nicholas D. Lane
- Abstract summary: A popular alternative comprises offloading CNN processing to powerful cloud-based servers.
SPINN is a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method.
It provides robust operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution.
- Score: 13.315410752311768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the soaring use of convolutional neural networks (CNNs) in mobile
applications, uniformly sustaining high-performance inference on mobile has
been elusive due to the excessive computational demands of modern CNNs and the
increasing diversity of deployed devices. A popular alternative comprises
offloading CNN processing to powerful cloud-based servers. Nevertheless, by
relying on the cloud to produce outputs, emerging mission-critical and
high-mobility applications, such as drone obstacle avoidance or interactive
applications, can suffer from the dynamic connectivity conditions and the
uncertain availability of the cloud. In this paper, we propose SPINN, a
distributed inference system that employs synergistic device-cloud computation
together with a progressive inference method to deliver fast and robust CNN
inference across diverse settings. The proposed system introduces a novel
scheduler that co-optimises the early-exit policy and the CNN splitting at run
time, in order to adapt to dynamic conditions and meet user-defined
service-level requirements. Quantitative evaluation illustrates that SPINN
outperforms its state-of-the-art collaborative inference counterparts by up to
2x in achieved throughput under varying network conditions, reduces the server
cost by up to 6.8x and improves accuracy by 20.7% under latency constraints,
while providing robust operation under uncertain connectivity conditions and
significant energy savings compared to cloud-centric execution.
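To make the run-time co-optimisation concrete, the following is a minimal, hypothetical sketch of jointly picking a CNN split point and an early-exit confidence threshold from offline-profiled costs. It illustrates the general technique rather than SPINN's actual scheduler; the profile/accuracy structures, the cost model, and all names are assumptions.

```python
# Hypothetical sketch of joint early-exit/split selection (not SPINN's
# scheduler). 'profile' and 'accuracy' are assumed to come from offline
# profiling of the CNN on the target device/server pair.

def expected_latency(split, thresh, profile, bandwidth_mbps):
    """Expected end-to-end latency of one (split, exit-threshold) pair."""
    p = profile[split]
    p_exit = p["exit_prob"][thresh]            # share of samples exiting on-device
    tx_ms = p["tx_mb"] * 8000.0 / bandwidth_mbps
    # Early-exited samples finish on the device; the rest also pay the
    # transfer of the intermediate tensor plus the server-side compute.
    return p["device_ms"] + (1.0 - p_exit) * (tx_ms + p["server_ms"])

def choose_config(profile, accuracy, bandwidth_mbps, min_accuracy):
    """Pick the lowest-latency configuration that meets the accuracy target."""
    feasible = [(s, t) for s in profile for t in profile[s]["exit_prob"]
                if accuracy[(s, t)] >= min_accuracy]
    if not feasible:
        raise ValueError("no configuration satisfies the accuracy target")
    return min(feasible,
               key=lambda c: expected_latency(c[0], c[1], profile, bandwidth_mbps))

profile = {
    "block3": {"device_ms": 12.0, "tx_mb": 0.4, "server_ms": 6.0,
               "exit_prob": {0.6: 0.55, 0.8: 0.35}},
    "block5": {"device_ms": 25.0, "tx_mb": 0.1, "server_ms": 3.0,
               "exit_prob": {0.6: 0.70, 0.8: 0.50}},
}
accuracy = {("block3", 0.6): 0.88, ("block3", 0.8): 0.91,
            ("block5", 0.6): 0.90, ("block5", 0.8): 0.92}
print(choose_config(profile, accuracy, bandwidth_mbps=20.0, min_accuracy=0.90))
```

Re-running choose_config whenever the measured bandwidth or the service-level target changes gives the kind of run-time adaptation the abstract describes.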
Related papers
- Leveraging Low-Rank and Sparse Recurrent Connectivity for Robust Closed-Loop Control [63.310780486820796]
We show how a parameterization of recurrent connectivity influences robustness in closed-loop settings.
We find that closed-form continuous-time neural networks (CfCs) with fewer parameters can outperform their full-rank, fully-connected counterparts.
arXiv Detail & Related papers (2023-10-05T21:44:18Z)
- Transferability of Convolutional Neural Networks in Stationary Learning Tasks [96.00428692404354]
We introduce a novel framework for efficient training of convolutional neural networks (CNNs) for large-scale spatial problems.
We show that a CNN trained on small windows of such signals achieves nearly the same performance on much larger windows without retraining.
Our results show that the CNN is able to tackle problems with many hundreds of agents after being trained with fewer than ten.
arXiv Detail & Related papers (2023-07-21T13:51:45Z)
- Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource Constrained IoT Systems [12.427821850039448]
We propose a novel split computing approach based on slimmable ensemble encoders.
The key advantage of our design is the ability to adapt the computational load and the transmitted data size in real time with minimal overhead and delay.
Our model outperforms existing solutions in terms of compression efficacy and execution time, especially in the context of weak mobile devices.
arXiv Detail & Related papers (2023-06-22T06:33:12Z)
- Accurate and Efficient Event-based Semantic Segmentation Using Adaptive Spiking Encoder-Decoder Network [20.05283214295881]
Spiking neural networks (SNNs) are emerging as promising solutions for processing dynamic, asynchronous signals from event-based sensors.
We develop an efficient spiking encoder-decoder network (SpikingEDN) for large-scale event-based semantic segmentation tasks.
We harness an adaptive threshold that improves network accuracy, sparsity, and robustness in streaming inference.
arXiv Detail & Related papers (2023-04-24T07:12:50Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling approach that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes; a rough sketch of exit-aware execution follows.
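Here, as a rough, assumed illustration (not the paper's Fluid Batching mechanism), a multi-exit network runs stage by stage and yields at every exit, giving the scheduler natural preemption points; run_with_exits and the stage/head layout are hypothetical.

```python
# Assumed sketch of exit-aware, preemptible execution (not the paper's
# Fluid Batching mechanism). The model yields at every early exit, so a
# scheduler consuming the generator can preempt at any of those points.

import torch

def run_with_exits(blocks, exit_heads, x, conf_thresh=0.8):
    """blocks[i] is a feature stage, exit_heads[i] its classifier head;
    batch size 1 is assumed so .item() is well defined."""
    for block, head in zip(blocks, exit_heads):
        x = block(x)
        probs = torch.softmax(head(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        yield conf.item(), pred.item()   # scheduler may preempt here
        if conf.item() >= conf_thresh:   # confident enough: finish early
            return

blocks = [torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
          for _ in range(3)]
heads = [torch.nn.Linear(16, 10) for _ in range(3)]
for conf, pred in run_with_exits(blocks, heads, torch.randn(1, 16)):
    print(f"exit confidence {conf:.2f}, predicted class {pred}")
```

A scheduler that stops iterating at a yield effectively preempts the job while keeping its best prediction so far.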
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time; a minimal sketch of such split selection follows.
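A minimal sketch of such channel-aware split selection, assuming per-layer costs from offline profiling (illustrative only, not the paper's algorithm):

```python
# Illustrative channel-aware split selection (assumed, not the paper's
# algorithm). Re-run whenever the measured data rate changes.

def best_split(layers, rate_mbps):
    """layers[i]: {'device_ms', 'server_ms', 'act_mb'}, where 'act_mb' is
    the activation size a cut after layer i would ship to the server."""
    server_ms = sum(l["server_ms"] for l in layers)   # start fully remote
    device_ms, best, best_ms = 0.0, None, float("inf")
    for i, layer in enumerate(layers):
        device_ms += layer["device_ms"]               # layer i moves on-device
        server_ms -= layer["server_ms"]
        tx_ms = layer["act_mb"] * 8000.0 / rate_mbps  # uplink transfer time
        total = device_ms + tx_ms + server_ms
        if total < best_ms:
            best, best_ms = i, total
    return best, best_ms

layers = [{"device_ms": 3.0, "server_ms": 1.0, "act_mb": 0.8},
          {"device_ms": 4.0, "server_ms": 1.5, "act_mb": 0.2},
          {"device_ms": 5.0, "server_ms": 2.0, "act_mb": 0.05}]
print(best_split(layers, rate_mbps=10.0))  # lower rates push the split later
```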
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and the compressing bits by soft policy iterations.
With its latency- and accuracy-aware reward design, the computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing loads, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device [17.43467167013752]
We present DynO, a distributed inference framework that combines the best of both worlds to address several challenges.
We show that DynO outperforms the current state-of-the-art, improving throughput by over an order of magnitude over device-only execution.
arXiv Detail & Related papers (2021-04-20T13:20:15Z)
- CoEdge: Cooperative DNN Inference with Adaptive Workload Partitioning over Heterogeneous Edge Devices [39.09319776243573]
CoEdge is a distributed Deep Neural Network (DNN) computing system that orchestrates cooperative inference over heterogeneous edge devices.
CoEdge saves energy while keeping inference latency comparable, achieving a 25.5%-66.9% energy reduction for four widely-adopted CNN models; a toy sketch of such proportional partitioning follows this entry.
arXiv Detail & Related papers (2020-12-06T13:15:52Z)
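As a toy illustration of capability-proportional workload partitioning in the spirit of the entry above (an assumption, not CoEdge's actual partitioning algorithm), input work can be sliced across devices in proportion to their measured throughput.

```python
# Assumed sketch of capability-proportional partitioning (not CoEdge's
# scheme): slice a batch across devices in proportion to their measured
# throughput so that all devices finish at roughly the same time.

def partition(batch_size, throughputs):
    """throughputs: images/s per device; returns per-device slice sizes."""
    total = sum(throughputs)
    shares = [round(batch_size * t / total) for t in throughputs]
    shares[-1] += batch_size - sum(shares)   # absorb rounding drift
    return shares

print(partition(64, [10.0, 25.0, 5.0]))      # -> [16, 40, 8]
```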
This list is automatically generated from the titles and abstracts of the papers on this site.