Adaptive Scheduling for Edge-Assisted DNN Serving
- URL: http://arxiv.org/abs/2304.09961v2
- Date: Tue, 2 May 2023 19:05:35 GMT
- Title: Adaptive Scheduling for Edge-Assisted DNN Serving
- Authors: Jian He, Chenxi Yang, Zhaoyuan He, Ghufran Baig, Lili Qiu
- Abstract summary: This paper examines how to speed up edge-server deep neural network (DNN) processing for multiple clients.
We first design a novel scheduling algorithm to exploit the batching benefits of all requests that run the same DNN.
We then extend our algorithm to handle requests that use different DNNs with or without shared layers.
- Score: 6.437829777289881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have been widely used in various video analytic
tasks. These tasks demand real-time responses. Due to the limited processing
power on mobile devices, a common way to support such real-time analytics is to
offload the processing to an edge server. This paper examines how to speed up
the edge server DNN processing for multiple clients. In particular, we observe
that batching multiple DNN requests significantly speeds up the processing time.
Based on this observation, we first design a novel scheduling algorithm to
exploit the batching benefits of all requests that run the same DNN. This is
compelling since there are only a handful of DNNs and many requests tend to use
the same DNN. Our algorithms are general and can support different objectives,
such as minimizing the completion time or maximizing the on-time ratio. We then
extend our algorithm to handle requests that use different DNNs with or without
shared layers. Finally, we develop a collaborative approach to further improve
performance by adaptively processing some of the requests or portions of the
requests locally at the clients. This is especially useful when the network
and/or server is congested. Our implementation shows the effectiveness of our
approach under different request distributions (e.g., Poisson, Pareto, and
constant inter-arrivals).
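To make the batching observation concrete, the sketch below (our own illustration, not the paper's released algorithm) batches every queued request for the model whose head-of-line deadline is earliest and reports the on-time ratio under Poisson arrivals; the latency profile, model names, rates, and deadlines are all assumed placeholders.

```python
# Minimal sketch of same-DNN batch scheduling (illustrative; not the paper's code).
import heapq
import random
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    deadline: float
    arrival: float = field(compare=False)
    model: str = field(compare=False)

def batched_latency(batch_size: int) -> float:
    # Hypothetical profile: a fixed launch cost amortized over the batch,
    # which is why batching helps; fit these numbers from real measurements.
    return 0.030 + 0.004 * batch_size

def pick_next_batch(queues):
    # EDF-style rule: serve the model whose head-of-line deadline is earliest,
    # batching everything queued for that model.
    ready = [(q[0].deadline, m) for m, q in queues.items() if q]
    if not ready:
        return None, []
    _, model = min(ready)
    batch, queues[model] = list(queues[model]), []
    return model, batch

# Poisson arrivals for two models sharing one server (Pareto or constant
# inter-arrivals would just swap the expovariate call).
random.seed(0)
queues = defaultdict(list)
now = 0.0
for _ in range(40):
    now += random.expovariate(100.0)          # ~100 requests/sec
    m = random.choice(["resnet50", "yolov5"])
    heapq.heappush(queues[m], Request(deadline=now + 0.1, arrival=now, model=m))

clock, on_time, total = now, 0, 0
while any(queues.values()):
    model, batch = pick_next_batch(queues)
    clock += batched_latency(len(batch))
    on_time += sum(clock <= r.deadline for r in batch)
    total += len(batch)
print(f"on-time ratio: {on_time / total:.2f}")
```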
Related papers
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse
Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload on both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to a 10% decrease in the latency-constraint violation rate and nearly a 4X reduction in average normalized turnaround time.
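As a rough illustration of how those two sparsity signals could be combined (our guess at one plausible scoring rule; the paper's policy is more sophisticated):

```python
# Toy sparsity-aware priority. Assumed inputs: per-job slack, a profiled dense
# latency, a static (offline) sparsity estimate, and a dynamic (runtime) one.
def predicted_latency(dense_ms: float, static_sp: float, dyn_sp: float,
                      alpha: float = 0.5) -> float:
    # Blend offline and runtime sparsity, then assume latency scales with density.
    sparsity = alpha * static_sp + (1 - alpha) * dyn_sp
    return dense_ms * (1.0 - sparsity)

def priority(slack_ms: float, dense_ms: float, static_sp: float, dyn_sp: float) -> float:
    # Least-laxity-first flavor: the smaller the remaining slack after the
    # sparsity-adjusted latency estimate, the more urgent the job.
    return -(slack_ms - predicted_latency(dense_ms, static_sp, dyn_sp))

jobs = {"bert": (40.0, 12.0, 0.6, 0.5), "resnet": (15.0, 8.0, 0.3, 0.4)}
order = sorted(jobs, key=lambda j: priority(*jobs[j]), reverse=True)
print(order)  # most urgent first
```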
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- Modelling Long Range Dependencies in $N$D: From Task-Specific to a
General Purpose CNN [47.205463459723056]
We present the Continuous Convolutional Neural Network (CCNN), a single CNN able to process data of arbitrary resolution, dimensionality and length without any structural changes.
Its key components are continuous convolutional kernels, which model long-range dependencies at every layer.
Our CCNN matches and often outperforms the current state-of-the-art across all tasks considered.
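The core idea is easy to sketch: a small network maps continuous relative positions to kernel values, so one set of weights produces a kernel at any resolution. A simplified numpy rendering, not the CCNN implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(1, 16)), np.zeros(16)   # tiny kernel-generator MLP
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def kernel(positions: np.ndarray) -> np.ndarray:
    # Evaluate the kernel at arbitrary continuous positions in [-1, 1].
    h = np.sin(positions[:, None] @ W1 + b1)      # sine nonlinearity, CKConv-style
    return (h @ W2 + b2).ravel()

def continuous_conv(x: np.ndarray) -> np.ndarray:
    # Sample the same continuous kernel at the signal's own resolution,
    # spanning the full input (a "long-range" kernel).
    pos = np.linspace(-1.0, 1.0, len(x))
    return np.convolve(x, kernel(pos), mode="same")

for n in (64, 256):                                # same weights, two resolutions
    print(n, continuous_conv(np.ones(n)).shape)
```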
arXiv Detail & Related papers (2023-01-25T12:12:47Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Improving the Performance of DNN-based Software Services using Automated
Layer Caching [3.804240190982695]
Deep Neural Networks (DNNs) have become an essential component in many application domains including web-based services.
The computational complexity of such large models remains significant, hindering low-latency inference.
In this paper, we propose an end-to-end automated solution to improve the performance of DNN-based services.
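The caching idea might be sketched as follows (a minimal illustration under exact-match assumptions; the paper automates where and how caches are inserted):

```python
# Toy activation cache: reuse an expensive early stage's output when the same
# input recurs. A real system would also handle near-duplicates and eviction.
import hashlib
import numpy as np

_cache: dict[str, np.ndarray] = {}

def backbone(x: np.ndarray) -> np.ndarray:
    # Stand-in for the expensive early layers.
    return np.tanh(x * 2.0)

def head(feats: np.ndarray) -> float:
    # Stand-in for the cheap final layers.
    return float(feats.sum())

def cached_infer(x: np.ndarray) -> float:
    key = hashlib.sha1(np.ascontiguousarray(x).tobytes()).hexdigest()
    if key not in _cache:                    # miss: run the backbone once
        _cache[key] = backbone(x)
    return head(_cache[key])                 # hit: skip straight to the head

x = np.ones(8)
print(cached_infer(x), cached_infer(x))      # second call reuses cached features
```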
arXiv Detail & Related papers (2022-09-18T18:21:20Z)
- Automated machine learning for borehole resistivity measurements [0.0]
Deep neural networks (DNNs) offer a real-time solution for the inversion of borehole resistivity measurements.
It is possible to approximate these operators with extremely large DNNs, but doing so demands considerable training time.
In this work, we propose a scoring function that accounts for the accuracy and size of the DNNs.
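A minimal form such a scoring function could take (the paper's exact formulation may differ; `lam` is an assumed trade-off weight):

```python
# Toy accuracy-vs-size score: reward accuracy, penalize parameter count.
def score(accuracy: float, num_params: int, lam: float = 1e-7) -> float:
    return accuracy - lam * num_params

candidates = {"small": (0.91, 2_000_000), "large": (0.93, 60_000_000)}
print(max(candidates, key=lambda k: score(*candidates[k])))  # -> "small"
```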
arXiv Detail & Related papers (2022-07-20T12:27:22Z)
- Towards a General Purpose CNN for Long Range Dependencies in
$\mathrm{N}$D [49.57261544331683]
We propose a single CNN architecture equipped with continuous convolutional kernels for tasks on arbitrary resolution, dimensionality and length without structural changes.
We show the generality of our approach by applying the same CCNN to a wide set of tasks on sequential ($1\mathrm{D}$) and visual data ($2\mathrm{D}$).
Our CCNN performs competitively and often outperforms the current state-of-the-art across all tasks considered.
arXiv Detail & Related papers (2022-06-07T15:48:02Z)
- Decentralized Low-Latency Collaborative Inference via Ensembles on the
Edge [28.61344039233783]
We propose to facilitate the application of deep neural networks (DNNs) on the edge by allowing multiple users to collaborate during inference to improve their accuracy.
Our mechanism, coined "edge ensembles", is based on having diverse predictors at each device, which form an ensemble of models during inference.
We analyze the latency induced by edge ensembles, showing that its performance improvement comes at the cost of a minor additional delay under common assumptions on the communication network.
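A toy rendering of the mechanism (our illustration, not the paper's protocol): each device holds a differently-seeded model, and a query is answered by averaging their predictions.

```python
import numpy as np

class TinyModel:
    def __init__(self, seed: int):
        self.w = np.random.default_rng(seed).normal(size=(4, 3))

    def predict(self, x: np.ndarray) -> np.ndarray:
        z = x @ self.w
        e = np.exp(z - z.max())
        return e / e.sum()                    # softmax class probabilities

devices = [TinyModel(seed) for seed in range(5)]  # one diverse model per device
x = np.random.default_rng(99).normal(size=4)
probs = np.mean([d.predict(x) for d in devices], axis=0)
print(int(probs.argmax()))                    # ensemble decision
```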
arXiv Detail & Related papers (2022-06-07T10:24:20Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
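A sketch of the selection rule under an assumed additive cost model (the per-layer numbers below are placeholders, not from the paper):

```python
# Toy split-point selection. Per-layer: (device_ms, server_ms, activation_kbits).
LAYERS = [
    (8.0, 0.4, 600.0),
    (16.0, 0.8, 150.0),
    (36.0, 1.6, 40.0),
    (60.0, 2.5, 8.0),
]

def best_split(rate_kbps: float, server_load: float = 1.0) -> int:
    # Total latency if we cut after layer i: local compute up to i, upload the
    # activation at i, then (load-scaled) server compute for the rest.
    def total(i: int) -> float:
        device = sum(l[0] for l in LAYERS[: i + 1])
        server = server_load * sum(l[1] for l in LAYERS[i + 1 :])
        upload_ms = LAYERS[i][2] / rate_kbps * 1000.0
        return device + upload_ms + server
    return min(range(len(LAYERS)), key=total)

print(best_split(rate_kbps=100_000))  # fast link -> cut early, offload most layers
print(best_split(rate_kbps=1_000))    # slow link -> cut deeper, compute more locally
```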
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- iRNN: Integer-only Recurrent Neural Network [0.8766022970635899]
We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN).
Our iRNN maintains performance similar to its full-precision counterpart; deploying it on smartphones improves runtime performance by $2\times$ and reduces model size by $4\times$.
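The integer-only style such a network targets can be illustrated with a quantized matrix-vector product (a simplified sketch, not the paper's scheme):

```python
# Toy int8 matrix-vector product with requantization.
import numpy as np

def quantize(x: np.ndarray, scale: float) -> np.ndarray:
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def int8_matvec(Wq, xq, w_scale, x_scale, out_scale):
    acc = Wq.astype(np.int32) @ xq.astype(np.int32)   # integer accumulation
    # In a true integer-only deployment this float multiplier is precomputed
    # offline and applied as a fixed-point multiply-and-shift.
    m = w_scale * x_scale / out_scale
    return np.clip(np.round(acc * m), -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
W, x = rng.normal(size=(4, 4)), rng.normal(size=4)
w_s, x_s = np.abs(W).max() / 127, np.abs(x).max() / 127
yq = int8_matvec(quantize(W, w_s), quantize(x, x_s), w_s, x_s, out_scale=0.05)
print(yq * 0.05)          # dequantized result
print(W @ x)              # float reference for comparison
```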
arXiv Detail & Related papers (2021-09-20T20:17:40Z)
- Boosting Deep Neural Networks with Geometrical Prior Knowledge: A Survey [77.99182201815763]
Deep Neural Networks (DNNs) achieve state-of-the-art results in many different problem settings.
DNNs are often treated as black box systems, which complicates their evaluation and validation.
One promising field, inspired by the success of convolutional neural networks (CNNs) in computer vision tasks, is to incorporate knowledge about symmetric geometrical transformations.
arXiv Detail & Related papers (2020-06-30T14:56:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.