Related papers: Decentralized Low-Latency Collaborative Inference via Ensembles on the Edge

Decentralized Low-Latency Collaborative Inference via Ensembles on the Edge

URL: http://arxiv.org/abs/2206.03165v1
Date: Tue, 7 Jun 2022 10:24:20 GMT
Title: Decentralized Low-Latency Collaborative Inference via Ensembles on the Edge
Authors: May Malka, Erez Farhan, Hai Morgenstern, and Nir Shlezinger
Abstract summary: We propose to facilitate the application of deep neural networks (DNNs) on the edge by allowing multiple users to collaborate during inference to improve their accuracy. Our mechanism, coined em edge ensembles, is based on having diverse predictors at each device, which form an ensemble of models during inference. We analyze the latency induced by edge ensembles, showing that its performance improvement comes at the cost of a minor additional delay under common assumptions on the communication network.
Score: 28.61344039233783
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: The success of deep neural networks (DNNs) is heavily dependent on computational resources. While DNNs are often employed on cloud servers, there is a growing need to operate DNNs on edge devices. Edge devices are typically limited in their computational resources, yet, often multiple edge devices are deployed in the same environment and can reliably communicate with each other. In this work we propose to facilitate the application of DNNs on the edge by allowing multiple users to collaborate during inference to improve their accuracy. Our mechanism, coined {\em edge ensembles}, is based on having diverse predictors at each device, which form an ensemble of models during inference. To mitigate the communication overhead, the users share quantized features, and we propose a method for aggregating multiple decisions into a single inference rule. We analyze the latency induced by edge ensembles, showing that its performance improvement comes at the cost of a minor additional delay under common assumptions on the communication network. Our experiments demonstrate that collaborative inference via edge ensembles equipped with compact DNNs substantially improves the accuracy over having each user infer locally, and can outperform using a single centralized DNN larger than all the networks in the ensemble together.

Related papers

Optimizing DNN Inference on Multi-Accelerator SoCs at Training-time [5.05866540830123]
We present ODiMO, a hardware-aware tool that efficiently explores fine-grain mapping of Deep Neural Networks (DNNs) among various on-chip CUs. We show that ODiMO reduces the latency of a DNN executed on the Darkside by up to 8x at iso-accuracy, compared to a manual mappings. When targeting energy, ODiMO produced up to 50.8x more efficient mappings, with minimal accuracy drop.
arXiv Detail & Related papers (2024-09-27T09:10:44Z)
MatchNAS: Optimizing Edge AI in Sparse-Label Data Contexts via Automating Deep Neural Network Porting for Mobile Deployment [54.77943671991863]
MatchNAS is a novel scheme for porting Deep Neural Networks to mobile devices. We optimise a large network family using both labelled and unlabelled data. We then automatically search for tailored networks for different hardware platforms.
arXiv Detail & Related papers (2024-02-21T04:43:12Z)
Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices. We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for the sparse multi-DNN scheduling. Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
Enabling Deep Learning on Edge Devices [2.741266294612776]
Deep neural networks (DNNs) have succeeded in many different perception tasks, e.g., computer vision, natural language processing, reinforcement learning, etc. The high-performed DNNs heavily rely on intensive resource consumption. Recently, some new emerging intelligent applications, e.g., AR/VR, mobile assistants, Internet of Things, require us to deploy DNNs on resource-constrained edge devices. In this dissertation, we studied four edge intelligence scenarios, i.e., Inference on Edge Devices, Adaptation on Edge Devices, Learning on Edge Devices, and Edge-Server Systems
arXiv Detail & Related papers (2022-10-06T20:52:57Z)
Receptive Field-based Segmentation for Distributed CNN Inference Acceleration in Collaborative Edge Computing [93.67044879636093]
We study inference acceleration using distributed convolutional neural networks (CNNs) in collaborative edge computing network. We propose a novel collaborative edge computing using fused-layer parallelization to partition a CNN model into multiple blocks of convolutional layers.
arXiv Detail & Related papers (2022-07-22T18:38:11Z)
Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel. We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Computational Intelligence and Deep Learning for Next-Generation Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks. In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework. In particular, the proposed ME-FEEL can achieve an accuracy gain up to 32.7% in the industrial IoT networks with the severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z)
CoEdge: Cooperative DNN Inference with Adaptive Workload Partitioning over Heterogeneous Edge Devices [39.09319776243573]
CoEdge is a distributed Deep Neural Network (DNN) computing system that orchestrates cooperative inference over heterogeneous edge devices. CoEdge saves energy with close inference latency, achieving up to 25.5%66.9% energy reduction for four widely-adopted CNN models.
arXiv Detail & Related papers (2020-12-06T13:15:52Z)
Encoding the latent posterior of Bayesian Neural Networks for uncertainty quantification [10.727102755903616]
We aim for efficient deep BNNs amenable to complex computer vision architectures. We achieve this by leveraging variational autoencoders (VAEs) to learn the interaction and the latent distribution of the parameters at each network layer. Our approach, Latent-Posterior BNN (LP-BNN), is compatible with the recent BatchEnsemble method, leading to highly efficient (in terms of computation and memory during both training and testing) ensembles.
arXiv Detail & Related papers (2020-12-04T19:50:09Z)
Joint Multi-User DNN Partitioning and Computational Resource Allocation for Collaborative Edge Intelligence [21.55340197267767]
Mobile Edge Computing (MEC) has emerged as a promising supporting architecture providing a variety of resources to the network edge. With the assistance of edge servers, user equipments (UEs) are able to run deep neural network (DNN) based AI applications. We propose an algorithm called Iterative Alternating Optimization (IAO) that can achieve the optimal solution in time.
arXiv Detail & Related papers (2020-07-15T09:40:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.