DEFER: Distributed Edge Inference for Deep Neural Networks
- URL: http://arxiv.org/abs/2201.06769v1
- Date: Tue, 18 Jan 2022 06:50:45 GMT
- Title: DEFER: Distributed Edge Inference for Deep Neural Networks
- Authors: Arjun Parthasarathy and Bhaskar Krishnamachari
- Abstract summary: We present DEFER, a framework for distributed edge inference.
It partitions deep neural networks into layers that can be spread across multiple compute nodes.
We find that for the ResNet50 model, the inference throughput of DEFER with 8 compute nodes is 53% higher and per-node energy consumption is 63% lower than single-device inference.
- Score: 5.672898304129217
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern machine learning tools such as deep neural networks (DNNs) are playing
a revolutionary role in many fields such as natural language processing,
computer vision, and the internet of things. Once they are trained, deep
learning models can be deployed on edge computers to perform classification and
prediction on real-time data for these applications. Particularly for large
models, the limited computational and memory resources on a single edge device
can become the throughput bottleneck for an inference pipeline. To increase
throughput and decrease per-device compute load, we present DEFER (Distributed
Edge inFERence), a framework for distributed edge inference, which partitions
deep neural networks into layers that can be spread across multiple compute
nodes. The architecture consists of a single "dispatcher" node to distribute
DNN partitions and inference data to respective compute nodes. The compute
nodes are connected in a series pattern where each node's computed result is
relayed to the subsequent node. The final result is then returned to the dispatcher.
We quantify the throughput, energy consumption, network payload, and overhead
for our framework under realistic network conditions using the CORE network
emulator. We find that for the ResNet50 model, the inference throughput of
DEFER with 8 compute nodes is 53% higher and per-node energy consumption is 63%
lower than single-device inference. We further reduce network communication
demands and energy consumption using the ZFP serialization and LZ4 compression
algorithms. We have implemented DEFER in Python using the TensorFlow and Keras
ML libraries, and have released DEFER as an open-source framework to benefit
the research community.
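To make the pipeline concrete, here is a minimal sketch of layer-wise partitioning and relayed inference, assuming a plain Keras Sequential model and the lz4 Python package. The partitioning rule, the relay() helper, and the node count are illustrative assumptions, not the released DEFER API (ZFP serialization is omitted for brevity):

```python
import numpy as np
import lz4.frame          # stand-in for DEFER's LZ4 compression of relayed tensors
import tensorflow as tf

def partition(model, n_parts):
    """Split a Sequential model's layers into contiguous sub-models (one per node)."""
    return [tf.keras.Sequential(list(c))
            for c in np.array_split(model.layers, n_parts)]

def relay(x):
    """Stand-in for a network hop: compress, 'send', decompress."""
    payload = lz4.frame.compress(x.astype(np.float32).tobytes())
    return np.frombuffer(lz4.frame.decompress(payload),
                         np.float32).reshape(x.shape)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])

x = np.random.rand(1, 32).astype(np.float32)   # inference data from the dispatcher
for node in partition(model, n_parts=2):       # each sub-model runs on one compute node
    x = relay(node(x).numpy())                 # result relayed to the subsequent node
print(x.shape)                                 # final result returns to the dispatcher
```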
Related papers
- DISCO: Distributed Inference with Sparse Communications [4.463769269318892]
Distributed computing is a common approach to reduce single-node memory consumption.
In this paper, we explore the "within-layer model parallelism", which distributes the inference of each layer into multiple nodes.
We show the benefit of the DISCO framework on a variety of CV tasks such as image classification, object detection, semantic segmentation, and image super-resolution.
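By contrast with DEFER's layer-wise split, within-layer parallelism slices a single layer's computation across nodes. A toy numpy sketch under that reading (the column-slicing scheme is illustrative, not the DISCO API):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 128))        # activations entering one layer
W = rng.standard_normal((128, 256))      # the layer's full weight matrix

# Each "node" holds one column slice of W and computes its slice of the output,
# so only partial results ever cross the network.
partial = [x @ Wi for Wi in np.array_split(W, 4, axis=1)]
y = np.concatenate(partial, axis=1)

assert np.allclose(y, x @ W)             # matches the single-node computation
```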
arXiv Detail & Related papers (2023-02-22T07:20:34Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
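The selection rule can be sketched as a small latency model: for each candidate split point, add on-device compute, transmission of the intermediate activation at the current data rate, and the loaded server's remaining work, then take the argmin. All numbers and the cost model here are illustrative assumptions:

```python
# Hypothetical per-layer costs: device compute (ms), intermediate
# activation size (KB), and server time for the remaining layers (ms).
device_ms = [2.0, 3.5, 4.0, 6.0]
act_kb    = [512, 256, 64, 16]
server_ms = [1.0, 0.8, 0.6, 0.4]

def best_split(rate_kbps, server_load=1.0):
    """Pick the split index minimizing estimated end-to-end latency."""
    def latency(k):
        tx = act_kb[k] / rate_kbps * 1000.0               # transmission time (ms)
        return sum(device_ms[:k + 1]) + tx + server_load * server_ms[k]
    return min(range(len(device_ms)), key=latency)

print(best_split(rate_kbps=500))        # slow channel: split late, send small tensors
print(best_split(rate_kbps=1_000_000))  # fast channel: split early, offload compute
```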
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, the computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing loads, and is capable of supporting 5G URLLC services.
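A sketch of what a latency- and accuracy-aware reward could look like for such an agent; the weights and the deadline penalty are illustrative assumptions, not the paper's exact design:

```python
def reward(accuracy, latency_ms, deadline_ms=50.0, alpha=1.0, beta=0.5):
    """Reward accuracy, charge for latency, and punish deadline misses hard."""
    r = alpha * accuracy - beta * latency_ms / deadline_ms
    if latency_ms > deadline_ms:   # URLLC-style hard latency requirement
        r -= 1.0
    return r

print(reward(accuracy=0.92, latency_ms=30.0))  # meets the deadline
print(reward(accuracy=0.95, latency_ms=80.0))  # misses it and is penalized
```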
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training of Graph Neural Networks (GNNs) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
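The step being scaled is full-batch neighborhood aggregation over a vertex-partitioned graph. A toy numpy sketch of that pattern (this shows the general idea, not DGL's or DistGNN's internals):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 8, 4
A = (rng.random((n, n)) < 0.3).astype(np.float32)    # adjacency matrix
H = rng.standard_normal((n, d)).astype(np.float32)   # node feature matrix

# Vertices are split across "sockets"; each computes the rows it owns,
# pulling remote neighbors' features (the communication step) as needed.
H_next = np.empty_like(H)
for part in np.array_split(np.arange(n), 2):
    H_next[part] = A[part] @ H

assert np.allclose(H_next, A @ H)   # identical to single-node full-batch aggregation
```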
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
- Edge-Detect: Edge-centric Network Intrusion Detection using Deep Neural Network [0.0]
Edge nodes are crucial for defending against the multitude of cyber attacks on Internet-of-Things endpoints.
We develop a novel lightweight, fast, and accurate 'Edge-Detect' model, which detects Denial-of-Service attacks on edge nodes using deep learning model (DLM) techniques.
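A hedged sketch of a compact recurrent detector of this kind, assuming fixed-length windows of traffic features and a binary attack label; the layer sizes and input features are illustrative, not the published Edge-Detect architecture:

```python
import tensorflow as tf

# Windows of 10 time steps x 6 traffic features (packet rates, sizes, flags, ...)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10, 6)),
    tf.keras.layers.LSTM(32),                        # small enough for an edge node
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(window is a DoS attack)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```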
arXiv Detail & Related papers (2021-02-03T04:24:34Z)
- ItNet: iterative neural networks with small graphs for accurate and efficient anytime prediction [1.52292571922932]
In this study, we introduce a class of network models that have a small memory footprint in terms of their computational graphs.
We show state-of-the-art results for semantic segmentation on the CamVid and Cityscapes datasets.
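The anytime property can be sketched with one small weight-shared block applied repeatedly, with a readout available after every iteration; the shapes and readout head are illustrative assumptions:

```python
import tensorflow as tf

block = tf.keras.layers.Dense(64, activation="relu")  # shared weights, small graph
head  = tf.keras.layers.Dense(10)                     # readout usable at any step

h = tf.random.normal((1, 64))
anytime_outputs = []
for _ in range(4):                 # later iterations refine earlier predictions
    h = block(h)                   # the same parameters are reused every pass
    anytime_outputs.append(head(h))
print([tuple(o.shape) for o in anytime_outputs])
```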
arXiv Detail & Related papers (2021-01-21T15:56:29Z)
- Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of routing every input along the same fixed path through the network, DG-Net aggregates features dynamically at each node, which gives the network more representation ability.
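A toy sketch of instance-aware aggregation: gates computed from the input itself weigh the incoming paths, so different instances effectively use different connectivity. The gating form is an illustrative assumption:

```python
import tensorflow as tf

# Features arriving from three predecessor blocks in the DAG.
inputs = [tf.random.normal((1, 16)) for _ in range(3)]

# Gate values depend on the instance: each input gets its own edge weight.
gate = tf.keras.layers.Dense(3, activation="sigmoid")
g = gate(tf.concat(inputs, axis=-1))                  # shape (1, 3)

aggregated = tf.add_n([g[:, i:i + 1] * inputs[i] for i in range(3)])
print(aggregated.shape)   # (1, 16): a dynamically weighted sum over incoming edges
```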
arXiv Detail & Related papers (2020-10-02T16:50:26Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace the conventional ReLU with a Bounded ReLU and find that the accuracy decline is due to activation quantization.
Our integer networks achieve performance equivalent to the corresponding full-precision networks (FPNs), but have only 1/4 the memory cost and run 2x faster on modern GPUs.
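Why a bounded activation helps integer-only inference, in miniature: clipping to a fixed range lets all activations share one constant integer scale. The bound and bit width below are illustrative:

```python
import numpy as np

BOUND, BITS = 6.0, 8
SCALE = (2**BITS - 1) / BOUND       # one fixed scale for all activations

def bounded_relu(x):
    return np.clip(x, 0.0, BOUND)   # ReLU with a hard upper bound

def quantize(x):
    return np.round(bounded_relu(x) * SCALE).astype(np.uint8)

x = np.array([-1.0, 0.5, 3.0, 9.0])
q = quantize(x)
print(q, q / SCALE)                 # integer codes and their dequantized values
```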
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
- GPU Acceleration of Sparse Neural Networks [0.0]
We show that we can gain significant speedup for the full activation of sparse neural networks using graphics processing units (GPUs).
Our results show that the activation of sparse neural networks lends itself very well to GPU acceleration and can help speed up machine learning strategies.
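The effect can be reproduced in miniature on the CPU with scipy.sparse as a stand-in: sparse kernels touch only the nonzero entries (on a GPU the same pattern would map onto sparse matrix kernels; that mapping is assumed, not shown here):

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(2)
W = rng.standard_normal((1024, 1024)).astype(np.float32)
W[rng.random(W.shape) < 0.95] = 0.0     # make the weights 95% sparse
W_csr = sparse.csr_matrix(W)

x = rng.standard_normal((1024, 64)).astype(np.float32)
dense_out = W @ x                        # dense kernel visits every entry
sparse_out = W_csr @ x                   # sparse kernel visits ~5% of them
assert np.allclose(dense_out, sparse_out, atol=1e-3)
```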
arXiv Detail & Related papers (2020-05-09T02:18:31Z)
- Learning Sparse & Ternary Neural Networks with Entropy-Constrained Trained Ternarization (EC2T) [17.13246260883765]
Deep neural networks (DNNs) have shown remarkable success in a variety of machine learning applications.
In recent years, there has been increasing interest in deploying DNNs to resource-constrained devices with limited energy, memory, and computational budgets.
We propose Entropy-Constrained Trained Ternarization (EC2T), a general framework to create sparse and ternary neural networks.
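The ternary half of the idea in miniature: map each weight to {-s, 0, +s} with a magnitude threshold, which yields both sparsity (the zeros) and a 2-bit code per weight. The threshold rule below is a common heuristic, not EC2T's entropy-constrained training procedure:

```python
import numpy as np

def ternarize(W, t=0.7):
    """Quantize weights to {-s, 0, +s} using a threshold of t * mean(|W|)."""
    mask = np.abs(W) > t * np.abs(W).mean()   # zeros elsewhere give sparsity
    s = np.abs(W[mask]).mean() if mask.any() else 0.0
    return np.sign(W) * mask * s

W = np.random.default_rng(3).standard_normal((4, 4))
Wt = ternarize(W)
print(Wt)
print("sparsity:", (Wt == 0).mean())
```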
arXiv Detail & Related papers (2020-04-02T15:38:00Z)
- EdgeNets: Edge Varying Graph Neural Networks [179.99395949679547]
This paper puts forth a general framework that unifies state-of-the-art graph neural networks (GNNs) through the concept of EdgeNet.
An EdgeNet is a GNN architecture that allows different nodes to use different parameters to weigh the information of different neighbors.
This is a general linear and local operation that a node can perform, and it encompasses under one formulation all existing graph convolutional neural networks (GCNNs) as well as graph attention networks (GATs).
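One reading of that formulation as a toy numpy sketch: instead of one scalar tap per hop, each hop applies a parameter matrix sharing the graph's sparsity, so every node weighs each neighbor differently (shared scalar taps recover GCNNs; attention-derived weights recover GATs):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 6, 3
A = (rng.random((n, n)) < 0.4).astype(np.float32)   # graph support
x = rng.standard_normal((n, 1))

# One independent weight per existing edge, per hop: each Phi shares A's sparsity.
Phi = [rng.standard_normal((n, n)) * A for _ in range(K)]

y, z = np.zeros_like(x), x
for k in range(K):
    z = Phi[k] @ z     # k-hop exchange with edge-specific weights
    y = y + z          # accumulate every hop's contribution
print(y.ravel())
```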
arXiv Detail & Related papers (2020-01-21T15:51:17Z)