DISCO: Distributed Inference with Sparse Communications
- URL: http://arxiv.org/abs/2302.11180v1
- Date: Wed, 22 Feb 2023 07:20:34 GMT
- Title: DISCO: Distributed Inference with Sparse Communications
- Authors: Minghai Qin, Chao Sun, Jaco Hofmann, Dejan Vucinic
- Abstract summary: Distributed computing is a common approach to reduce single-node memory consumption.
In this paper, we explore the "within-layer model parallelism", which distributes the inference of each layer into multiple nodes.
We show the benefit of the DISCO framework on a variety of CV tasks such as image classification, object detection, semantic segmentation, and image super resolution.
- Score: 4.463769269318892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks (DNNs) have great potential to solve many real-world
problems, but they usually require an extensive amount of computation and
memory. It is difficult to deploy a large DNN model on a single
resource-limited device with small memory capacity. Distributed computing is a
common approach to reduce single-node memory consumption and to accelerate the
inference of DNN models. In this paper, we explore the "within-layer model
parallelism", which distributes the inference of each layer into multiple
nodes. In this way, the memory requirement can be distributed to many nodes,
making it possible to use several edge devices to infer a large DNN model. Due
to the dependency within each layer, data communications between nodes during
this parallel inference can be a bottleneck when the communication bandwidth is
limited. We propose a framework to train DNN models for Distributed Inference
with Sparse Communications (DISCO). We convert the problem of selecting which
subset of data to transmit between nodes into a model optimization problem, and
derive models with both computation and communication reduction when each layer
is inferred on multiple nodes. We show the benefit of the DISCO framework on a
variety of CV tasks such as image classification, object detection, semantic
segmentation, and image super resolution. The corresponding models include
important DNN building blocks such as convolutions and transformers. For
example, each layer of a ResNet-50 model can be inferred distributively across
two nodes with a five-fold reduction in data communication, nearly half the
overall computation, and half the per-node memory requirement, while achieving
accuracy comparable to the original ResNet-50 model. This also yields a 4.7x
overall inference speedup.
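To make the within-layer scheme concrete, here is a minimal numpy sketch (illustrative only, not the DISCO code): two simulated nodes each hold half of a linear layer's inputs and output channels, and only masked activation entries cross the node boundary. The masks are random stand-ins for the learned masks the paper optimizes.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 6
x = rng.standard_normal(d_in).astype(np.float32)
W = rng.standard_normal((d_in, d_out)).astype(np.float32)

half = d_in // 2
x_parts = [x[:half], x[half:]]                     # activations per node
W_parts = [W[:, : d_out // 2], W[:, d_out // 2:]]  # output channels per node
send_mask = [rng.random(half) < 0.25 for _ in range(2)]  # ~25% transmitted

def node_output(k):
    other = 1 - k
    # Remote activations arrive sparsified; unsent entries are zero.
    x_remote = np.where(send_mask[other], x_parts[other], 0.0)
    x_full = (np.concatenate([x_parts[0], x_remote]) if k == 0
              else np.concatenate([x_remote, x_parts[1]]))
    return x_full @ W_parts[k]

y = np.concatenate([node_output(0), node_output(1)])
print(y.shape)  # full layer output assembled from the two nodes
```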
Related papers
- Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors [4.95475852994362]
We propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks.
We apply the approach to both fully-connected and convolutional layers, which cover the bulk of the layers in most neural architectures.
arXiv Detail & Related papers (2024-07-16T15:55:38Z)
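A hedged sketch of the tiling idea: one short binary vector is repeated to fill a layer's weight tensor, so per-weight storage falls below one bit. In the paper the tile and scale are learned; both are fixed here, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
tile = rng.integers(0, 2, size=64).astype(np.float32) * 2 - 1  # {-1,+1} tile
out_f, in_f = 32, 48
n = out_f * in_f
reps = -(-n // tile.size)                        # ceiling division
W_bin = np.tile(tile, reps)[:n].reshape(out_f, in_f)
alpha = np.float32(0.05)                         # stand-in for a learned scale
W = alpha * W_bin                                # 64 stored bits, 1536 weights
print(W.shape, tile.size / n, "bits per weight")
```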
- DEFER: Distributed Edge Inference for Deep Neural Networks [5.672898304129217]
We present DEFER, a framework for distributed edge inference.
It partitions deep neural networks into layers that can be spread across multiple compute nodes.
We find that for the ResNet50 model, the inference throughput of DEFER with 8 compute nodes is 53% higher, and per-node energy consumption is 63% lower, than single-device inference.
arXiv Detail & Related papers (2022-01-18T06:50:45Z)
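An illustrative layer-wise partitioning in the spirit of DEFER (not its API): consecutive layer groups live on different "nodes", and activations flow between them; the network hop is simulated here by an ordinary function call.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_layer(d_in, d_out):
    W = 0.1 * rng.standard_normal((d_in, d_out)).astype(np.float32)
    return lambda v: np.maximum(v @ W, 0.0)   # linear + ReLU

layers = [make_layer(64, 64) for _ in range(6)]
node0, node1 = layers[:3], layers[3:]         # partition point chosen by hand

def run_partition(stage, v):
    for layer in stage:
        v = layer(v)
    return v

x = rng.standard_normal(64).astype(np.float32)
act = run_partition(node0, x)   # runs on compute node 0
y = run_partition(node1, act)   # "act" would be serialized and sent over TCP
print(y.shape)
```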
- JMSNAS: Joint Model Split and Neural Architecture Search for Learning over Mobile Edge Networks [23.230079759174902]
Joint model split and neural architecture search (JMSNAS) framework is proposed to automatically generate and deploy a DNN model over a mobile edge network.
Considering both the computing and communication resource constraints, a computational graph search problem is formulated.
Experiment results confirm the superiority of the proposed framework over the state-of-the-art split machine learning design methods.
arXiv Detail & Related papers (2021-11-16T03:10:23Z)
- VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization [70.8567058758375]
VQ-GNN is a universal framework to scale up any convolution-based GNNs using Vector Quantization (VQ) without compromising the performance.
Our framework avoids the "neighbor explosion" problem of GNNs using quantized representations combined with a low-rank version of the graph convolution matrix.
arXiv Detail & Related papers (2021-10-27T11:48:50Z)
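A sketch of the quantization ingredient (not VQ-GNN's code): node features are approximated by a small k-means codebook, so a message from an out-of-minibatch neighbor can be replaced by its codeword instead of recursively fetching its neighborhood. Sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
feats = rng.standard_normal((1000, 16)).astype(np.float32)
K = 32
codebook = feats[rng.choice(len(feats), K, replace=False)].copy()

for _ in range(5):  # a few Lloyd iterations to fit the codebook
    d2 = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(1)
    for k in range(K):
        members = feats[assign == k]
        if len(members):
            codebook[k] = members.mean(0)

quantized = codebook[assign]                     # each feature -> its codeword
print(float(np.mean((feats - quantized) ** 2)))  # quantization error
```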
- Position-based Hash Embeddings For Scaling Graph Neural Networks [8.87527266373087]
Graph Neural Networks (GNNs) compute node representations by taking into account the topology of the node's ego-network and the features of the ego-network's nodes.
When the nodes do not have high-quality features, GNNs learn an embedding layer to compute node embeddings and use them as input features.
To reduce the memory associated with this embedding layer, hashing-based approaches, commonly used in applications like NLP and recommender systems, can potentially be used.
We present approaches that take advantage of the nodes' position in the graph to dramatically reduce the memory required.
arXiv Detail & Related papers (2021-08-31T22:42:25Z)
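A minimal generic hash-embedding sketch (the paper's position-based scheme goes further; names here are illustrative): each node indexes a few rows of a small shared table via independent hashes and sums them, instead of owning a private row in a table with one row per node.

```python
import numpy as np

rng = np.random.default_rng(4)
dim, table_rows, num_hashes = 64, 10_000, 2
table = rng.standard_normal((table_rows, dim)).astype(np.float32)

def node_embedding(node_id: int) -> np.ndarray:
    # Python's tuple hash is deterministic for ints within a process.
    rows = [hash((h, node_id)) % table_rows for h in range(num_hashes)]
    return table[rows].sum(axis=0)

print(node_embedding(123_456).shape)  # (64,) regardless of node count
```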
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
- Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of following a fixed path through the network, DG-Net aggregates features dynamically at each node, giving the network greater representational capacity.
arXiv Detail & Related papers (2020-10-02T16:50:26Z)
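A toy sketch of instance-aware dynamic connectivity (inspired by the DG-Net description, not its implementation): each node of a small DAG aggregates the outputs of all earlier nodes, weighted by gates computed from the features themselves, so the effective path differs per input.

```python
import numpy as np

rng = np.random.default_rng(5)
dim, n_nodes = 16, 4
Ws = [0.3 * rng.standard_normal((dim, dim)).astype(np.float32)
      for _ in range(n_nodes)]
gate_vecs = [[rng.standard_normal(dim).astype(np.float32)
              for _ in range(i + 1)] for i in range(n_nodes)]

def forward(x):
    outs = [x]
    for i in range(n_nodes):
        # One sigmoid gate per incoming edge, computed from that edge's input.
        gates = [1.0 / (1.0 + np.exp(-o @ gate_vecs[i][j]))
                 for j, o in enumerate(outs)]
        agg = sum(g * o for g, o in zip(gates, outs))
        outs.append(np.maximum(agg @ Ws[i], 0.0))  # conv block stand-in
    return outs[-1]

print(forward(rng.standard_normal(dim).astype(np.float32)).shape)
```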
- Towards Lossless Binary Convolutional Neural Networks Using Piecewise Approximation [4.023728681102073]
Binary CNNs can significantly reduce the number of arithmetic operations and the size of memory storage.
However, the accuracy degradation of single and multiple binary CNNs is unacceptable for modern architectures.
We propose a Piecewise Approximation scheme for multiple binary CNNs which lessens accuracy loss by approximating full precision weights and activations.
arXiv Detail & Related papers (2020-08-08T13:32:33Z)
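A minimal sketch of approximating full-precision weights with a sum of scaled binary bases, the general idea behind multiple binary CNNs (greedy residual fitting shown; the paper's piecewise scheme is more refined).

```python
import numpy as np

rng = np.random.default_rng(6)
w = rng.standard_normal(256).astype(np.float32)  # full-precision weights

approx = np.zeros_like(w)
for _ in range(3):                    # three binary bases
    residual = w - approx
    b = np.sign(residual)             # binary basis in {-1, +1}
    a = np.abs(residual).mean()       # least-squares scale for this basis
    approx += a * b

print(float(np.mean((w - approx) ** 2)))  # error shrinks with each basis
```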
- Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
- Towards Deeper Graph Neural Networks with Differentiable Group Normalization [61.20639338417576]
Graph neural networks (GNNs) learn the representation of a node by aggregating its neighbors.
Over-smoothing is one of the key issues which limit the performance of GNNs as the number of layers increases.
We introduce two over-smoothing metrics and a novel technique, differentiable group normalization (DGN).
arXiv Detail & Related papers (2020-06-12T07:18:02Z)
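A rough numpy sketch of the group-normalization idea: node embeddings are softly assigned to groups and normalized within each group, which keeps clusters separated as depth grows. The assignment matrix here is a fixed random projection, whereas DGN learns it; names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
H = rng.standard_normal((100, 16)).astype(np.float32)  # node embeddings
G = 4                                                  # number of groups
S = 0.1 * rng.standard_normal((16, G)).astype(np.float32)

logits = H @ S
A = np.exp(logits - logits.max(1, keepdims=True))
A /= A.sum(1, keepdims=True)         # soft group memberships, rows sum to 1

out = H.copy()
for g in range(G):
    w = A[:, g:g + 1]                               # (100, 1) memberships
    mu = (w * H).sum(0) / w.sum()                   # weighted group mean
    var = (w * (H - mu) ** 2).sum(0) / w.sum()      # weighted group variance
    out = out + w * (H - mu) / np.sqrt(var + 1e-5)  # add normalized group view
print(out.shape)
```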
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
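A toy layer-wise fusion sketch: neurons of one layer are aligned to the other model's neurons before averaging. The paper uses optimal transport; with uniform marginals and hard matching this reduces to the assignment problem solved below (setup and sizes are illustrative).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(8)
WA = rng.standard_normal((8, 4)).astype(np.float32)       # a layer of model A
WB = WA[rng.permutation(8)] \
     + 0.01 * rng.standard_normal((8, 4)).astype(np.float32)  # permuted copy

cost = ((WA[:, None, :] - WB[None, :, :]) ** 2).sum(-1)   # (8, 8) neuron costs
rows, cols = linear_sum_assignment(cost)                  # best 1-to-1 matching
fused = 0.5 * (WA[rows] + WB[cols])                       # average aligned pairs
print(np.abs(fused - WA).max())  # small: alignment undid the permutation
```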
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.