Cooperative Minibatching in Graph Neural Networks
- URL: http://arxiv.org/abs/2310.12403v2
- Date: Sun, 22 Oct 2023 02:01:01 GMT
- Title: Cooperative Minibatching in Graph Neural Networks
- Authors: Muhammed Fatih Balin, Dominique LaSalle, Ümit V. Çatalyürek
- Abstract summary: We propose a new approach called Cooperative Minibatching to reduce the effects of the Neighborhood Explosion Phenomenon (NEP).
We show how to take advantage of the same phenomenon in serial execution by generating dependent consecutive minibatches.
We achieve up to 64% speedup over Independent Minibatching on single-node multi-GPU systems.
- Score: 1.534667887016089
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Significant computational resources are required to train Graph Neural
Networks (GNNs) at a large scale, and the process is highly data-intensive. One
of the most effective ways to reduce resource requirements is minibatch
training coupled with graph sampling. GNNs have the unique property that items
in a minibatch have overlapping data. However, the commonly implemented
Independent Minibatching approach assigns each Processing Element (PE) its own
minibatch to process, leading to duplicated computations and input data access
across PEs. This amplifies the Neighborhood Explosion Phenomenon (NEP), which
is the main bottleneck limiting scaling. To reduce the effects of NEP in the
multi-PE setting, we propose a new approach called Cooperative Minibatching.
Our approach capitalizes on the fact that the size of the sampled subgraph is a
concave function of the batch size, leading to significant reductions in the
amount of work per seed vertex as batch sizes increase. Hence, it is favorable
for processors equipped with a fast interconnect to work on a large minibatch
together as a single larger processor, instead of working on separate smaller
minibatches, even though global batch size is identical. We also show how to
take advantage of the same phenomenon in serial execution by generating
dependent consecutive minibatches. Our experimental evaluations show up to 4x
bandwidth savings for fetching vertex embeddings, by simply increasing this
dependency without harming model convergence. Combining our proposed
approaches, we achieve up to 64% speedup over Independent Minibatching on
single-node multi-GPU systems.
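The concavity argument above can be illustrated with a small, self-contained simulation. The sketch below is not the authors' code; the random graph, fanout, number of layers, and batch sizes are arbitrary assumptions chosen only to make the trend visible. It performs layer-wise neighbor sampling and reports how the sampled subgraph size per seed vertex shrinks as the batch size grows.

```python
# Hypothetical illustration (not from the paper): subgraph size vs. batch size.
import random

def make_random_graph(num_vertices, avg_degree, seed=0):
    """Adjacency lists of a synthetic random directed graph (illustration only)."""
    rng = random.Random(seed)
    return {
        v: [rng.randrange(num_vertices) for _ in range(avg_degree)]
        for v in range(num_vertices)
    }

def sample_subgraph(adj, seeds, fanout, num_layers, rng):
    """Layer-wise neighbor sampling; returns the set of all sampled vertices."""
    frontier = set(seeds)
    visited = set(seeds)
    for _ in range(num_layers):
        nxt = set()
        for v in frontier:
            neighbors = adj[v]
            nxt.update(rng.sample(neighbors, min(fanout, len(neighbors))))
        frontier = nxt - visited
        visited |= nxt
    return visited

if __name__ == "__main__":
    num_vertices = 100_000
    rng = random.Random(42)
    adj = make_random_graph(num_vertices, avg_degree=15)
    for batch_size in (128, 512, 2048, 8192):
        seeds = rng.sample(range(num_vertices), batch_size)
        subgraph = sample_subgraph(adj, seeds, fanout=10, num_layers=2, rng=rng)
        # Work per seed vertex shrinks as the batch grows, because sampled
        # neighborhoods increasingly overlap: the subgraph size is a concave
        # function of the batch size.
        print(f"batch={batch_size:5d}  |subgraph|={len(subgraph):7d}  "
              f"per-seed={len(subgraph) / batch_size:6.1f}")
```

Under Independent Minibatching, each PE draws its own small batch and pays the high per-seed cost at the left end of this curve; Cooperative Minibatching has the PEs sample and process one larger batch jointly, moving them toward the cheaper right end while keeping the global batch size unchanged. The same overlap can be exploited in serial execution by making consecutive minibatches dependent, so that vertex embeddings fetched for one batch are reused by the next.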
Related papers
- Distributed Matrix-Based Sampling for Graph Neural Network Training [0.0]
We propose a matrix-based bulk sampling approach that expresses sampling as a sparse matrix multiplication (SpGEMM) and samples multiple minibatches at once.
When the input graph topology does not fit on a single device, our method distributes the graph and uses communication-avoiding SpGEMM algorithms to scale GNN minibatch sampling.
In addition to new methods for sampling, we introduce a pipeline that uses our matrix-based bulk sampling approach to provide end-to-end training results.
arXiv Detail & Related papers (2023-11-06T06:40:43Z) - Efficient Heterogeneous Graph Learning via Random Projection [65.65132884606072]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN).
arXiv Detail & Related papers (2023-10-23T01:25:44Z) - GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism [6.3568605707961]
Mini-batch training is commonly used to train Graph Neural Networks (GNNs) on large graphs.
In this paper, we introduce a hybrid parallel mini-batch training paradigm called split parallelism.
We show that split parallelism outperforms state-of-the-art mini-batch training systems like DGL, Quiver, and $P3$.
arXiv Detail & Related papers (2023-03-24T03:28:05Z) - Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - Data Subsampling for Bayesian Neural Networks [0.0]
Penalty Bayesian Neural Networks (PBNNs) achieve good predictive performance for a given mini-batch size.
Varying the size of the mini-batches enables a natural calibration of the predictive distribution.
We expect PBNN to be particularly suited for cases when data sets are distributed across multiple decentralized devices.
arXiv Detail & Related papers (2022-10-17T14:43:35Z) - MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z) - Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z) - Accurate, Efficient and Scalable Training of Graph Neural Networks [9.569918335816963]
Graph Neural Networks (GNNs) are powerful deep learning models to generate node embeddings on graphs.
It is still challenging to perform training in an efficient and scalable way.
We propose a novel parallel training framework that reduces training workload by orders of magnitude compared with state-of-the-art minibatch methods.
arXiv Detail & Related papers (2020-10-05T22:06:23Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including data from applications with high security requirements, is often collected and stored on multiple heterogeneous devices, such as mobile devices, drones, and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z) - Improve SGD Training via Aligning Mini-batches [22.58823484394866]
In-Training Distribution Matching (ITDM) is proposed to improve deep neural networks (DNNs) training and reduce overfitting.
Specifically, ITDM regularizes the feature extractor by matching the moments of distributions of different mini-batches in each iteration of SGD.
arXiv Detail & Related papers (2020-02-23T15:10:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.