Leiden-Fusion Partitioning Method for Effective Distributed Training of Graph Embeddings
- URL: http://arxiv.org/abs/2409.09887v1
- Date: Sun, 15 Sep 2024 22:50:57 GMT
- Title: Leiden-Fusion Partitioning Method for Effective Distributed Training of Graph Embeddings
- Authors: Yuhe Bai, Camelia Constantin, Hubert Naacke
- Abstract summary: We introduce a novel partitioning method, named Leiden-Fusion, designed for large-scale training of graphs with minimal communication.
Our method guarantees that, for an initially connected graph, each partition is a densely connected subgraph with no isolated nodes.
We demonstrate the effectiveness of our method through extensive evaluations on several benchmark datasets.
- Score: 0.7373617024876725
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In the area of large-scale training of graph embeddings, effective training frameworks and partitioning methods are critical for handling large networks. However, they face two major challenges: 1) existing synchronized distributed frameworks require continuous communication to access information from other machines, and 2) the inability of current partitioning methods to ensure that subgraphs remain connected components without isolated nodes, which is essential for effective training of GNNs since training relies on information aggregation from neighboring nodes. To address these issues, we introduce a novel partitioning method, named Leiden-Fusion, designed for large-scale training of graphs with minimal communication. Our method extends the Leiden community detection algorithm with a greedy algorithm that merges the smallest communities with highly connected neighboring communities. Our method guarantees that, for an initially connected graph, each partition is a densely connected subgraph with no isolated nodes. After obtaining the partitions, we train a GNN for each partition independently, and finally integrate all embeddings for node classification tasks, which significantly reduces the need for network communication and enhances the efficiency of distributed graph training. We demonstrate the effectiveness of our method through extensive evaluations on several benchmark datasets, achieving high efficiency while preserving the quality of the graph embeddings for node classification tasks.
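To make the fusion step concrete, here is a minimal Python sketch of the greedy merging described in the abstract. It is a reading of the abstract, not the authors' code: networkx's Louvain stands in for Leiden (the leidenalg package provides Leiden proper), and "most cross-edges" is an assumed score for "highly connected neighboring communities".

```python
# Minimal sketch of the Leiden-Fusion merging step, under the assumptions above.
from collections import Counter
import networkx as nx
from networkx.algorithms.community import louvain_communities

def leiden_fusion(G: nx.Graph, k: int) -> list[set]:
    """Greedily fuse the smallest community into its best-connected neighbor
    until at most k partitions remain."""
    parts = [set(c) for c in louvain_communities(G, seed=0)]  # stand-in for Leiden
    while len(parts) > k:
        label = {u: i for i, c in enumerate(parts) for u in c}
        smallest = min(range(len(parts)), key=lambda i: len(parts[i]))
        # Count edges from the smallest community into every other community.
        cross = Counter(
            label[v]
            for u in parts[smallest]
            for v in G.neighbors(u)
            if label[v] != smallest
        )
        # For an initially connected graph, `cross` is never empty, so each
        # merge keeps every partition a connected subgraph with no isolated nodes.
        target = cross.most_common(1)[0][0]
        parts[target] |= parts[smallest]
        parts.pop(smallest)
    return parts

# Example: fuse Zachary's karate club down to 2 connected partitions.
print([len(p) for p in leiden_fusion(nx.karate_club_graph(), 2)])
```

After such a partitioning, one GNN per partition can be trained independently and the resulting embeddings integrated for node classification, as the abstract describes.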
Related papers
- MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs [11.026326555186333]
This paper develops a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework.
It demonstrates about 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer.
arXiv Detail & Related papers (2024-10-30T05:10:38Z)
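To make the prefetch-and-eviction idea above concrete, here is a toy sketch. It is illustrative only: the fixed-capacity LRU policy and the fetch_remote helper are hypothetical stand-ins, whereas MassiveGNN's scheme is parameterized and continuous.

```python
# Hypothetical sketch: an LRU cache for features of remote ("halo") nodes.
from collections import OrderedDict
import numpy as np

def fetch_remote(node: int) -> np.ndarray:
    # Placeholder for an RPC to the machine that owns this node's features.
    return np.zeros(64, dtype=np.float32)

class FeatureCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()  # node id -> cached feature vector

    def get(self, node: int) -> np.ndarray:
        if node in self.store:
            self.store.move_to_end(node)    # mark as recently used
            return self.store[node]
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict the least recently used
        self.store[node] = fetch_remote(node)
        return self.store[node]
```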
- Federated Hypergraph Learning: Hyperedge Completion with Local Differential Privacy [6.295242666794106]
FedHGL is designed to collaboratively train a comprehensive hypergraph neural network across multiple clients.
Cross-client feature aggregation is performed at the central server, and the result is distributed back so that clients can make use of this information.
arXiv Detail & Related papers (2024-08-09T16:31:41Z)
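As a rough illustration of cross-client feature aggregation at a central server, the sketch below simply averages the features that clients report for shared nodes. The averaging rule is our simplification; FedHGL's actual protocol also applies local differential privacy, which is omitted here.

```python
# Hypothetical sketch: the server averages each shared node's features over
# all clients that hold it, then sends the result back to the clients.
import numpy as np

def server_aggregate(client_feats: list[dict[int, np.ndarray]]) -> dict[int, np.ndarray]:
    sums: dict[int, np.ndarray] = {}
    counts: dict[int, int] = {}
    for feats in client_feats:
        for node, x in feats.items():
            sums[node] = sums.get(node, np.zeros_like(x)) + x
            counts[node] = counts.get(node, 0) + 1
    return {node: sums[node] / counts[node] for node in sums}
```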
- Distributed Training of Large Graph Neural Networks with Variable Communication Rates [71.7293735221656]
Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to the large memory and computing requirements.
Distributed GNN training, where the graph is partitioned across multiple machines, is a common approach to training GNNs on large graphs.
We introduce a variable compression scheme for reducing the communication volume in distributed GNN training without compromising the accuracy of the learned model.
arXiv Detail & Related papers (2024-06-25T14:57:38Z)
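One simple way to vary the communication rate is to quantize embeddings to b bits before sending them, with b adjustable during training. The sketch below shows only this quantize/dequantize round trip and is an assumption for illustration; the paper's compression scheme and its accuracy analysis are more involved.

```python
import numpy as np

def quantize(x: np.ndarray, bits: int):
    """Uniformly quantize x to 2**bits levels; q is what crosses the network."""
    lo, hi = float(x.min()), float(x.max())
    levels = 2 ** bits - 1
    q = np.round((x - lo) / (hi - lo + 1e-12) * levels).astype(np.uint32)
    return q, lo, hi

def dequantize(q: np.ndarray, lo: float, hi: float, bits: int) -> np.ndarray:
    return q / (2 ** bits - 1) * (hi - lo) + lo

x = np.random.randn(4, 8).astype(np.float32)
x_hat = dequantize(*quantize(x, bits=4), bits=4)
print(np.abs(x - x_hat).max())  # reconstruction error shrinks as bits grows
```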
- Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
arXiv Detail & Related papers (2023-08-31T12:22:40Z)
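The "one-bit" idea in its simplest form: transmit only the sign of each update coordinate plus a single scale. This toy version is ours for illustration; the paper's iADM-based algorithm with one-bit compressive sensing is substantially more elaborate.

```python
import numpy as np

def one_bit_compress(update: np.ndarray):
    # Signs can be bit-packed to one bit per coordinate; int8 is used here
    # only for readability. One float scale is sent alongside.
    scale = float(np.abs(update).mean())
    return np.sign(update).astype(np.int8), scale

def one_bit_decompress(signs: np.ndarray, scale: float) -> np.ndarray:
    return signs.astype(np.float32) * scale
```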
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
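For intuition, a vertex cut assigns edges (not vertices) to partitions and duplicates any vertex whose edges span several partitions, so each partition can aggregate locally without communication. The hash-based edge assignment below is a placeholder; CoFree-GNN chooses and handles cuts far more carefully.

```python
import networkx as nx

def vertex_cut(G: nx.Graph, k: int) -> list[nx.Graph]:
    parts = [nx.Graph() for _ in range(k)]
    for u, v in G.edges():
        p = hash((min(u, v), max(u, v))) % k  # placeholder edge assignment
        parts[p].add_edge(u, v)               # endpoints get duplicated as needed
    return parts

subgraphs = vertex_cut(nx.karate_club_graph(), 3)
# Node counts sum to more than 34: cut vertices appear in several partitions.
print([g.number_of_nodes() for g in subgraphs])
```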
- Federated K-Means Clustering via Dual Decomposition-based Distributed Optimization [0.0]
This paper shows how dual decomposition can be applied to the distributed training of $K$-means clustering problems.
The training can be performed in a distributed manner by splitting the data across different nodes and linking these nodes through consensus constraints.
arXiv Detail & Related papers (2023-07-25T05:34:50Z)
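To see where consensus constraints enter, the sketch below runs one local K-means step per node and then forces agreement by averaging the local centroids. Plain averaging is our simplification; the paper enforces consensus through dual decomposition with multiplier updates.

```python
import numpy as np

def local_kmeans_step(X: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """One Lloyd step on this node's private data X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    assign = d2.argmin(axis=1)
    return np.stack([
        X[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
        for j in range(len(centroids))
    ])

def consensus_round(datasets: list[np.ndarray], centroids: np.ndarray) -> np.ndarray:
    # Averaging stands in for the dual-decomposition consensus update.
    return np.mean([local_kmeans_step(X, centroids) for X in datasets], axis=0)
```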
- NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
arXiv Detail & Related papers (2023-06-14T09:21:15Z)
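For reference, here is the Gumbel-Softmax relaxation that the operator builds on, in plain numpy. NodeFormer's contribution is kernelizing this so that all-pair attention avoids the quadratic cost; that step is omitted here.

```python
import numpy as np

def gumbel_softmax(logits: np.ndarray, tau: float = 0.5) -> np.ndarray:
    """Sample a relaxed one-hot vector; smaller tau -> closer to discrete."""
    g = -np.log(-np.log(np.random.uniform(1e-10, 1.0, logits.shape)))
    y = (logits + g) / tau
    y -= y.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

print(gumbel_softmax(np.array([1.0, 2.0, 0.5])))  # entries sum to 1
```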
- Distributed Learning over Networks with Graph-Attention-Based Personalization [49.90052709285814]
We propose a graph-based personalized algorithm (GATTA) for distributed deep learning.
In particular, the personalized model in each agent is composed of a global part and a node-specific part.
By treating each agent as one node in a graph and the node-specific parameters as its features, the benefits of the graph attention mechanism can be inherited.
arXiv Detail & Related papers (2023-05-22T13:48:30Z)
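A minimal sketch of the two-part model above: an agent's effective parameters combine the shared global part with node-specific parts, weighting neighbors by attention scores. The additive combination and the 0.5 mixing weights are assumptions for illustration; GATTA's graph-attention update is richer.

```python
import numpy as np

def personalized_params(w_global: np.ndarray,
                        w_self: np.ndarray,
                        w_neighbors: list[np.ndarray],
                        attn: np.ndarray) -> np.ndarray:
    """attn: attention weights over this agent's neighbors (sums to 1)."""
    neighbor_mix = sum(a * w for a, w in zip(attn, w_neighbors))
    # Hypothetical combination: global part plus an even blend of the
    # agent's own and its neighbors' node-specific parts.
    return w_global + 0.5 * w_self + 0.5 * neighbor_mix
```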
- ABC: Aggregation before Communication, a Communication Reduction Framework for Distributed Graph Neural Network Training and Effective Partition [0.0]
Graph Neural Networks (GNNs) are neural models tailored to graph-structured data and have shown superior performance in learning representations of such data.
In this paper, we study the communication complexity of distributed GNN training.
We show that the new partition paradigm is particularly well suited to dynamic graphs, where controlling edge placement is infeasible because the graph-changing process is unknown.
arXiv Detail & Related papers (2022-12-11T04:54:01Z)
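The name describes the trick: aggregate locally before communicating. The sketch below sums, per remote node, all boundary features headed its way, so one vector is shipped per cross-partition destination rather than one per edge. The helper names are ours, not the paper's API.

```python
import numpy as np

def pre_aggregate(boundary_feats: dict[int, np.ndarray],
                  cross_edges: list[tuple[int, int]]) -> dict[int, np.ndarray]:
    """cross_edges: (local node, remote node) pairs leaving this partition."""
    msg: dict[int, np.ndarray] = {}
    for u, v in cross_edges:
        # One aggregated message per remote destination node.
        msg[v] = msg.get(v, np.zeros_like(boundary_feats[u])) + boundary_feats[u]
    return msg
```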
- Uniting Heterogeneity, Inductiveness, and Efficiency for Graph Representation Learning [68.97378785686723]
Graph neural networks (GNNs) have greatly advanced the performance of node representation learning on graphs.
Most GNNs are designed only for homogeneous graphs, which limits their adaptivity to the more informative heterogeneous graphs.
We propose a novel inductive, meta-path-free message passing scheme that packs heterogeneous node features together with their associated edges from both low- and high-order neighbor nodes.
arXiv Detail & Related papers (2021-04-04T23:31:39Z)
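To illustrate what "packing" node features with their associated edges might look like, the sketch below concatenates each neighbor's features with the connecting edge's features before averaging, so no hand-crafted meta paths are needed. This is our simplification of the scheme.

```python
import numpy as np

def pack_messages(x: dict[int, np.ndarray],
                  edge_feat: dict[tuple[int, int], np.ndarray],
                  neighbors: dict[int, list[int]]) -> dict[int, np.ndarray]:
    """Pack each neighbor's node features with the edge's features.
    Assumes every node has at least one neighbor."""
    out = {}
    for u, nbrs in neighbors.items():
        msgs = [np.concatenate([x[v], edge_feat[(u, v)]]) for v in nbrs]
        out[u] = np.mean(msgs, axis=0)
    return out
```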