Boosting Graph Embedding on a Single GPU
- URL: http://arxiv.org/abs/2110.10049v1
- Date: Tue, 19 Oct 2021 15:25:04 GMT
- Title: Boosting Graph Embedding on a Single GPU
- Authors: Amro Alabsi Aljundi, Taha Atahan Akyıldız, Kamer Kaya
- Abstract summary: We present GOSH, a GPU-based tool for embedding large-scale graphs with minimal hardware requirements.
It employs a novel graph coarsening algorithm to enhance the impact of updates and reduce the work required for embedding.
It also incorporates a decomposition schema that enables arbitrarily large graphs to be embedded on a single GPU.
- Score: 3.093890460224435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graphs are ubiquitous, and they can model unique characteristics and complex
relations of real-life systems. Although using machine learning (ML) on graphs
is promising, their raw representation is not suitable for ML algorithms. Graph
embedding represents each node of a graph as a d-dimensional vector, which is
more suitable for ML tasks. However, the embedding process is expensive, and
CPU-based tools do not scale to real-world graphs. In this work, we present
GOSH, a GPU-based tool for embedding large-scale graphs with minimal hardware
requirements. GOSH employs a novel graph coarsening algorithm to enhance the
impact of updates and reduce the work required for embedding. It also
incorporates a decomposition schema that enables arbitrarily large graphs to be
embedded on a single GPU. As a result, GOSH sets a new state of the art in link
prediction in both accuracy and speed, and delivers high-quality embeddings for
node classification in a fraction of the time required by the previous state of
the art.
For instance, it can embed a graph with over 65 million vertices and 1.8
billion edges in less than 30 minutes on a single GPU.
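To make the multilevel idea concrete, the following is a minimal sketch of one coarsening round and of projecting embeddings back to a finer graph. Everything here is an assumption for illustration (greedy neighbor matching, NumPy arrays, hypothetical function names `coarsen_once` and `project_down`); the paper's actual coarsening algorithm and GPU kernels are more sophisticated.

```python
import numpy as np

def coarsen_once(edges, n):
    """One greedy coarsening round: merge each vertex with one unmatched neighbor."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    label = [-1] * n                 # label[v] = id of v's super-vertex
    next_id = 0
    for v in range(n):
        if label[v] != -1:
            continue
        label[v] = next_id
        for u in adj[v]:             # absorb the first still-unmatched neighbor
            if label[u] == -1:
                label[u] = next_id
                break
        next_id += 1
    coarse_edges = {(min(label[u], label[v]), max(label[u], label[v]))
                    for u, v in edges if label[u] != label[v]}
    return sorted(coarse_edges), next_id, label

def project_down(coarse_emb, label):
    """Initialize each fine vertex from the embedding of its super-vertex."""
    return coarse_emb[np.asarray(label)]     # (n_fine, d) from (n_coarse, d)
```

Read this way, one gradient update applied to a super-vertex effectively moves every fine vertex merged into it, which is a plausible reading of how coarsening "enhances the impact of updates"; the decomposition schema would then only need a slice of the embedding matrix on the device at any one time, keeping memory use within a single GPU.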
Related papers
- GraphStorm: all-in-one graph machine learning framework for industry applications [75.23076561638348]
GraphStorm is an end-to-end solution for scalable graph construction, graph model training and inference.
Every component in GraphStorm can operate on graphs with billions of nodes and can scale model training and inference to different hardware without changing any code.
GraphStorm has been deployed in over a dozen billion-scale industry applications since its release in May 2023.
arXiv Detail & Related papers (2024-06-10T04:56:16Z)
- Learning on Large Graphs using Intersecting Communities [13.053266613831447]
MPNNs iteratively update each node's representation in an input graph by aggregating messages from the node's neighbors (a minimal update step is sketched after this entry).
MPNNs can quickly become prohibitively expensive on large graphs unless the graphs are very sparse.
We propose approximating the input graph as an intersecting community graph (ICG) -- a combination of intersecting cliques.
arXiv Detail & Related papers (2024-05-31T09:26:26Z)
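For the MPNN update referenced in the entry above, here is a minimal sketch of one message-passing layer with mean aggregation; the mean aggregator, single weight matrix, and ReLU are illustrative assumptions, not the construction used in the ICG paper.

```python
import numpy as np

def mpnn_layer(x, edges, w):
    """x: (n, d) float node features; edges: list of (u, v); w: (d, d) weights."""
    n = x.shape[0]
    agg = np.zeros_like(x)
    deg = np.zeros(n)
    for u, v in edges:                      # gather messages over every edge
        agg[u] += x[v]
        agg[v] += x[u]
        deg[u] += 1
        deg[v] += 1
    agg /= np.maximum(deg, 1)[:, None]      # mean over each node's neighbors
    return np.maximum((x + agg) @ w, 0.0)   # combine with self and apply ReLU
```

Since the loop touches every edge once, a layer costs O(|E| d) time; on large graphs that are not very sparse this term dominates, which is the bottleneck the ICG approximation is designed to sidestep.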
- MGNet: Learning Correspondences via Multiple Graphs [78.0117352211091]
Learning correspondences aims to find correct correspondences from the initial correspondence set with an uneven correspondence distribution and a low inlier rate.
Recent approaches usually use graph neural networks (GNNs) to build a single type of graph, or stack local graphs into a global one, to complete the task.
We propose MGNet to effectively combine multiple complementary graphs.
arXiv Detail & Related papers (2024-01-10T07:58:44Z)
- Distributed Graph Embedding with Information-Oriented Random Walks [16.290803469068145]
Graph embedding maps graph nodes to low-dimensional vectors, and is widely adopted in machine learning tasks.
We present a general-purpose, distributed, information-centric random walk-based graph embedding framework, DistGER, which can scale to embed billion-edge graphs.
DistGER achieves 2.33x-129x speedups, a 45% reduction in cross-machine communication, and a more than 10% effectiveness improvement on downstream tasks (a toy walk generator is sketched after this entry).
arXiv Detail & Related papers (2023-03-28T03:11:21Z)
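For context on the entry above, below is a toy, DeepWalk-style walk generator of the general kind such frameworks parallelize. DistGER's information-oriented walks adapt the effort spent per node, and its distributed machinery is far more involved, so treat this purely as an illustrative sketch.

```python
import random

def generate_walks(adj, walk_len=40, walks_per_node=10, seed=0):
    """adj: dict node -> list of neighbors. Yields truncated random walks."""
    rng = random.Random(seed)
    for _ in range(walks_per_node):
        for start in adj:                   # one walk per node per epoch
            walk = [start]
            for _ in range(walk_len - 1):
                nbrs = adj[walk[-1]]
                if not nbrs:                # dead end: stop this walk early
                    break
                walk.append(rng.choice(nbrs))
            yield walk
```

The walks are then typically fed to a skip-gram style model, which learns one vector per node from co-occurrences within a sliding window over each walk.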
- DOTIN: Dropping Task-Irrelevant Nodes for GNNs [119.17997089267124]
Recent graph learning approaches have introduced pooling strategies to reduce the size of graphs for learning.
We design a new approach called DOTIN (Dropping Task-Irrelevant Nodes) to reduce the size of graphs.
Our method speeds up GAT by about 50% on graph-level tasks, including graph classification and graph edit distance; a toy node-dropping step is sketched after this entry.
arXiv Detail & Related papers (2022-04-28T12:00:39Z)
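As referenced in the entry above, here is a toy node-dropping step that keeps the top-scoring nodes and reindexes the surviving edges. The importance scores are taken as given; DOTIN itself learns which nodes are task-irrelevant, so this only illustrates the size-reduction mechanics.

```python
import numpy as np

def drop_nodes(x, edges, scores, keep_ratio=0.5):
    """x: (n, d) features; scores: (n,) importance; keep the top-k nodes."""
    k = max(1, int(len(scores) * keep_ratio))
    keep = np.argsort(scores)[-k:]                      # top-k node indices
    remap = {int(old): new for new, old in enumerate(keep)}
    new_edges = [(remap[u], remap[v]) for u, v in edges
                 if u in remap and v in remap]          # edges among survivors
    return x[keep], new_edges
```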
- Scaling R-GCN Training with Graph Summarization [71.06855946732296]
Training of Relational Graph Convolutional Networks (R-GCNs) does not scale well with the size of the graph.
In this work, we experiment with the use of graph summarization techniques to compress the graph.
We obtain reasonable results on the AIFB, MUTAG and AM datasets.
arXiv Detail & Related papers (2022-03-05T00:28:43Z)
- Dirichlet Graph Variational Autoencoder [65.94744123832338]
We present Dirichlet Graph Variational Autoencoder (DGVAE) with graph cluster memberships as latent factors.
Motivated by the low-pass characteristics of the balanced graph cut, we propose a new GNN variant named Heatts to encode the input graph into cluster memberships.
arXiv Detail & Related papers (2020-10-09T07:35:26Z)
- Force2Vec: Parallel force-directed graph embedding [3.577310844634503]
A graph embedding algorithm embeds a graph into a low-dimensional space such that the embedding preserves the inherent properties of the graph.
We develop Force2Vec, which uses force-directed graph layout models in a graph embedding setting, aiming to excel at both machine learning (ML) and visualization tasks; a toy force-directed update is sketched after this entry.
arXiv Detail & Related papers (2020-09-17T03:53:25Z)
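To ground the entry above, here is a toy force-directed update combining all-pairs repulsion with attraction along edges. The O(n^2) repulsion and the fixed step size are for illustration only; Force2Vec's actual layout models and their parallelization are considerably more refined.

```python
import numpy as np

def force_step(pos, edges, lr=0.01):
    """pos: (n, d) embedding coordinates; returns positions after one update."""
    diff = pos[:, None, :] - pos[None, :, :]        # (n, n, d) displacements
    dist2 = (diff ** 2).sum(axis=-1) + 1e-9         # squared pairwise distances
    force = (diff / dist2[..., None]).sum(axis=1)   # repulsion from all nodes
    for u, v in edges:                              # spring attraction on edges
        d = pos[u] - pos[v]
        force[u] -= d
        force[v] += d
    return pos + lr * force
```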
- FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems [23.258185277825888]
FeatGraph provides a flexible programming interface to express diverse GNN models.
FeatGraph speeds up end-to-end GNN training and inference by up to 32x on CPU and 7x on GPU.
arXiv Detail & Related papers (2020-08-26T03:17:05Z)
- Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model, which uses an efficient approximation of information diffusion in GNNs (a minimal push-style approximation is sketched after this entry).
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in a large graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z)
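As referenced in the entry above, the following is a minimal push-based approximation of personalized PageRank in the style of Andersen-Chung-Lang; the parameters and the plain-dict representation are illustrative. PPRGo builds on sparse approximations of this kind to avoid full message passing.

```python
def approx_ppr(adj, source, alpha=0.15, eps=1e-4):
    """adj: dict node -> list of neighbors. Returns sparse PPR scores for source."""
    p, r = {}, {source: 1.0}            # estimates and residual mass
    queue = [source]
    while queue:
        u = queue.pop()
        du = max(len(adj[u]), 1)
        ru = r.get(u, 0.0)
        if ru < eps * du:               # residual too small to be worth pushing
            continue
        r[u] = 0.0
        p[u] = p.get(u, 0.0) + alpha * ru
        spread = (1.0 - alpha) * ru / du
        for v in adj[u]:                # push the remaining mass to neighbors
            r[v] = r.get(v, 0.0) + spread
            if r[v] >= eps * max(len(adj[v]), 1):
                queue.append(v)
    return p
```

Each effective push retires at least alpha * eps of residual mass, so the number of pushes is bounded by 1/(alpha * eps) independently of graph size, which is what makes this kind of approximation cheap enough for very large graphs.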
- Wasserstein Embedding for Graph Learning [33.90471037116372]
Wasserstein Embedding for Graph Learning (WEGL) is a framework for embedding entire graphs in a vector space.
We leverage new insights on defining similarity between graphs as a function of the similarity between their node embedding distributions.
We evaluate our new graph embedding approach on various benchmark graph-property prediction tasks.
arXiv Detail & Related papers (2020-06-16T18:23:00Z)