Efficient Graph Encoder Embedding for Large Sparse Graphs in Python
- URL: http://arxiv.org/abs/2406.03726v1
- Date: Thu, 6 Jun 2024 03:49:34 GMT
- Title: Efficient Graph Encoder Embedding for Large Sparse Graphs in Python
- Authors: Xihan Qin, Cencheng Shen,
- Abstract summary: Graph Embedding (GEE) has been shown as the fastest graph embedding technique and is suitable for a variety of network data applications.
We propose an improved version of GEE, sparse GEE, which optimize the calculation and storage of zero entries in sparse matrices to enhance the running time further.
Our experiments demonstrate that the sparse version achieves significant speedup compared to the original GEE with Python implementation for large sparse graphs, and sparse GEE is capable of processing millions of edges within minutes on a standard laptop.
- Score: 3.5374094795720854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph is a ubiquitous representation of data in various research fields, and graph embedding is a prevalent machine learning technique for capturing key features and generating fixed-sized attributes. However, most state-of-the-art graph embedding methods are computationally and spatially expensive. Recently, the Graph Encoder Embedding (GEE) has been shown as the fastest graph embedding technique and is suitable for a variety of network data applications. As real-world data often involves large and sparse graphs, the huge sparsity usually results in redundant computations and storage. To address this issue, we propose an improved version of GEE, sparse GEE, which optimizes the calculation and storage of zero entries in sparse matrices to enhance the running time further. Our experiments demonstrate that the sparse version achieves significant speedup compared to the original GEE with Python implementation for large sparse graphs, and sparse GEE is capable of processing millions of edges within minutes on a standard laptop.
Related papers
- Learning on Large Graphs using Intersecting Communities [13.053266613831447]
MPNNs iteratively update each node's representation in an input graph by aggregating messages from the node's neighbors.
MPNNs might quickly become prohibitive for large graphs provided they are not very sparse.
We propose approximating the input graph as an intersecting community graph (ICG) -- a combination of intersecting cliques.
arXiv Detail & Related papers (2024-05-31T09:26:26Z) - Deep Manifold Graph Auto-Encoder for Attributed Graph Embedding [51.75091298017941]
This paper proposes a novel Deep Manifold (Variational) Graph Auto-Encoder (DMVGAE/DMGAE) for attributed graph data.
The proposed method surpasses state-of-the-art baseline algorithms by a significant margin on different downstream tasks across popular datasets.
arXiv Detail & Related papers (2024-01-12T17:57:07Z) - Distributed Graph Embedding with Information-Oriented Random Walks [16.290803469068145]
Graph embedding maps graph nodes to low-dimensional vectors, and is widely adopted in machine learning tasks.
We present a general-purpose, distributed, information-centric random walk-based graph embedding framework, DistGER, which can scale to embed billion-edge graphs.
D DistGER exhibits 2.33x-129x acceleration, 45% reduction in cross-machines communication, and > 10% effectiveness improvement in downstream tasks.
arXiv Detail & Related papers (2023-03-28T03:11:21Z) - Scaling R-GCN Training with Graph Summarization [71.06855946732296]
Training of Relation Graph Convolutional Networks (R-GCN) does not scale well with the size of the graph.
In this work, we experiment with the use of graph summarization techniques to compress the graph.
We obtain reasonable results on the AIFB, MUTAG and AM datasets.
arXiv Detail & Related papers (2022-03-05T00:28:43Z) - Neighbor2Seq: Deep Learning on Massive Graphs by Transforming Neighbors
to Sequences [55.329402218608365]
We propose the Neighbor2Seq to transform the hierarchical neighborhood of each node into a sequence.
We evaluate our method on a massive graph with more than 111 million nodes and 1.6 billion edges.
Results show that our proposed method is scalable to massive graphs and achieves superior performance across massive and medium-scale graphs.
arXiv Detail & Related papers (2022-02-07T16:38:36Z) - Boosting Graph Embedding on a Single GPU [3.093890460224435]
We present GOSH, a GPU-based tool for embedding large-scale graphs with minimum hardware constraints.
It employs a novel graph coarsening algorithm to enhance the impact of updates and minimize the work for embedding.
It also incorporates a decomposition schema that enables any arbitrarily large graph to be embedded with a single GPU.
arXiv Detail & Related papers (2021-10-19T15:25:04Z) - Fast Attributed Graph Embedding via Density of States [17.360163137925994]
We propose A-DOGE, for Attributed DOS-based Graph Embedding.
A-DOGE is designed to fulfill a long desiderata of desirable characteristics.
Being based on the entire eigenspectrum of a graph, A-DOGE can capture structural and attribute properties at multiple "glocal" scales.
arXiv Detail & Related papers (2021-10-11T12:44:58Z) - GNNAutoScale: Scalable and Expressive Graph Neural Networks via
Historical Embeddings [51.82434518719011]
GNNAutoScale (GAS) is a framework for scaling arbitrary message-passing GNNs to large graphs.
Gas prunes entire sub-trees of the computation graph by utilizing historical embeddings from prior training iterations.
Gas reaches state-of-the-art performance on large-scale graphs.
arXiv Detail & Related papers (2021-06-10T09:26:56Z) - Pseudoinverse Graph Convolutional Networks: Fast Filters Tailored for
Large Eigengaps of Dense Graphs and Hypergraphs [0.0]
Graph Convolutional Networks (GCNs) have proven to be successful tools for semi-supervised classification on graph-based datasets.
We propose a new GCN variant whose three-part filter space is targeted at dense graphs.
arXiv Detail & Related papers (2020-08-03T08:48:41Z) - Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z) - Fast Graph Attention Networks Using Effective Resistance Based Graph
Sparsification [70.50751397870972]
FastGAT is a method to make attention based GNNs lightweight by using spectral sparsification to generate an optimal pruning of the input graph.
We experimentally evaluate FastGAT on several large real world graph datasets for node classification tasks.
arXiv Detail & Related papers (2020-06-15T22:07:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.