Graph Transformers for Large Graphs
- URL: http://arxiv.org/abs/2312.11109v1
- Date: Mon, 18 Dec 2023 11:19:23 GMT
- Title: Graph Transformers for Large Graphs
- Authors: Vijay Prakash Dwivedi, Yozen Liu, Anh Tuan Luu, Xavier Bresson, Neil
Shah, Tong Zhao
- Abstract summary: This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-papers100M with a 5.9% performance improvement.
- Score: 57.19338459218758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have recently emerged as powerful neural networks for graph
learning, showcasing state-of-the-art performance on several graph property
prediction tasks. However, these results have been limited to small-scale
graphs, where the global attention mechanism remains computationally feasible.
The next goal is to scale up these architectures to handle very large graphs
on the scale of millions or even billions of nodes. With large-scale graphs,
global attention learning becomes impractical due to its
quadratic complexity w.r.t. the number of nodes. On the other hand,
neighborhood sampling techniques become essential to manage large graph sizes,
yet finding the optimal trade-off between speed and accuracy with sampling
techniques remains challenging. This work advances representation learning on
single large-scale graphs with a focus on identifying model characteristics and
critical design constraints for developing scalable graph transformer (GT)
architectures. We argue that such a GT requires layers that can adeptly learn both
local and global graph representations while swiftly sampling the graph
topology. As such, a key innovation of this work lies in the creation of a fast
neighborhood sampling technique coupled with a local attention mechanism that
encompasses a 4-hop receptive field but is achieved through just 2-hop
operations. This local node embedding is then integrated with a global node
embedding, acquired via another self-attention layer with an approximate global
codebook, before finally being sent through a downstream layer for node predictions.
The proposed GT framework, named LargeGT, overcomes previous computational
bottlenecks and is validated on three large-scale node classification
benchmarks. We report a 3x speedup and 16.8% performance gain on ogbn-products
and snap-patents, while we also scale LargeGT on ogbn-papers100M with a 5.9%
performance improvement.
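To make the local-global design described in the abstract concrete, below is a minimal sketch (in PyTorch) of how such a node-classification layer could be wired: attention over a node's sampled neighborhood tokens provides the local embedding, attention against a small learned codebook approximates global context without touching all N nodes, and the two views are fused before a downstream prediction head. The module name, tensor shapes, and concatenation-based fusion are illustrative assumptions, not the released LargeGT implementation; in particular, the neighborhood tokens are taken here as a precomputed tensor, standing in for the paper's fast sampler that realizes a 4-hop receptive field through 2-hop operations.

```python
# Minimal sketch, not the authors' code: a node-level layer combining local
# attention over sampled neighborhood tokens with attention to a small
# learned codebook that stands in for global context. Names and shapes
# below are illustrative assumptions.
import torch
import torch.nn as nn

class LocalGlobalNodeLayer(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, codebook_size: int = 128):
        super().__init__()
        # Local branch: attention over each seed node's sampled neighborhood tokens.
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Global branch: attention against a learned codebook, approximating
        # graph-wide context at O(codebook_size) cost per node instead of O(N).
        self.codebook = nn.Parameter(torch.randn(codebook_size, dim))
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.readout = nn.Linear(2 * dim, dim)

    def forward(self, node_feat: torch.Tensor, neigh_tokens: torch.Tensor) -> torch.Tensor:
        # node_feat:    (B, dim)    features of the seed nodes in the mini-batch
        # neigh_tokens: (B, K, dim) K tokens sampled around each seed node
        q = node_feat.unsqueeze(1)                               # (B, 1, dim)
        local, _ = self.local_attn(q, neigh_tokens, neigh_tokens)
        cb = self.codebook.unsqueeze(0).expand(node_feat.size(0), -1, -1)
        global_ctx, _ = self.global_attn(q, cb, cb)
        # Fuse the local and global views, then project for a downstream classifier.
        fused = torch.cat([local.squeeze(1), global_ctx.squeeze(1)], dim=-1)
        return self.readout(fused)                               # (B, dim)
```

A prediction head (e.g. a linear layer mapping `dim` to the number of classes) on the returned embedding completes a node classifier; how the K neighborhood tokens are constructed and how the codebook is maintained are exactly the design choices the paper addresses.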
Related papers
- A Scalable and Effective Alternative to Graph Transformers [19.018320937729264]
Graph Transformers (GTs) were introduced to model pairwise node relationships with a self-attention mechanism.
However, GTs suffer from quadratic complexity w.r.t. the number of nodes in the graph, hindering their applicability to large graphs.
We present Graph-Enhanced Contextual Operator (GECO), a scalable and effective alternative to GTs.
arXiv Detail & Related papers (2024-06-17T19:57:34Z)
- Spectral Greedy Coresets for Graph Neural Networks [61.24300262316091]
The ubiquity of large-scale graphs in node-classification tasks hinders the real-world applications of Graph Neural Networks (GNNs).
This paper studies graph coresets for GNNs and avoids the interdependence issue by selecting ego-graphs based on their spectral embeddings.
Our spectral greedy graph coreset (SGGC) scales to graphs with millions of nodes, obviates the need for model pre-training, and applies to low-homophily graphs.
arXiv Detail & Related papers (2024-05-27T17:52:12Z)
- GLISP: A Scalable GNN Learning System by Exploiting Inherent Structural Properties of Graphs [5.410321469222541]
We propose GLISP, a sampling based GNN learning system for industrial scale graphs.
GLISP consists of three core components: graph partitioner, graph sampling service and graph inference engine.
Experiments show that GLISP achieves up to $6.53\times$ and $70.77\times$ speedups over existing GNN systems for training and inference tasks.
arXiv Detail & Related papers (2024-01-06T02:59:24Z)
- SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a single-layer attention model can deliver surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z)
- $\rm A^2Q$: Aggregation-Aware Quantization for Graph Neural Networks [18.772128348519566]
We propose the Aggregation-Aware mixed-precision Quantization ($\rm A^2Q$) for Graph Neural Networks (GNNs).
Our method can achieve up to 11.4% and 9.5% accuracy improvements on the node-level and graph-level tasks, respectively, and up to 2x speedup on a dedicated hardware accelerator.
arXiv Detail & Related papers (2023-02-01T02:54:35Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training scheme, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike methods based on the Lottery Ticket Hypothesis (LTH), the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
arXiv Detail & Related papers (2022-07-18T14:23:31Z)
- Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model, which utilizes an efficient approximation of information diffusion in GNNs; a minimal sketch of this idea is given after this list.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes of a large-scale graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z)
- Isometric Graph Neural Networks [5.306334746787569]
We propose a technique to learn Isometric Graph Neural Networks (IGNN).
IGNN requires changing the input representation space and loss function to enable any GNN algorithm to generate representations that reflect distances between nodes.
We observe a consistent and substantial improvement, as high as 400%, in Kendall's Tau (KT).
arXiv Detail & Related papers (2020-06-16T22:51:13Z)
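For the PPRGo entry above, the following sketch shows the general shape of propagation via approximate personalized PageRank, under my own assumptions rather than the released PPRGo code: a node-wise MLP produces logits independently for each node, and a target node's prediction is a weighted sum over its top-k PPR neighbors, with the sparse PPR scores assumed to be precomputed.

```python
# Minimal sketch (assumed interfaces, not the PPRGo release): predictions as
# a sparse, personalized-PageRank-weighted average of per-node logits.
import torch
import torch.nn as nn

class PPRPropagation(nn.Module):
    def __init__(self, in_dim: int, num_classes: int, hidden: int = 64):
        super().__init__()
        # Node-wise MLP applied to raw features, independent of graph structure.
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_classes)
        )

    def forward(self, x, ppr_index, ppr_weight):
        # x:          (N, in_dim)  raw node features
        # ppr_index:  (B, k)       indices of each target node's top-k PPR neighbors
        # ppr_weight: (B, k)       the corresponding precomputed PPR scores
        logits = self.mlp(x[ppr_index])                        # (B, k, num_classes)
        # The weighted sum over a sparse PPR neighborhood replaces iterative
        # message passing, which is what makes the scheme easy to parallelize.
        return (ppr_weight.unsqueeze(-1) * logits).sum(dim=1)  # (B, num_classes)
```

In PPRGo itself the top-k PPR scores come from a push-based approximation computed ahead of training; here they are simply taken as given inputs.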