Hierarchical Transformer for Scalable Graph Learning
- URL: http://arxiv.org/abs/2305.02866v2
- Date: Fri, 5 May 2023 05:21:18 GMT
- Title: Hierarchical Transformer for Scalable Graph Learning
- Authors: Wenhao Zhu, Tianyu Wen, Guojie Song, Xiaojun Ma, Liang Wang
- Abstract summary: Graph Transformer has demonstrated state-of-the-art performance on benchmarks for graph representation learning.
The complexity of the global self-attention mechanism presents a challenge for full-batch training when applied to larger graphs.
We introduce the Hierarchical Scalable Graph Transformer (HSGT) as a solution to these challenges.
HSGT successfully scales the Transformer architecture to node representation learning tasks on large-scale graphs, while maintaining high performance.
- Score: 22.462712609402324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Transformer is gaining increasing attention in the field of machine
learning and has demonstrated state-of-the-art performance on benchmarks for
graph representation learning. However, as current implementations of Graph
Transformer primarily focus on learning representations of small-scale graphs,
the quadratic complexity of the global self-attention mechanism presents a
challenge for full-batch training when applied to larger graphs. Additionally,
conventional sampling-based methods fail to capture necessary high-level
contextual information, resulting in a significant loss of performance. In this
paper, we introduce the Hierarchical Scalable Graph Transformer (HSGT) as a
solution to these challenges. HSGT successfully scales the Transformer
architecture to node representation learning tasks on large-scale graphs, while
maintaining high performance. By utilizing graph hierarchies constructed
through coarsening techniques, HSGT efficiently updates and stores multi-scale
information in node embeddings at different levels. Together with
sampling-based training methods, HSGT effectively captures and aggregates
multi-level information on the hierarchical graph using only Transformer
blocks. Empirical evaluations demonstrate that HSGT achieves state-of-the-art
performance on large-scale benchmarks with graphs containing millions of nodes
with high efficiency.
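For intuition, here is a minimal sketch of the hierarchy-then-aggregate idea described in the abstract: node features are processed by a Transformer block at each level and pooled into coarser super-nodes between levels, yielding one embedding tensor per scale. The coarsening routine (a random assignment placeholder), the module names, and the level sizes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


def coarsen(x: torch.Tensor, assign: torch.Tensor, num_clusters: int) -> torch.Tensor:
    """Mean-pool node features into super-node features given a cluster assignment."""
    pooled = torch.zeros(num_clusters, x.size(1))
    pooled.index_add_(0, assign, x)
    counts = torch.bincount(assign, minlength=num_clusters).clamp(min=1).unsqueeze(1)
    return pooled / counts


class LevelBlock(nn.Module):
    """One standard Transformer encoder layer applied to the nodes of a single level."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(dim, heads, dim * 2, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer(x.unsqueeze(0)).squeeze(0)


# Toy setup: 1,000 nodes and a 3-level hierarchy with hypothetical level sizes.
dim, sizes = 64, [1000, 200, 40]
feats = torch.randn(sizes[0], dim)
blocks = nn.ModuleList(LevelBlock(dim) for _ in sizes)

levels = []
for lvl in range(len(sizes)):
    feats = blocks[lvl](feats)                 # attention among the nodes of this level
    levels.append(feats)
    if lvl + 1 < len(sizes):
        # Placeholder assignment; a real coarsening algorithm would supply this.
        assign = torch.randint(sizes[lvl + 1], (feats.size(0),))
        feats = coarsen(feats, assign, sizes[lvl + 1])

print([lv.shape for lv in levels])             # one multi-scale embedding tensor per level
```

In practice the assignments would come from the paper's coarsening techniques and training would operate on sampled subgraphs rather than the full node set, as the abstract indicates.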
Related papers
- SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs.
We show that multi-layer attentions can be reduced to one-layer propagation, with the same capability for representation learning.
It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z)
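As a rough illustration of the single-layer idea above, the sketch below runs one global attention pass over all node features, written in the Q(K^T V) order so the cost is linear in the number of nodes. It is a generic linear-attention stand-in under assumed feature maps, not SGFormer's actual formulation.

```python
import torch
import torch.nn as nn


class OneLayerGlobalAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, dim); non-negative feature maps keep attention weights positive
        q = torch.relu(self.q(x)) + 1e-6
        k = torch.relu(self.k(x)) + 1e-6
        v = self.v(x)
        kv = k.t() @ v                               # (dim, dim), one pass over all nodes
        norm = q @ k.sum(dim=0, keepdim=True).t()    # (num_nodes, 1) normaliser
        return (q @ kv) / norm                       # cost O(num_nodes * dim^2)


x = torch.randn(10_000, 64)                          # hypothetical node feature matrix
out = OneLayerGlobalAttention(64)(x)
print(out.shape)                                     # torch.Size([10000, 64])
```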
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
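A hedged sketch of what random-walk context sampling could look like: each node's context is a fixed-length walk over the adjacency list, giving a token sequence that a standard sequence Transformer could consume. The walk length, the uniform neighbor choice, and the toy graph are assumptions, not GSPT's exact sampler.

```python
import random


def random_walk_context(adj: dict[int, list[int]], start: int, length: int) -> list[int]:
    """Return a fixed-length walk starting at `start`, used as the node's context."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj.get(walk[-1], [])
        if not neighbors:                 # dangling node: stop the walk early
            break
        walk.append(random.choice(neighbors))
    return walk


# Toy graph: a 0-1-2-3 path plus a 1-3 chord, stored as an adjacency list.
adj = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
contexts = {v: random_walk_context(adj, v, length=6) for v in adj}
print(contexts)                           # one token sequence ("pseudo-sentence") per node
```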
- Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and a 16.8% performance gain on ogbn-products and snap-patents, and also scale LargeGT to ogbn-papers100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
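As a rough sketch of the sampling-plus-local-attention pattern, the snippet below draws a fixed number of neighbors per target node and attends only within that sampled set, so cost scales with the sample size rather than the graph size. The sampler, the ring-graph toy data, and the attention wiring are illustrative assumptions rather than LargeGT's actual design.

```python
import random

import torch
import torch.nn as nn


def sample_neighbors(adj: dict[int, list[int]], node: int, k: int) -> list[int]:
    """Draw k neighbours with replacement; fall back to a self-loop for isolated nodes."""
    nbrs = adj.get(node, []) or [node]
    return [random.choice(nbrs) for _ in range(k)]


dim, k = 32, 8
adj = {i: [(i - 1) % 100, (i + 1) % 100] for i in range(100)}   # toy ring graph
feats = torch.randn(100, dim)
attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

targets = [0, 17, 42]
neigh = torch.tensor([sample_neighbors(adj, t, k) for t in targets])   # (3, k) indices
query = feats[targets].unsqueeze(1)      # (3, 1, dim): one query per target node
keys = feats[neigh]                      # (3, k, dim): its sampled neighbourhood
out, _ = attn(query, keys, keys)         # attention restricted to the sampled set
print(out.squeeze(1).shape)              # torch.Size([3, 32])
```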
- Deep Prompt Tuning for Graph Transformers [55.2480439325792]
Fine-tuning is resource-intensive and requires storing multiple copies of large models.
We propose a novel approach called deep graph prompt tuning as an alternative to fine-tuning.
By freezing the pre-trained parameters and only updating the added tokens, our approach reduces the number of free parameters and eliminates the need for multiple model copies.
arXiv Detail & Related papers (2023-09-18T20:12:17Z)
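A minimal sketch of the freeze-and-prompt idea, assuming a generic Transformer encoder stands in for the pre-trained graph model: the backbone's parameters are frozen and only a small set of added prompt tokens receives gradients. The encoder, the prompt count, and the way prompts are prepended are assumptions for illustration.

```python
import torch
import torch.nn as nn

dim, num_prompts, batch = 64, 4, 2
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2
)
for p in encoder.parameters():                    # freeze the "pre-trained" backbone
    p.requires_grad = False

prompts = nn.Parameter(torch.randn(1, num_prompts, dim))   # the only trainable weights

tokens = torch.randn(batch, 10, dim)              # stand-in for node/context token sequences
inputs = torch.cat([prompts.expand(batch, -1, -1), tokens], dim=1)
out = encoder(inputs)                             # (batch, num_prompts + 10, dim)

num_trainable = prompts.numel()                   # 4 * 64 = 256 parameters receive gradients
print(out.shape, num_trainable)
```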
- Transforming Graphs for Enhanced Attribute Clustering: An Innovative Graph Transformer-Based Method [8.989218350080844]
This study introduces an innovative method known as the Graph Transformer Auto-Encoder for Graph Clustering (GTAGC).
By melding the Graph Auto-Encoder with the Graph Transformer, GTAGC is adept at capturing global dependencies between nodes.
The architecture of GTAGC encompasses graph embedding, integration of the Graph Transformer within the autoencoder structure, and a clustering component.
arXiv Detail & Related papers (2023-06-20T06:04:03Z)
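As an illustration of the auto-encoder-plus-Transformer-plus-clustering composition, the sketch below encodes node attributes with one Transformer layer, reconstructs them with a linear decoder, and clusters the latent codes with k-means. The module choices, the reconstruction loss, and the use of k-means are assumptions, not GTAGC's actual architecture or objective.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


class TransformerAutoEncoder(nn.Module):
    def __init__(self, dim: int, latent: int):
        super().__init__()
        self.inp = nn.Linear(dim, latent)
        self.encoder = nn.TransformerEncoderLayer(latent, nhead=4, batch_first=True)
        self.decoder = nn.Linear(latent, dim)

    def forward(self, x: torch.Tensor):
        z = self.encoder(self.inp(x).unsqueeze(0)).squeeze(0)   # attention among all nodes
        return z, self.decoder(z)                               # latent codes + reconstruction


feats = torch.randn(300, 16)                     # toy node attribute matrix
model = TransformerAutoEncoder(16, 32)
z, recon = model(feats)
loss = nn.functional.mse_loss(recon, feats)      # reconstruction objective (one term only)
labels = KMeans(n_clusters=4, n_init=10).fit_predict(z.detach().numpy())
print(float(loss), labels[:10])
```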
- SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z)
- Hierarchical Graph Transformer with Adaptive Node Sampling [19.45896788055167]
We identify the main deficiencies of current graph transformers.
Most sampling strategies only focus on local neighbors and neglect the long-range dependencies in the graph.
We propose a hierarchical attention scheme with graph coarsening to capture the long-range interactions.
arXiv Detail & Related papers (2022-10-08T05:53:25Z)
- Dynamic Graph Representation Learning via Graph Transformer Networks [41.570839291138114]
We propose a Transformer-based dynamic graph learning method named Dynamic Graph Transformer (DGT).
DGT has spatial-temporal encoding to effectively learn graph topology and capture implicit links.
We show that DGT presents superior performance compared with several state-of-the-art baselines.
arXiv Detail & Related papers (2021-11-19T21:44:23Z)
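A loosely hedged sketch of what spatial-temporal encoding might look like in practice: learned embeddings for a structural signal (degree bucket) and a temporal signal (snapshot id) are added to node tokens before a Transformer layer. Both encoding choices are illustrative assumptions and not DGT's design.

```python
import torch
import torch.nn as nn

dim, num_nodes = 32, 50
feats = torch.randn(num_nodes, dim)                   # node features at one snapshot
degree = torch.randint(0, 10, (num_nodes,))           # toy structural signal
timestep = torch.full((num_nodes,), 3, dtype=torch.long)   # toy temporal signal (snapshot id)

spatial_emb = nn.Embedding(10, dim)                   # encode degree buckets
temporal_emb = nn.Embedding(16, dim)                  # encode snapshot ids
layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)

tokens = feats + spatial_emb(degree) + temporal_emb(timestep)
out = layer(tokens.unsqueeze(0)).squeeze(0)           # attention over the encoded node tokens
print(out.shape)                                      # torch.Size([50, 32])
```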
- Robust Optimization as Data Augmentation for Large-scale Graphs [117.2376815614148]
We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training.
FLAG is a general-purpose approach for graph data, which universally works in node classification, link prediction, and graph classification tasks.
arXiv Detail & Related papers (2020-10-19T21:51:47Z)
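The gradient-based augmentation described above can be sketched as an inner ascent loop on a feature perturbation followed by an ordinary parameter update. The model, step size, and number of ascent steps below are illustrative, and the simplified loop is not the authors' exact procedure.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(128, 16)                 # toy node features
y = torch.randint(0, 4, (128,))          # toy labels
step_size, ascent_steps = 1e-2, 3

perturb = torch.zeros_like(x).uniform_(-step_size, step_size).requires_grad_()
for _ in range(ascent_steps):
    loss = loss_fn(model(x + perturb), y)
    loss.backward()
    with torch.no_grad():                # gradient *ascent* on the perturbation
        perturb += step_size * perturb.grad.sign()
    perturb.grad.zero_()
    model.zero_grad()                    # keep parameter grads from accumulating here

opt.zero_grad()
loss_fn(model(x + perturb.detach()), y).backward()
opt.step()                               # ordinary descent step on the augmented features
print(float(loss))
```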
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.