Hierarchical Transformer for Scalable Graph Learning
- URL: http://arxiv.org/abs/2305.02866v2
- Date: Fri, 5 May 2023 05:21:18 GMT
- Title: Hierarchical Transformer for Scalable Graph Learning
- Authors: Wenhao Zhu, Tianyu Wen, Guojie Song, Xiaojun Ma, Liang Wang
- Abstract summary: Graph Transformer has demonstrated state-of-the-art performance on benchmarks for graph representation learning.
The complexity of the global self-attention mechanism presents a challenge for full-batch training when applied to larger graphs.
We introduce the Hierarchical Scalable Graph Transformer (HSGT) as a solution to these challenges.
HSGT successfully scales the Transformer architecture to node representation learning tasks on large-scale graphs, while maintaining high performance.
- Score: 22.462712609402324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Transformer is gaining increasing attention in the field of machine
learning and has demonstrated state-of-the-art performance on benchmarks for
graph representation learning. However, as current implementations of Graph
Transformer primarily focus on learning representations of small-scale graphs,
the quadratic complexity of the global self-attention mechanism presents a
challenge for full-batch training when applied to larger graphs. Additionally,
conventional sampling-based methods fail to capture necessary high-level
contextual information, resulting in a significant loss of performance. In this
paper, we introduce the Hierarchical Scalable Graph Transformer (HSGT) as a
solution to these challenges. HSGT successfully scales the Transformer
architecture to node representation learning tasks on large-scale graphs, while
maintaining high performance. By utilizing graph hierarchies constructed
through coarsening techniques, HSGT efficiently updates and stores multi-scale
information in node embeddings at different levels. Together with
sampling-based training methods, HSGT effectively captures and aggregates
multi-level information on the hierarchical graph using only Transformer
blocks. Empirical evaluations demonstrate that HSGT achieves state-of-the-art
performance on large-scale benchmarks with graphs containing millions of nodes
with high efficiency.
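For intuition, here is a minimal sketch of the hierarchy-then-aggregate idea described in the abstract: node features are processed by a Transformer block at each level and pooled into coarser super-nodes between levels, yielding one embedding tensor per scale. The coarsening routine (a random assignment placeholder), the module names, and the level sizes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


def coarsen(x: torch.Tensor, assign: torch.Tensor, num_clusters: int) -> torch.Tensor:
    """Mean-pool node features into super-node features given a cluster assignment."""
    pooled = torch.zeros(num_clusters, x.size(1))
    pooled.index_add_(0, assign, x)
    counts = torch.bincount(assign, minlength=num_clusters).clamp(min=1).unsqueeze(1)
    return pooled / counts


class LevelBlock(nn.Module):
    """One standard Transformer encoder layer applied to the nodes of a single level."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(dim, heads, dim * 2, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer(x.unsqueeze(0)).squeeze(0)


# Toy setup: 1,000 nodes and a 3-level hierarchy with hypothetical level sizes.
dim, sizes = 64, [1000, 200, 40]
feats = torch.randn(sizes[0], dim)
blocks = nn.ModuleList(LevelBlock(dim) for _ in sizes)

levels = []
for lvl in range(len(sizes)):
    feats = blocks[lvl](feats)                 # attention among the nodes of this level
    levels.append(feats)
    if lvl + 1 < len(sizes):
        # Placeholder assignment; a real coarsening algorithm would supply this.
        assign = torch.randint(sizes[lvl + 1], (feats.size(0),))
        feats = coarsen(feats, assign, sizes[lvl + 1])

print([lv.shape for lv in levels])             # one multi-scale embedding tensor per level
```

In practice the assignments would come from the paper's coarsening techniques and training would operate on sampled subgraphs rather than the full node set, as the abstract indicates.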
Related papers
- SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs.
We show that multi-layer attentions can be reduced to one-layer propagation, with the same capability for representation learning.
It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z)
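As a rough illustration of the single-layer idea above, the sketch below runs one global attention pass over all node features, written in the Q(K^T V) order so the cost is linear in the number of nodes. It is a generic linear-attention stand-in under assumed feature maps, not SGFormer's actual formulation.

```python
import torch
import torch.nn as nn


class OneLayerGlobalAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, dim); non-negative feature maps keep attention weights positive
        q = torch.relu(self.q(x)) + 1e-6
        k = torch.relu(self.k(x)) + 1e-6
        v = self.v(x)
        kv = k.t() @ v                               # (dim, dim), one pass over all nodes
        norm = q @ k.sum(dim=0, keepdim=True).t()    # (num_nodes, 1) normaliser
        return (q @ kv) / norm                       # cost O(num_nodes * dim^2)


x = torch.randn(10_000, 64)                          # hypothetical node feature matrix
out = OneLayerGlobalAttention(64)(x)
print(out.shape)                                     # torch.Size([10000, 64])
```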
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
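A hedged sketch of what random-walk context sampling could look like: each node's context is a fixed-length walk over the adjacency list, giving a token sequence that a standard sequence Transformer could consume. The walk length, the uniform neighbor choice, and the toy graph are assumptions, not GSPT's exact sampler.

```python
import random


def random_walk_context(adj: dict[int, list[int]], start: int, length: int) -> list[int]:
    """Return a fixed-length walk starting at `start`, used as the node's context."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj.get(walk[-1], [])
        if not neighbors:                 # dangling node: stop the walk early
            break
        walk.append(random.choice(neighbors))
    return walk


# Toy graph: a 0-1-2-3 path plus a 1-3 chord, stored as an adjacency list.
adj = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
contexts = {v: random_walk_context(adj, v, length=6) for v in adj}
print(contexts)                           # one token sequence ("pseudo-sentence") per node
```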
- Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and a 16.8% performance gain on ogbn-products and snap-patents, and also scale LargeGT to ogbn-papers100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
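As a rough sketch of the sampling-plus-local-attention pattern, the snippet below draws a fixed number of neighbors per target node and attends only within that sampled set, so cost scales with the sample size rather than the graph size. The sampler, the ring-graph toy data, and the attention wiring are illustrative assumptions rather than LargeGT's actual design.

```python
import random

import torch
import torch.nn as nn


def sample_neighbors(adj: dict[int, list[int]], node: int, k: int) -> list[int]:
    """Draw k neighbours with replacement; fall back to a self-loop for isolated nodes."""
    nbrs = adj.get(node, []) or [node]
    return [random.choice(nbrs) for _ in range(k)]


dim, k = 32, 8
adj = {i: [(i - 1) % 100, (i + 1) % 100] for i in range(100)}   # toy ring graph
feats = torch.randn(100, dim)
attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

targets = [0, 17, 42]
neigh = torch.tensor([sample_neighbors(adj, t, k) for t in targets])   # (3, k) indices
query = feats[targets].unsqueeze(1)      # (3, 1, dim): one query per target node
keys = feats[neigh]                      # (3, k, dim): its sampled neighbourhood
out, _ = attn(query, keys, keys)         # attention restricted to the sampled set
print(out.squeeze(1).shape)              # torch.Size([3, 32])
```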
- Deep Prompt Tuning for Graph Transformers [55.2480439325792]
Fine-tuning is resource-intensive and requires storing multiple copies of large models.
We propose a novel approach called deep graph prompt tuning as an alternative to fine-tuning.
By freezing the pre-trained parameters and only updating the added tokens, our approach reduces the number of free parameters and eliminates the need for multiple model copies.
arXiv Detail & Related papers (2023-09-18T20:12:17Z)
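A minimal sketch of the freeze-and-prompt idea, assuming a generic Transformer encoder stands in for the pre-trained graph model: the backbone's parameters are frozen and only a small set of added prompt tokens receives gradients. The encoder, the prompt count, and the way prompts are prepended are assumptions for illustration.

```python
import torch
import torch.nn as nn

dim, num_prompts, batch = 64, 4, 2
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2
)
for p in encoder.parameters():                    # freeze the "pre-trained" backbone
    p.requires_grad = False

prompts = nn.Parameter(torch.randn(1, num_prompts, dim))   # the only trainable weights

tokens = torch.randn(batch, 10, dim)              # stand-in for node/context token sequences
inputs = torch.cat([prompts.expand(batch, -1, -1), tokens], dim=1)
out = encoder(inputs)                             # (batch, num_prompts + 10, dim)

num_trainable = prompts.numel()                   # 4 * 64 = 256 parameters receive gradients
print(out.shape, num_trainable)
```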
- Transforming Graphs for Enhanced Attribute Clustering: An Innovative Graph Transformer-Based Method [8.989218350080844]
This study introduces an innovative method known as the Graph Transformer Auto-Encoder for Graph Clustering (GTAGC).
By melding the Graph Auto-Encoder with the Graph Transformer, GTAGC is adept at capturing global dependencies between nodes.
The architecture of GTAGC encompasses graph embedding, integration of the Graph Transformer within the autoencoder structure, and a clustering component.
arXiv Detail & Related papers (2023-06-20T06:04:03Z)
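As an illustration of the auto-encoder-plus-Transformer-plus-clustering composition, the sketch below encodes node attributes with one Transformer layer, reconstructs them with a linear decoder, and clusters the latent codes with k-means. The module choices, the reconstruction loss, and the use of k-means are assumptions, not GTAGC's actual architecture or objective.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


class TransformerAutoEncoder(nn.Module):
    def __init__(self, dim: int, latent: int):
        super().__init__()
        self.inp = nn.Linear(dim, latent)
        self.encoder = nn.TransformerEncoderLayer(latent, nhead=4, batch_first=True)
        self.decoder = nn.Linear(latent, dim)

    def forward(self, x: torch.Tensor):
        z = self.encoder(self.inp(x).unsqueeze(0)).squeeze(0)   # attention among all nodes
        return z, self.decoder(z)                               # latent codes + reconstruction


feats = torch.randn(300, 16)                     # toy node attribute matrix
model = TransformerAutoEncoder(16, 32)
z, recon = model(feats)
loss = nn.functional.mse_loss(recon, feats)      # reconstruction objective (one term only)
labels = KMeans(n_clusters=4, n_init=10).fit_predict(z.detach().numpy())
print(float(loss), labels[:10])
```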
- SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z)
- Hierarchical Graph Transformer with Adaptive Node Sampling [19.45896788055167]
We identify the main deficiencies of current graph transformers.
Most sampling strategies only focus on local neighbors and neglect the long-range dependencies in the graph.
We propose a hierarchical attention scheme with graph coarsening to capture the long-range interactions.
arXiv Detail & Related papers (2022-10-08T05:53:25Z)
- Dynamic Graph Representation Learning via Graph Transformer Networks [41.570839291138114]
We propose a Transformer-based dynamic graph learning method named Dynamic Graph Transformer (DGT).
DGT has spatial-temporal encoding to effectively learn graph topology and capture implicit links.
We show that DGT presents superior performance compared with several state-of-the-art baselines.
arXiv Detail & Related papers (2021-11-19T21:44:23Z)
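A loosely hedged sketch of what spatial-temporal encoding might look like in practice: learned embeddings for a structural signal (degree bucket) and a temporal signal (snapshot id) are added to node tokens before a Transformer layer. Both encoding choices are illustrative assumptions and not DGT's design.

```python
import torch
import torch.nn as nn

dim, num_nodes = 32, 50
feats = torch.randn(num_nodes, dim)                   # node features at one snapshot
degree = torch.randint(0, 10, (num_nodes,))           # toy structural signal
timestep = torch.full((num_nodes,), 3, dtype=torch.long)   # toy temporal signal (snapshot id)

spatial_emb = nn.Embedding(10, dim)                   # encode degree buckets
temporal_emb = nn.Embedding(16, dim)                  # encode snapshot ids
layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)

tokens = feats + spatial_emb(degree) + temporal_emb(timestep)
out = layer(tokens.unsqueeze(0)).squeeze(0)           # attention over the encoded node tokens
print(out.shape)                                      # torch.Size([50, 32])
```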
- Robust Optimization as Data Augmentation for Large-scale Graphs [117.2376815614148]
We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training.
FLAG is a general-purpose approach for graph data, which universally works in node classification, link prediction, and graph classification tasks.
arXiv Detail & Related papers (2020-10-19T21:51:47Z)
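The gradient-based augmentation described above can be sketched as an inner ascent loop on a feature perturbation followed by an ordinary parameter update. The model, step size, and number of ascent steps below are illustrative, and the simplified loop is not the authors' exact procedure.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(128, 16)                 # toy node features
y = torch.randint(0, 4, (128,))          # toy labels
step_size, ascent_steps = 1e-2, 3

perturb = torch.zeros_like(x).uniform_(-step_size, step_size).requires_grad_()
for _ in range(ascent_steps):
    loss = loss_fn(model(x + perturb), y)
    loss.backward()
    with torch.no_grad():                # gradient *ascent* on the perturbation
        perturb += step_size * perturb.grad.sign()
    perturb.grad.zero_()
    model.zero_grad()                    # keep parameter grads from accumulating here

opt.zero_grad()
loss_fn(model(x + perturb.detach()), y).backward()
opt.step()                               # ordinary descent step on the augmented features
print(float(loss))
```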
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.