VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections
- URL: http://arxiv.org/abs/2403.16030v1
- Date: Sun, 24 Mar 2024 06:10:56 GMT
- Title: VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections
- Authors: Dongqi Fu, Zhigang Hua, Yan Xie, Jin Fang, Si Zhang, Kaan Sancak, Hao Wu, Andrey Malevich, Jingrui He, Bo Long,
- Abstract summary: We show that mini-batch training of graph transformers is possible by loading each node's token list in batches.
We further prove this PPR tokenization is viable as a graph convolution network with a fixed polynomial filter and jumping knowledge.
- Score: 45.27160435758666
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The graph transformer has proven to be an effective graph learning method thanks to its attention mechanism, which can capture expressive representations from the complex topological and feature information of graphs. Graph transformers conventionally perform dense (global) attention over every pair of nodes to learn node representation vectors, resulting in quadratic computational costs that are unaffordable for large-scale graph data. Mini-batch training for graph transformers is therefore a promising direction, but the limited samples in each mini-batch cannot support effective dense attention for encoding informative representations. Facing this bottleneck, (1) we start by assigning each node a token list sampled by personalized PageRank (PPR) and then apply standard multi-head self-attention only on this list to compute its node representations. This PPR tokenization method decouples model training from complex graph topological information and moves heavy feature engineering offline, making it independent of training, so that mini-batch training of graph transformers becomes possible by loading each node's token list in batches. We further prove that this PPR tokenization is viable as a graph convolution network with a fixed polynomial filter and jumping knowledge. However, using personalized PageRank alone may limit the information carried by a token list and cannot support different graph inductive biases for model training. To this end, (2) we rewire graphs by introducing multiple types of virtual connections through structure- and content-based super nodes, which enable PPR tokenization to encode local and global contexts, long-range interactions, and heterophilous information into each node's token list, and we then formalize our Virtual Connection Ranking based Graph Transformer (VCR-Graphormer).
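The following is a minimal, illustrative sketch of the two ideas in the abstract: building each node's fixed-length token list from its top-k personalized PageRank (PPR) nodes, and rewiring the graph with virtual super nodes before the PPR pass so the token lists can also carry group-level (structure- or content-based) context. It is not the authors' released implementation; every name (`ppr_scores`, `add_super_nodes`, `build_token_lists`, `TokenListAttention`) and hyperparameter is an assumption for illustration, PPR is approximated by plain power iteration, and the attention is PyTorch's standard `nn.MultiheadAttention` applied within each token list.

```python
import numpy as np
import torch
import torch.nn as nn

def ppr_scores(adj: np.ndarray, alpha: float = 0.15, iters: int = 50) -> np.ndarray:
    """Approximate personalized PageRank by power iteration.
    Row i of the returned (n, n) matrix holds PPR scores with seed node i."""
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    P = adj / deg                                   # row-stochastic transition matrix
    pi = np.eye(n)                                  # one restart distribution per seed
    for _ in range(iters):
        pi = alpha * np.eye(n) + (1 - alpha) * pi @ P
    return pi

def add_super_nodes(adj: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Rewire the graph with one virtual super node per group (e.g., a structural
    cluster or a content/label group); every member is linked to its super node,
    so PPR walks can pick up group-level and long-range context through it."""
    n, g = adj.shape[0], int(groups.max()) + 1
    new = np.zeros((n + g, n + g))
    new[:n, :n] = adj
    for i, c in enumerate(groups):
        new[i, n + c] = new[n + c, i] = 1.0
    return new

def build_token_lists(x: torch.Tensor, adj: np.ndarray, k: int = 8) -> torch.Tensor:
    """For every node, gather the features of its top-k PPR nodes as a token list."""
    pi = ppr_scores(adj)
    np.fill_diagonal(pi, np.inf)                    # force the seed itself into position 0
    topk = np.argsort(-pi, axis=1)[:, :k]           # indices of the k highest-PPR nodes
    return x[torch.as_tensor(topk)]                 # shape (n, k, d)

class TokenListAttention(nn.Module):
    """Standard multi-head self-attention applied only within a node's token list,
    so a mini-batch is just a batch of independent (k, d) token lists."""
    def __init__(self, d: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.out = nn.Linear(d, d)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h, _ = self.attn(tokens, tokens, tokens)    # (batch, k, d)
        return self.out(h[:, 0])                    # read out the seed node's position

# --- offline preprocessing (done once, independent of model training) ---
n, d, k = 100, 16, 8
rng = np.random.default_rng(0)
adj = (rng.random((n, n)) < 0.05).astype(float)
adj = np.maximum(adj, adj.T); np.fill_diagonal(adj, 0.0)
x = torch.randn(n, d)                               # node features
groups = rng.integers(0, 5, size=n)                 # hypothetical cluster/label ids

adj_rw = add_super_nodes(adj, groups)               # rewired graph with 5 super nodes
super_feats = torch.stack([x[torch.as_tensor(groups == c)].mean(0) for c in range(5)])
x_rw = torch.cat([x, super_feats])                  # super nodes get their members' mean feature
tokens = build_token_lists(x_rw, adj_rw, k)[:n]     # token lists for the original nodes

# --- mini-batch training step: just load a batch of token lists ---
model = TokenListAttention(d)
reps = model(tokens[:32])                           # (32, 16) node representations
```

Because the token lists are precomputed offline, a mini-batch reduces to loading a batch of fixed-length token lists, which is what makes standard mini-batch training of the transformer feasible in this sketch.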
Related papers
- NTFormer: A Composite Node Tokenized Graph Transformer for Node Classification [11.451341325579188]
We propose a new graph Transformer called NTFormer to address the node classification task.
A new token generator called Node2Par generates various token sequences using different token elements for each node.
Experiments conducted on various benchmark datasets demonstrate the superiority of NTFormer over representative graph Transformers and graph neural networks for node classification.
arXiv Detail & Related papers (2024-06-27T15:16:00Z) - Learning on Large Graphs using Intersecting Communities [13.053266613831447]
MPNNs iteratively update each node's representation in an input graph by aggregating messages from the node's neighbors.
MPNNs can quickly become prohibitive for large graphs unless they are very sparse.
We propose approximating the input graph as an intersecting community graph (ICG) -- a combination of intersecting cliques.
arXiv Detail & Related papers (2024-05-31T09:26:26Z) - Graph as Point Set [31.448841287258116]
This paper introduces a novel graph-to-set conversion method that transforms interconnected nodes into a set of independent points.
It enables using set encoders to learn from graphs, thereby significantly expanding the design space of Graph Neural Networks.
To demonstrate the effectiveness of our approach, we introduce Point Set Transformer (PST), a transformer architecture that accepts a point set converted from a graph as input.
arXiv Detail & Related papers (2024-05-05T02:29:41Z) - SignGT: Signed Attention-based Graph Transformer for Graph Representation Learning [15.248591535696146]
We propose a Signed Attention-based Graph Transformer (SignGT) to adaptively capture various frequency information from the graphs.
Specifically, SignGT develops a new signed self-attention mechanism (SignSA) that produces signed attention values according to the semantic relevance of node pairs.
arXiv Detail & Related papers (2023-10-17T06:42:11Z) - Deep Prompt Tuning for Graph Transformers [55.2480439325792]
Fine-tuning is resource-intensive and requires storing multiple copies of large models.
We propose a novel approach called deep graph prompt tuning as an alternative to fine-tuning.
By freezing the pre-trained parameters and only updating the added tokens, our approach reduces the number of free parameters and eliminates the need for multiple model copies.
arXiv Detail & Related papers (2023-09-18T20:12:17Z) - NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
arXiv Detail & Related papers (2023-06-14T09:21:15Z) - Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z) - Pure Transformers are Powerful Graph Learners [51.36884247453605]
We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice.
We prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers.
Our method, coined Tokenized Graph Transformer (TokenGT), achieves significantly better results than GNN baselines and competitive results compared to Transformer variants with sophisticated graph-specific inductive biases.
arXiv Detail & Related papers (2022-07-06T08:13:06Z) - Graph Neural Networks with Learnable Structural and Positional Representations [83.24058411666483]
A major issue with arbitrary graphs is the absence of canonical positional information of nodes.
We introduce positional encodings (PE) of nodes and inject them into the input layer, as in Transformers.
We observe a performance increase for molecular datasets, from 2.87% up to 64.14% when considering learnable PE for both GNN classes.
arXiv Detail & Related papers (2021-10-15T05:59:15Z)