Learning Graph Quantized Tokenizers for Transformers
- URL: http://arxiv.org/abs/2410.13798v1
- Date: Thu, 17 Oct 2024 17:38:24 GMT
- Title: Learning Graph Quantized Tokenizers for Transformers
- Authors: Limei Wang, Kaveh Hassani, Si Zhang, Dongqi Fu, Baichuan Yuan, Weilin Cong, Zhigang Hua, Hao Wu, Ning Yao, Bo Long
- Abstract summary: Graph Transformers (GTs) have recently emerged as a leading model in geometric deep learning, outperforming Graph Neural Networks (GNNs) in various graph learning tasks.
We introduce GQT (Graph Quantized Tokenizer), which decouples tokenizer training from Transformer training by leveraging graph self-supervised learning.
By combining the GQT with token modulation, a Transformer encoder achieves state-of-the-art performance on 16 out of 18 benchmarks, including large-scale homophilic and heterophilic datasets.
- Score: 28.79505338383552
- Abstract: Transformers serve as the backbone architectures of Foundational Models, where a domain-specific tokenizer helps them adapt to various domains. Graph Transformers (GTs) have recently emerged as a leading model in geometric deep learning, outperforming Graph Neural Networks (GNNs) in various graph learning tasks. However, the development of tokenizers for graphs has lagged behind other modalities, with existing approaches relying on heuristics or GNNs co-trained with Transformers. To address this, we introduce GQT (Graph Quantized Tokenizer), which decouples tokenizer training from Transformer training by leveraging multi-task graph self-supervised learning, yielding robust and generalizable graph tokens. Furthermore, the GQT utilizes Residual Vector Quantization (RVQ) to learn hierarchical discrete tokens, resulting in significantly reduced memory requirements and improved generalization capabilities. By combining the GQT with token modulation, a Transformer encoder achieves state-of-the-art performance on 16 out of 18 benchmarks, including large-scale homophilic and heterophilic datasets. The code is available at: https://github.com/limei0307/graph-tokenizer
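The Residual Vector Quantization step mentioned above can be illustrated with a short, self-contained PyTorch sketch: each stage quantizes the residual left over by the previous stage, so every node embedding is mapped to a small stack of discrete codes (hierarchical tokens). The hyperparameters (128-d embeddings, 4 codebooks of 256 entries) and the straight-through estimator are illustrative assumptions, not the paper's configuration; the authors' implementation is at the repository linked above.

```python
# Minimal sketch of Residual Vector Quantization (RVQ) over node embeddings.
# Hyperparameters below are illustrative assumptions, not the paper's values.
import torch
import torch.nn as nn


class ResidualVectorQuantizer(nn.Module):
    """Maps each input vector to a stack of discrete codes (coarse-to-fine)."""

    def __init__(self, dim=128, num_codebooks=4, codebook_size=256):
        super().__init__()
        self.codebooks = nn.ParameterList(
            [nn.Parameter(torch.randn(codebook_size, dim) * 0.02)
             for _ in range(num_codebooks)]
        )

    def forward(self, x):
        residual = x
        quantized = torch.zeros_like(x)
        codes = []
        for codebook in self.codebooks:
            dists = torch.cdist(residual, codebook)   # nearest codeword per vector
            idx = dists.argmin(dim=-1)
            selected = codebook[idx]
            quantized = quantized + selected
            residual = residual - selected            # next stage quantizes the residual
            codes.append(idx)
        # Straight-through estimator: gradients flow back to the encoder outputs.
        quantized = x + (quantized - x).detach()
        # Commitment-style loss pulling encoder outputs toward their codewords.
        commit_loss = (x - quantized.detach()).pow(2).mean()
        return quantized, torch.stack(codes, dim=-1), commit_loss


if __name__ == "__main__":
    node_embeddings = torch.randn(10, 128)  # e.g. outputs of a self-supervised GNN encoder
    rvq = ResidualVectorQuantizer()
    q, codes, loss = rvq(node_embeddings)
    print(q.shape, codes.shape, loss.item())  # (10, 128), (10, 4) discrete token ids per node
```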
Related papers
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z) - SpikeGraphormer: A High-Performance Graph Transformer with Spiking Graph Attention [1.4126245676224705]
Graph Transformers have emerged as a promising solution to alleviate the inherent limitations of Graph Neural Networks (GNNs).
We propose a novel insight into integrating SNNs with Graph Transformers and design a Spiking Graph Attention (SGA) module.
SpikeGraphormer consistently outperforms existing state-of-the-art approaches across various datasets.
arXiv Detail & Related papers (2024-03-21T03:11:53Z) - Graph Transformers without Positional Encodings [0.7252027234425334]
We introduce Eigenformer, a Graph Transformer employing a novel spectrum-aware attention mechanism cognizant of the Laplacian spectrum of the graph.
We empirically show that it achieves performance competitive with SOTA Graph Transformers on a number of standard GNN benchmarks.
arXiv Detail & Related papers (2024-01-31T12:33:31Z) - GraphGPT: Graph Learning with Generative Pre-trained Transformers [9.862004020075126]
We introduce GraphGPT, a novel model for Graph learning by self-supervised Generative Pre-training Transformers.
Our model transforms each graph or sampled subgraph into a sequence of tokens representing the node, edge and attributes.
The generative pre-training enables us to train GraphGPT up to 400M+ parameters with consistently increasing performance.
arXiv Detail & Related papers (2023-12-31T16:19:30Z) - Deep Prompt Tuning for Graph Transformers [55.2480439325792]
Fine-tuning is resource-intensive and requires storing multiple copies of large models.
We propose a novel approach called deep graph prompt tuning as an alternative to fine-tuning.
By freezing the pre-trained parameters and only updating the added tokens, our approach reduces the number of free parameters and eliminates the need for multiple model copies.
arXiv Detail & Related papers (2023-09-18T20:12:17Z) - SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of finetuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z) - Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z) - Pure Transformers are Powerful Graph Learners [51.36884247453605]
We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice.
We prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers.
Our method, coined Tokenized Graph Transformer (TokenGT), achieves significantly better results compared to GNN baselines and competitive results compared to Transformer variants with sophisticated graph-specific inductive biases.
arXiv Detail & Related papers (2022-07-06T08:13:06Z) - Representing Long-Range Context for Graph Neural Networks with Global Attention [37.212747564546156]
We propose the use of Transformer-based self-attention to learn long-range pairwise relationships.
Our method, which we call GraphTrans, applies a permutation-invariant Transformer module after a standard GNN module.
Our results suggest that purely-learning-based approaches without graph structure may be suitable for learning high-level, long-range relationships on graphs.
arXiv Detail & Related papers (2022-01-21T18:16:21Z) - Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.
arXiv Detail & Related papers (2021-06-09T17:18:52Z)
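Graphormer's key insight of encoding graph structure into the Transformer is typically realized as a centrality (degree) embedding added to node features and a shortest-path-distance bias added to attention scores. The sketch below illustrates that idea; the single attention head, the dimensions, and the toy graph are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of Graphormer-style structural encodings: a degree (centrality)
# embedding added to node features and a shortest-path-distance bias added to
# attention scores. Sizes and the toy graph are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def shortest_path_distances(adj, max_dist=8):
    """All-pairs shortest-path distances on a small unweighted graph (BFS by walk length)."""
    n = adj.size(0)
    dist = torch.full((n, n), max_dist)          # "far/unreachable" sentinel
    dist.fill_diagonal_(0)
    reach = torch.eye(n, dtype=torch.bool)       # pairs already assigned a distance
    frontier = adj.bool()                        # pairs reachable by a walk of length d
    for d in range(1, max_dist):
        new = frontier & ~reach                  # first time these pairs are reached
        dist[new] = d
        reach |= frontier
        frontier = (frontier.float() @ adj).bool()
    return dist


class StructuralAttention(nn.Module):
    """Single-head self-attention with Graphormer-style structural terms."""

    def __init__(self, dim=32, max_degree=16, max_dist=8):
        super().__init__()
        self.degree_emb = nn.Embedding(max_degree, dim)   # centrality encoding
        self.dist_bias = nn.Embedding(max_dist + 1, 1)    # spatial encoding
        self.qkv = nn.Linear(dim, 3 * dim)
        self.scale = dim ** -0.5

    def forward(self, x, adj):
        deg = adj.sum(-1).long().clamp(max=self.degree_emb.num_embeddings - 1)
        x = x + self.degree_emb(deg)                       # inject node centrality
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = (q @ k.t()) * self.scale
        spd = shortest_path_distances(adj, self.dist_bias.num_embeddings - 1)
        scores = scores + self.dist_bias(spd).squeeze(-1)  # inject graph distance
        return F.softmax(scores, dim=-1) @ v


if __name__ == "__main__":
    adj = torch.tensor([[0, 1, 0, 0],
                        [1, 0, 1, 0],
                        [0, 1, 0, 1],
                        [0, 0, 1, 0]], dtype=torch.float)  # toy path graph
    x = torch.randn(4, 32)
    out = StructuralAttention()(x, adj)
    print(out.shape)  # (4, 32)
```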