Exphormer: Sparse Transformers for Graphs
- URL: http://arxiv.org/abs/2303.06147v2
- Date: Mon, 24 Jul 2023 17:58:45 GMT
- Title: Exphormer: Sparse Transformers for Graphs
- Authors: Hamed Shirzad, Ameya Velingker, Balaji Venkatachalam, Danica J.
Sutherland, Ali Kemal Sinop
- Abstract summary: We introduce Exphormer, a framework for building powerful and scalable graph transformers.
We show that Exphormer produces models with competitive empirical results on a wide variety of graph datasets.
- Score: 5.055213942955148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph transformers have emerged as a promising architecture for a variety of
graph learning and representation tasks. Despite their successes, though, it
remains challenging to scale graph transformers to large graphs while
maintaining accuracy competitive with message-passing networks. In this paper,
we introduce Exphormer, a framework for building powerful and scalable graph
transformers. Exphormer builds a sparse attention mechanism from two
components: virtual global nodes and expander graphs, whose mathematical
characteristics, such as spectral expansion, pseudorandomness, and sparsity,
yield graph transformers with complexity only linear in the size of the graph,
while allowing us to prove desirable theoretical properties of the resulting
transformer models. We show that incorporating Exphormer into the
recently-proposed GraphGPS framework produces models with competitive empirical
results on a wide variety of graph datasets, including state-of-the-art results
on three datasets. We also show that Exphormer can scale to datasets on larger
graphs than shown in previous graph transformer architectures. Code can be
found at \url{https://github.com/hamed1375/Exphormer}.
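The abstract's two-component attention pattern can be made concrete with a small sketch. The code below is a minimal, hypothetical illustration, not the paper's implementation: expander edges are approximated by a union of random permutations (a common construction for near-regular expanders), a single virtual global node is appended at index `num_nodes`, and attention is evaluated only over the allowed pairs, so the cost grows linearly with the number of nodes rather than quadratically. All function and variable names are illustrative.

```python
import numpy as np

def sparse_attention_pattern(num_nodes, graph_edges, expander_degree=3, seed=0):
    """Return the (query, key) pairs attention is computed over: local graph
    edges + expander-like edges + a virtual global node (index num_nodes)."""
    rng = np.random.default_rng(seed)
    pairs = set()
    # 1. Local attention along the input graph's edges (both directions).
    for u, v in graph_edges:
        pairs.add((u, v)); pairs.add((v, u))
    # 2. Expander-like edges: a union of random permutations approximates a
    #    sparse near-regular graph with good spectral expansion.
    for _ in range(expander_degree):
        perm = rng.permutation(num_nodes)
        for u in range(num_nodes):
            pairs.add((u, int(perm[u]))); pairs.add((int(perm[u]), u))
    # 3. Virtual global node, connected to every real node in both directions.
    g = num_nodes
    for u in range(num_nodes):
        pairs.add((u, g)); pairs.add((g, u))
    # Self-loops so every token attends to itself.
    for u in range(num_nodes + 1):
        pairs.add((u, u))
    return pairs

def sparse_softmax_attention(Q, K, V, pairs):
    """Attention over the allowed pairs only: O(|pairs|), not O(n^2).
    Q, K, V must include a row for the virtual global node."""
    n, d = Q.shape
    neighbors = {i: [] for i in range(n)}
    for q, k in pairs:
        neighbors[q].append(k)
    out = np.zeros_like(V)
    for i, ks in neighbors.items():
        ks = np.array(ks)
        scores = Q[i] @ K[ks].T / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ V[ks]
    return out
```

Since each node attends to its graph neighbors, a constant number of expander neighbors, and the global node, the total number of pairs is O(n + m) for a graph with n nodes and m edges, which is the linear complexity the abstract refers to.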
Related papers
- SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs.
We show that multi-layer propagation can be reduced to one-layer propagation, with the same capability for representation learning.
It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z)
- Transformers as Graph-to-Graph Models [13.630495199720423]
We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case.
Our Graph-to-Graph Transformer architecture makes this ability explicit, by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions.
arXiv Detail & Related papers (2023-10-27T07:21:37Z)
- Deep Prompt Tuning for Graph Transformers [55.2480439325792]
Fine-tuning is resource-intensive and requires storing multiple copies of large models.
We propose a novel approach called deep graph prompt tuning as an alternative to fine-tuning.
By freezing the pre-trained parameters and only updating the added tokens, our approach reduces the number of free parameters and eliminates the need for multiple model copies.
arXiv Detail & Related papers (2023-09-18T20:12:17Z)
- SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
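To illustrate why a single layer of global attention can remain tractable on large graphs, here is a generic kernelized-attention sketch, not SGFormer's exact attention function: replacing the softmax with a positive feature map `phi` lets the product be regrouped as `phi(Q) (phi(K)^T V)`, so the n-by-n score matrix is never materialized and the cost is linear in the number of nodes.

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """O(n * d^2) attention: regroup the matrix product instead of
    materializing the n x n attention matrix."""
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                     # (d, d): aggregate keys and values once
    normalizer = Qp @ Kp.sum(axis=0)  # (n,): row-wise normalization terms
    return (Qp @ KV) / normalizer[:, None]

def dense_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Reference O(n^2) version that forms the full score matrix."""
    S = phi(Q) @ phi(K).T
    return (S / S.sum(axis=1, keepdims=True)) @ V
```

The two functions compute the same output; only the order of the matrix products differs, which is the associativity trick that makes single-layer global attention linear in n.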
arXiv Detail & Related papers (2023-06-19T08:03:25Z)
- Attending to Graph Transformers [5.609943831664869]
Transformer architectures for graphs have emerged as an alternative to established techniques for machine learning with graphs.
Here, we derive a taxonomy of graph transformer architectures, bringing some order to this emerging field.
We probe how well graph transformers can recover various graph properties, how well they can deal with heterophilic graphs, and to what extent they prevent over-squashing.
arXiv Detail & Related papers (2023-02-08T16:40:11Z)
- Pure Transformers are Powerful Graph Learners [51.36884247453605]
We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice.
We prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers.
Our method, coined Tokenized Graph Transformer (TokenGT), achieves significantly better results than GNN baselines, along with competitive results overall.
arXiv Detail & Related papers (2022-07-06T08:13:06Z)
- Relphormer: Relational Graph Transformer for Knowledge Graph Representations [25.40961076988176]
We propose a new variant of Transformer for knowledge graph representations dubbed Relphormer.
We propose a novel structure-enhanced self-attention mechanism to encode the relational information and keep the semantic information within entities and relations.
Experimental results on six datasets show that Relphormer can obtain better performance compared with baselines.
arXiv Detail & Related papers (2022-05-22T15:30:18Z)
- Transformer for Graphs: An Overview from Architecture Perspective [86.3545861392215]
It's imperative to sort out the existing Transformer models for graphs and systematically investigate their effectiveness on various graph tasks.
We first dissect the existing models and identify three typical ways of incorporating graph information into the vanilla Transformer.
Our experiments confirm the benefits of current graph-specific modules on Transformer and reveal their advantages on different kinds of graph tasks.
arXiv Detail & Related papers (2022-02-17T06:02:06Z)
- Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight for applying Transformers to graphs is the necessity of effectively encoding a graph's structural information into the model.
arXiv Detail & Related papers (2021-06-09T17:18:52Z)
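The Graphormer summary can be sketched with a small, hypothetical example of its two structural encodings as I read them: a degree ("centrality") embedding added to node features, and a learnable bias on attention scores indexed by shortest-path distance ("spatial encoding"). Names, shapes, and the random "learned" parameters below are illustrative, not the paper's exact parameterization.

```python
import numpy as np

def shortest_path_distances(num_nodes, edges):
    """All-pairs shortest paths via Floyd-Warshall on an unweighted graph."""
    INF = num_nodes + 1  # stands in for "unreachable"
    D = np.full((num_nodes, num_nodes), INF)
    np.fill_diagonal(D, 0)
    for u, v in edges:
        D[u, v] = D[v, u] = 1
    for k in range(num_nodes):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D.astype(int)

def graphormer_style_attention(X, edges, rng):
    """One self-attention head whose scores carry a distance-indexed bias."""
    n, d = X.shape
    deg = np.zeros(n, dtype=int)
    for u, v in edges:
        deg[u] += 1; deg[v] += 1
    # (a) Centrality encoding: one embedding per degree value (random here,
    #     standing in for learned parameters).
    degree_embed = rng.standard_normal((deg.max() + 1, d))
    H = X + degree_embed[deg]
    # (b) Spatial encoding: one scalar bias per shortest-path distance.
    D = shortest_path_distances(n, edges)
    dist_bias = rng.standard_normal(D.max() + 1)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(d) + dist_bias[D]  # structure enters as a bias
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V
```

The design point is that the Transformer itself stays standard: all structural information enters through an additive feature term and an additive attention bias.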
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.