Graph Self-Attention for learning graph representation with Transformer
- URL: http://arxiv.org/abs/2201.12787v1
- Date: Sun, 30 Jan 2022 11:10:06 GMT
- Title: Graph Self-Attention for learning graph representation with Transformer
- Authors: Wonpyo Park, Woonggi Chang, Donggeon Lee, Juntae Kim
- Abstract summary: We propose a novel Graph Self-Attention module to enable Transformer models to learn graph representation.
We propose context-aware attention which considers the interactions between query, key and graph information.
Our method achieves state-of-the-art performance on multiple benchmarks of graph representation learning.
- Score: 13.49645012479288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel Graph Self-Attention module to enable Transformer models
to learn graph representation. We aim to incorporate graph information into the
attention map and hidden representations of the Transformer. To this end, we
propose context-aware attention which considers the interactions between query,
key and graph information. Moreover, we propose graph-embedded value to encode
the graph information on the hidden representation. Our extensive experiments
and ablation studies validate that our method successfully encodes graph
representation on Transformer architecture. Finally, our method achieves
state-of-the-art performance on multiple benchmarks of graph representation
learning, ranging from graph classification on images and molecules to graph
regression in quantum chemistry.
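The two ingredients above, context-aware attention and a graph-embedded value, can be pictured with a minimal single-head PyTorch sketch. Everything concrete below (the edge-type embeddings and the exact bias and value terms) is an illustrative assumption, not the paper's formulation:

```python
import torch
import torch.nn as nn

class GraphSelfAttentionSketch(nn.Module):
    """Single-head sketch: graph information (edge types) biases the
    attention map (context-aware attention) and is mixed into the
    aggregated values (graph-embedded value). Illustrative only."""

    def __init__(self, dim: int, num_edge_types: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.edge_bias = nn.Embedding(num_edge_types, dim)   # assumption
        self.edge_value = nn.Embedding(num_edge_types, dim)  # assumption
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, edge_type: torch.Tensor):
        # x: (n, dim) node features; edge_type: (n, n) integer labels
        q, k, v = self.q(x), self.k(x), self.v(x)
        e = self.edge_bias(edge_type)                        # (n, n, dim)
        # Context-aware term: the query interacts with graph information,
        # not only with the key.
        graph_logits = torch.einsum('id,ijd->ij', q, e)
        attn = ((q @ k.T + graph_logits) * self.scale).softmax(dim=-1)
        # Graph-embedded value: edge embeddings join the aggregation.
        ev = self.edge_value(edge_type)                      # (n, n, dim)
        return attn @ v + torch.einsum('ij,ijd->id', attn, ev)
```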
Related papers
- Transformers as Graph-to-Graph Models [13.630495199720423]
We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case.
Our Graph-to-Graph Transformer architecture makes this ability explicit, by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions.
arXiv Detail & Related papers (2023-10-27T07:21:37Z) - Deep Prompt Tuning for Graph Transformers [55.2480439325792]
- Deep Prompt Tuning for Graph Transformers [55.2480439325792]
Fine-tuning is resource-intensive and requires storing multiple copies of large models.
We propose a novel approach called deep graph prompt tuning as an alternative to fine-tuning.
By freezing the pre-trained parameters and only updating the added tokens, our approach reduces the number of free parameters and eliminates the need for multiple model copies.
arXiv Detail & Related papers (2023-09-18T20:12:17Z) - Graph Propagation Transformer for Graph Representation Learning [36.01189696668657]
- Graph Propagation Transformer for Graph Representation Learning [36.01189696668657]
We propose a new attention mechanism called Graph Propagation Attention (GPA).
It explicitly passes information among nodes and edges in three ways: node-to-node, node-to-edge, and edge-to-node.
We show that our method outperforms many state-of-the-art Transformer-based graph models.
arXiv Detail & Related papers (2023-05-19T04:42:58Z) - Attending to Graph Transformers [5.609943831664869]
- Attending to Graph Transformers [5.609943831664869]
Transformer architectures for graphs have emerged as an alternative to established techniques for machine learning with graphs.
Here, we derive a taxonomy of graph transformer architectures, bringing some order to this emerging field.
We probe how well graph transformers can recover various graph properties, how well they can deal with heterophilic graphs, and to what extent they prevent over-squashing.
arXiv Detail & Related papers (2023-02-08T16:40:11Z) - Spectral Augmentations for Graph Contrastive Learning [50.149996923976836]
Contrastive learning has emerged as a premier method for learning representations with or without supervision.
Recent studies have shown its utility in graph representation learning for pre-training.
We propose a set of well-motivated graph transformation operations to provide a bank of candidates when constructing augmentations for a graph contrastive objective.
arXiv Detail & Related papers (2023-02-06T16:26:29Z) - Transformer for Graphs: An Overview from Architecture Perspective [86.3545861392215]
- Transformer for Graphs: An Overview from Architecture Perspective [86.3545861392215]
It is imperative to sort out the existing Transformer models for graphs and systematically investigate their effectiveness on various graph tasks.
We first disassemble the existing models and identify three typical ways to incorporate graph information into the vanilla Transformer.
Our experiments confirm the benefits of current graph-specific modules on Transformer and reveal their advantages on different kinds of graph tasks.
arXiv Detail & Related papers (2022-02-17T06:02:06Z) - Bootstrapping Informative Graph Augmentation via A Meta Learning
Approach [21.814940639910358]
In graph contrastive learning, benchmark methods apply various graph augmentation approaches.
Most of these augmentation methods are non-learnable, which risks generating unbeneficial augmented graphs.
We instead generate augmented graphs with a learnable graph augmenter, called MEta Graph Augmentation (MEGA); a schematic sketch appears below.
arXiv Detail & Related papers (2022-01-11T07:15:13Z) - Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
- Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.
arXiv Detail & Related papers (2021-06-09T17:18:52Z) - A Graph VAE and Graph Transformer Approach to Generating Molecular
Graphs [1.6631602844999724]
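Graphormer's key insight above is commonly realized with structural encodings; the sketch shows two well-known ones, a centrality encoding added to the inputs and a shortest-path spatial bias added to the attention logits. The table sizes and clipping are illustrative choices:

```python
import torch
import torch.nn as nn

class StructuralEncodingSketch(nn.Module):
    """Centrality encoding (node degree added to inputs) and spatial
    encoding (shortest-path distance biases attention). Sizes are
    illustrative; Graphormer also adds an edge encoding term."""

    def __init__(self, dim: int, max_degree: int = 64, max_dist: int = 32):
        super().__init__()
        self.degree_emb = nn.Embedding(max_degree, dim)
        self.dist_bias = nn.Embedding(max_dist, 1)

    def encode_nodes(self, x: torch.Tensor, degree: torch.Tensor):
        # x: (n, dim) node features; degree: (n,) integer node degrees
        d = degree.clamp(max=self.degree_emb.num_embeddings - 1)
        return x + self.degree_emb(d)

    def attention_bias(self, spd: torch.Tensor) -> torch.Tensor:
        # spd: (n, n) shortest-path distances -> additive attention logits
        s = spd.clamp(max=self.dist_bias.num_embeddings - 1)
        return self.dist_bias(s).squeeze(-1)
```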
- A Graph VAE and Graph Transformer Approach to Generating Molecular Graphs [1.6631602844999724]
We propose a variational autoencoder and a transformer-based model which fully utilise graph convolutional and graph pooling layers.
The transformer model implements a novel node encoding layer, replacing the position encoding typically used in transformers, to create a transformer with no position information that operates on graphs.
In experiments we chose a benchmark task of molecular generation, given the importance of both generated node and edge features.
arXiv Detail & Related papers (2021-04-09T13:13:06Z) - Dirichlet Graph Variational Autoencoder [65.94744123832338]
We present Dirichlet Graph Variational Autoencoder (DGVAE) with graph cluster memberships as latent factors.
Motivated by the low-pass characteristics of the balanced graph cut, we propose a new variant of GNN named Heatts to encode the input graph into cluster memberships.
arXiv Detail & Related papers (2020-10-09T07:35:26Z) - GraphOpt: Learning Optimization Models of Graph Formation [72.75384705298303]
We propose an end-to-end framework that learns an implicit model of graph structure formation and discovers an underlying optimization mechanism.
The learned objective can serve as an explanation for the observed graph properties, thereby lending itself to transfer across different graphs within a domain.
GraphOpt poses link formation in graphs as a sequential decision-making process and solves it with a maximum entropy inverse reinforcement learning algorithm.
arXiv Detail & Related papers (2020-07-07T16:51:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.