Gradformer: Graph Transformer with Exponential Decay
- URL: http://arxiv.org/abs/2404.15729v1
- Date: Wed, 24 Apr 2024 08:37:13 GMT
- Title: Gradformer: Graph Transformer with Exponential Decay
- Authors: Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Shirui Pan, Wenbin Hu
- Abstract summary: The self-attention mechanism in Graph Transformers (GTs) overlooks the graph's inductive biases, particularly biases related to structure.
This paper presents Gradformer, a method that innovatively integrates GT with the intrinsic inductive bias.
Gradformer consistently outperforms Graph Neural Network and GT baseline models on various graph classification and regression tasks.
- Score: 69.50738015412189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Transformers (GTs) have demonstrated their advantages across a wide range of tasks. However, the self-attention mechanism in GTs overlooks the graph's inductive biases, particularly biases related to structure, which are crucial for graph tasks. Although some methods use positional encodings and attention biases to model inductive biases, their effectiveness remains suboptimal. Therefore, this paper presents Gradformer, a method that innovatively integrates GT with the intrinsic inductive bias by applying an exponential decay mask to the attention matrix. Specifically, the values in the decay mask diminish exponentially as node proximity within the graph structure decreases. This design enables Gradformer to retain its ability to capture information from distant nodes while focusing on the graph's local details. Furthermore, Gradformer introduces a learnable constraint into the decay mask, allowing different attention heads to learn distinct decay masks. Such a design diversifies the attention heads, enabling a more effective assimilation of diverse structural information within the graph. Extensive experiments on various benchmarks demonstrate that Gradformer consistently outperforms Graph Neural Network and GT baseline models on various graph classification and regression tasks. Additionally, Gradformer proves to be an effective method for training deep GT models, maintaining or even improving accuracy relative to shallow models as the network deepens, in contrast to the significant accuracy drop observed in other GT models. Code is available at https://github.com/LiuChuang0059/Gradformer.
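As a rough illustration of the mechanism described above, the sketch below attenuates standard scaled dot-product attention with a per-head exponential decay mask gamma^d(i, j), where d(i, j) is the shortest-path distance between nodes and gamma is a learnable per-head rate. This is a minimal, hypothetical example: the names (decay_mask, MaskedAttention, gamma_logit), the post-softmax masking with renormalization, and the sigmoid parameterization of the decay rate are assumptions made here for brevity, not the authors' implementation; see the linked repository for the exact formulation.

```python
# Minimal sketch (not the official Gradformer code) of an exponentially
# decaying attention mask, assuming shortest-path distance as the notion
# of node proximity and one learnable decay rate per attention head.
import torch
import torch.nn as nn
import torch.nn.functional as F


def decay_mask(spd: torch.Tensor, gamma: torch.Tensor) -> torch.Tensor:
    """spd: [N, N] shortest-path distances; gamma: [H] per-head decay in (0, 1).

    Returns a [H, N, N] mask whose entries shrink exponentially with distance.
    """
    return gamma.view(-1, 1, 1) ** spd.unsqueeze(0)  # gamma^d(i, j)


class MaskedAttention(nn.Module):
    """Single self-attention layer whose weights are scaled by the decay mask."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.heads, self.dk = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        # One learnable parameter per head; sigmoid keeps the decay rate in (0, 1).
        self.gamma_logit = nn.Parameter(torch.zeros(heads))

    def forward(self, x: torch.Tensor, spd: torch.Tensor) -> torch.Tensor:
        n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(n, self.heads, self.dk).transpose(0, 1) for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.dk ** 0.5        # [H, N, N]
        mask = decay_mask(spd, torch.sigmoid(self.gamma_logit))  # [H, N, N]
        attn = F.softmax(scores, dim=-1) * mask                  # decay-weighted attention
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        return self.out((attn @ v).transpose(0, 1).reshape(n, -1))


if __name__ == "__main__":
    # Toy usage: 4 nodes on a path graph, shortest-path distances written by hand.
    spd = torch.tensor([[0., 1., 2., 3.],
                        [1., 0., 1., 2.],
                        [2., 1., 0., 1.],
                        [3., 2., 1., 0.]])
    layer = MaskedAttention(dim=16, heads=4)
    print(layer(torch.randn(4, 16), spd).shape)  # torch.Size([4, 16])
```

With gamma close to 1 the mask is nearly flat and the layer behaves like vanilla attention; with gamma close to 0 it concentrates on immediate neighbors, which is the trade-off between global reach and local focus that the abstract describes.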
Related papers
- What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding [67.59552859593985]
Graph Transformers, which incorporate self-attention and positional encoding, have emerged as a powerful architecture for various graph learning tasks.
This paper introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised classification.
arXiv Detail & Related papers (2024-06-04T05:30:16Z)
- Variational Graph Auto-Encoder Based Inductive Learning Method for Semi-Supervised Classification [10.497590357666114]
We propose the Self-Label Augmented VGAE model for inductive graph representation learning.
To leverage the label information for training, our model takes node labels as one-hot encoded inputs and then performs label reconstruction in model training.
Our proposed model achieves promising results on node classification, with particular superiority under semi-supervised learning settings.
arXiv Detail & Related papers (2024-03-26T08:59:37Z)
- Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classification [71.36575018271405]
We introduce the Dual-Prism (DP) augmentation method, comprising DP-Noise and DP-Mask.
We find that keeping the low-frequency eigenvalues unchanged can preserve the critical properties at a large scale when generating augmented graphs.
arXiv Detail & Related papers (2024-01-18T12:58:53Z)
- Compressing Deep Graph Neural Networks via Adversarial Knowledge Distillation [41.00398052556643]
We propose a novel Adversarial Knowledge Distillation framework for graph models named GraphAKD.
The discriminator distinguishes between teacher knowledge and what the student inherits, while the student GNN works as a generator and aims to fool the discriminator.
The results imply that GraphAKD can precisely transfer knowledge from a complicated teacher GNN to a compact student GNN.
arXiv Detail & Related papers (2022-05-24T00:04:43Z)
- Learning Graph Structure from Convolutional Mixtures [119.45320143101381]
We propose a graph convolutional relationship between the observed and latent graphs, and formulate the graph learning task as a network inverse (deconvolution) problem.
In lieu of eigendecomposition-based spectral methods, we unroll and truncate proximal gradient iterations to arrive at a parameterized neural network architecture that we call a Graph Deconvolution Network (GDN).
GDNs can learn a distribution of graphs in a supervised fashion, perform link prediction or edge-weight regression tasks by adapting the loss function, and they are inherently inductive.
arXiv Detail & Related papers (2022-05-19T14:08:15Z)
- A Graph Data Augmentation Strategy with Entropy Preserving [11.886325179121226]
We introduce a novel graph entropy definition as a quantitative index to evaluate feature information within a graph.
Under considerations of preserving graph entropy, we propose an effective strategy to generate training data using a perturbed mechanism.
Our proposed approach significantly enhances the robustness and generalization ability of GCNs during the training process.
arXiv Detail & Related papers (2021-07-13T12:58:32Z)
- A Deep Latent Space Model for Graph Representation Learning [10.914558012458425]
We propose a Deep Latent Space Model (DLSM) for directed graphs to incorporate the traditional latent variable based generative model into deep learning frameworks.
Our proposed model consists of a graph convolutional network (GCN) encoder and a decoder, which are layer-wise connected by a hierarchical variational auto-encoder architecture.
Experiments on real-world datasets show that the proposed model achieves the state-of-the-art performances on both link prediction and community detection tasks.
arXiv Detail & Related papers (2021-06-22T12:41:19Z)
- Training Robust Graph Neural Networks with Topology Adaptive Edge Dropping [116.26579152942162]
Graph neural networks (GNNs) are processing architectures that exploit graph structural information to model representations from network data.
Despite their success, GNNs suffer from sub-optimal generalization performance given limited training data.
This paper proposes Topology Adaptive Edge Dropping to improve generalization performance and learn robust GNN models.
arXiv Detail & Related papers (2021-06-05T13:20:36Z)
- GraphMI: Extracting Private Graph Data from Graph Neural Networks [59.05178231559796]
We present the Graph Model Inversion attack (GraphMI), which aims to extract private graph data of the training graph by inverting the GNN.
Specifically, we propose a projected gradient module to tackle the discreteness of graph edges while preserving the sparsity and smoothness of graph features.
We design a graph auto-encoder module to efficiently exploit graph topology, node attributes, and target model parameters for edge inference.
arXiv Detail & Related papers (2021-06-05T07:07:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.