Exploring Sparsity in Graph Transformers
- URL: http://arxiv.org/abs/2312.05479v1
- Date: Sat, 9 Dec 2023 06:21:44 GMT
- Title: Exploring Sparsity in Graph Transformers
- Authors: Chuang Liu, Yibing Zhan, Xueqi Ma, Liang Ding, Dapeng Tao, Jia Wu,
Wenbin Hu, Bo Du
- Abstract summary: Graph Transformers (GTs) have achieved impressive results on various graph-related tasks.
However, the huge computational cost of GTs hinders their deployment and application, especially in resource-constrained environments.
We propose a comprehensive Graph Transformer SParsification (GTSP) framework that helps to reduce the computational complexity of GTs.
- Score: 67.48149404841925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Transformers (GTs) have achieved impressive results on various
graph-related tasks. However, the huge computational cost of GTs hinders their
deployment and application, especially in resource-constrained environments.
Therefore, in this paper, we explore the feasibility of sparsifying GTs, a
significant yet under-explored topic. We first discuss the redundancy of GTs
based on the characteristics of existing GT models, and then propose a
comprehensive Graph Transformer SParsification
(GTSP) framework that helps to reduce the computational complexity of GTs from
four dimensions: the input graph data, attention heads, model layers, and model
weights. Specifically, GTSP designs differentiable masks for each individual
compressible component, enabling effective end-to-end pruning. We examine our
GTSP through extensive experiments on prominent GTs, including GraphTrans,
Graphormer, and GraphGPS. The experimental results substantiate that GTSP
effectively cuts computational costs, accompanied by only marginal decreases in
accuracy or, in some cases, even improvements. For instance, GTSP yields a
reduction of 30% in Floating Point Operations while contributing to a 1.8%
increase in Area Under the Curve accuracy on the OGBG-HIV dataset. Furthermore, we
provide several insights on the characteristics of attention heads and the
behavior of attention mechanisms, all of which have immense potential to
inspire future research endeavors in this domain.
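The abstract describes GTSP's core mechanism as differentiable masks attached to each compressible component (input graph data, attention heads, layers, and weights). As a rough illustration of the attention-head dimension only, here is a minimal PyTorch sketch; the sigmoid gate parameterization and the L1-style sparsity penalty are assumptions for illustration, not necessarily the paper's exact formulation.

```python
# Minimal sketch of differentiable head masking for a graph Transformer layer.
# The gate parameterization (sigmoid gates + L1-style penalty) is an assumption
# for illustration; GTSP's exact masking scheme may differ.
import torch
import torch.nn as nn


class HeadMaskedAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # One learnable logit per head; sigmoid(logit) acts as a soft on/off gate.
        self.head_logits = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, dim) node features of one graph
        n, d = x.shape
        qkv = self.qkv(x).reshape(n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.unbind(dim=1)                      # each (n, heads, head_dim)
        attn = torch.einsum("nhd,mhd->hnm", q, k) / self.head_dim ** 0.5
        attn = attn.softmax(dim=-1)                      # (heads, n, n)
        out = torch.einsum("hnm,mhd->nhd", attn, v)      # (n, heads, head_dim)
        gates = torch.sigmoid(self.head_logits)          # (heads,) in (0, 1)
        out = out * gates.view(1, -1, 1)                 # near-zero gates prune heads
        return self.proj(out.reshape(n, d))

    def sparsity_loss(self) -> torch.Tensor:
        # Encourage gates toward zero so unneeded heads can be dropped after training.
        return torch.sigmoid(self.head_logits).sum()
```

In such a setup one would add a small multiple of sparsity_loss() to the task loss during end-to-end training and discard heads whose gates end up near zero.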
Related papers
- A Scalable and Effective Alternative to Graph Transformers [19.018320937729264]
Graph Transformers (GTs) were introduced, utilizing the self-attention mechanism to model pairwise node relationships.
GTs suffer from high computational complexity with respect to the number of nodes in the graph, hindering their applicability to large graphs.
We present Graph-Enhanced Contextual Operator (GECO), a scalable and effective alternative to GTs.
arXiv Detail & Related papers (2024-06-17T19:57:34Z)
- Gradformer: Graph Transformer with Exponential Decay [69.50738015412189]
Self-attention mechanism in Graph Transformers (GTs) overlooks the graph's inductive biases, particularly biases related to structure.
This paper presents Gradformer, a method innovatively integrating GT with the intrinsic inductive bias.
Gradformer consistently outperforms the Graph Neural Network and GT baseline models in various graph classification and regression tasks.
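The summary only names the decay idea, so the following is a hypothetical sketch of how an exponential decay over shortest-path distance could be folded into attention; the function names, the fixed scalar gamma, and the exact placement of the decay are assumptions, not Gradformer's published formulation.

```python
# Hypothetical sketch of distance-decayed attention, in the spirit of the
# "exponential decay" idea named in the title; the actual Gradformer design
# may differ (e.g., a learnable per-head decay rate).
import torch


def decayed_attention(q, k, v, spd, gamma: float = 0.5):
    """q, k, v: (n, d) node features; spd: (n, n) shortest-path distances."""
    scores = q @ k.t() / q.shape[-1] ** 0.5          # (n, n) raw attention logits
    decay = gamma ** spd.float()                     # farther nodes get smaller weight
    weights = torch.softmax(scores, dim=-1) * decay  # bias attention toward nearby nodes
    weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-12)
    return weights @ v
```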
arXiv Detail & Related papers (2024-04-24T08:37:13Z)
- SGHormer: An Energy-Saving Graph Transformer Driven by Spikes [32.30349324856102]
Graph Transformers (GTs), with their powerful representation learning ability, have achieved great success in a wide range of graph tasks.
This outstanding performance comes at the cost of higher energy consumption and computational overhead.
We propose a new spiking-based graph transformer (SGHormer) to reduce memory and computational costs.
arXiv Detail & Related papers (2024-03-26T12:39:02Z)
- Key-Graph Transformer for Image Restoration [122.7334034968327]
We introduce the Key-Graph Transformer (KGT) in this paper. Specifically, KGT views patch features as graph nodes.
The proposed Key-Graph Constructor efficiently forms a sparse yet representative Key-Graph by selectively connecting essential nodes instead of all the nodes.
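As a loose illustration of building a sparse "key graph" from patch features, the sketch below connects each node only to its top-k most similar nodes; the cosine top-k criterion and the value of k are assumptions for illustration, not necessarily the Key-Graph Constructor's actual selection rule.

```python
# Rough sketch of a sparse key-graph: each node keeps edges only to its top-k
# most similar nodes, so attention can be restricted to these edges.
import torch


def build_key_graph(feats: torch.Tensor, k: int = 8) -> torch.Tensor:
    """feats: (n, d) patch/node features -> (2, n*k) edge index of a sparse graph."""
    normed = torch.nn.functional.normalize(feats, dim=-1)
    sim = normed @ normed.t()                        # (n, n) cosine similarity
    sim.fill_diagonal_(float("-inf"))                # exclude self-loops
    topk = sim.topk(k, dim=-1).indices               # (n, k) most similar nodes
    src = torch.arange(feats.shape[0]).repeat_interleave(k)
    return torch.stack([src, topk.reshape(-1)])      # sparse edges for attention
```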
arXiv Detail & Related papers (2024-02-04T23:00:24Z)
- Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and a 16.8% performance gain on ogbn-products and snap-patents, while also scaling LargeGT to ogbn-papers100M with a 5.9% performance improvement.
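The blurb mentions coupling fast neighborhood sampling with local attention; the snippet below is a generic illustration of the sampling half (uniformly sampling a fixed number of neighbors per node), not LargeGT's actual sampler.

```python
# Generic illustration of per-node neighborhood sampling; attention for node v
# would then run only over v and its sampled neighbors.
import torch


def sample_neighbors(edge_index: torch.Tensor, num_nodes: int, num_samples: int):
    """Uniformly sample (with replacement) up to `num_samples` neighbors per node."""
    neighbors = [[] for _ in range(num_nodes)]
    for s, t in edge_index.t().tolist():
        neighbors[s].append(t)
    sampled = []
    for nbrs in neighbors:
        if not nbrs:
            sampled.append([])
            continue
        idx = torch.randint(len(nbrs), (min(num_samples, len(nbrs)),))
        sampled.append([nbrs[i] for i in idx.tolist()])
    return sampled
```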
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
- SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a single attention layer can deliver surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
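To make the "one attention layer is enough" idea concrete, here is a minimal single-layer global attention over node features; note that SGFormer's actual attention is a simplified linear-cost variant, so this quadratic-cost version is purely illustrative.

```python
# Minimal single-layer global attention over all nodes (quadratic cost); an
# illustrative stand-in, not SGFormer's simplified linear attention.
import torch
import torch.nn as nn


class OneLayerGraphAttention(nn.Module):
    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.q = nn.Linear(in_dim, hidden)
        self.k = nn.Linear(in_dim, hidden)
        self.v = nn.Linear(in_dim, hidden)
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, in_dim); one softmax attention over the whole node set
        scores = self.q(x) @ self.k(x).t() / self.q.out_features ** 0.5
        return self.out(scores.softmax(dim=-1) @ self.v(x))
```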
arXiv Detail & Related papers (2023-06-19T08:03:25Z)
- HINormer: Representation Learning On Heterogeneous Information Networks with Graph Transformer [29.217820912610602]
Graph Transformers (GTs) have been proposed that allow message passing over a larger range, even across the whole graph.
However, the investigation of GTs on heterogeneous information networks (HINs) remains under-explored.
We propose a novel model named HINormer, which capitalizes on a larger-range aggregation mechanism for node representation learning.
arXiv Detail & Related papers (2023-02-22T12:25:07Z)
- Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike Lottery Ticket Hypothesis (LTH)-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
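As a generic illustration of gradual pruning during training (with no separate re-training stage), the sketch below ramps a target sparsity with a cubic schedule and zeroes the smallest-magnitude weights in place; the schedule and the magnitude criterion are common defaults, not necessarily CGP's exact choices.

```python
# Sketch of a gradual magnitude-pruning schedule applied during training;
# the cubic ramp and magnitude criterion are generic defaults, not CGP's.
import torch


def target_sparsity(step: int, total_steps: int, final: float = 0.9) -> float:
    """Cubic ramp from 0 to `final` sparsity over the course of training."""
    frac = min(step / total_steps, 1.0)
    return final * (1.0 - (1.0 - frac) ** 3)


@torch.no_grad()
def prune_by_magnitude(weight: torch.Tensor, sparsity: float) -> None:
    """Zero out the smallest-magnitude entries in-place to reach `sparsity`."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return
    threshold = weight.abs().flatten().kthvalue(k).values
    weight.mul_((weight.abs() > threshold).float())
```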
arXiv Detail & Related papers (2022-07-18T14:23:31Z)