SGHormer: An Energy-Saving Graph Transformer Driven by Spikes
- URL: http://arxiv.org/abs/2403.17656v1
- Date: Tue, 26 Mar 2024 12:39:02 GMT
- Title: SGHormer: An Energy-Saving Graph Transformer Driven by Spikes
- Authors: Huizhe Zhang, Jintang Li, Liang Chen, Zibin Zheng
- Abstract summary: Graph Transformers (GTs), with their powerful representation learning ability, have achieved great success in a wide range of graph tasks.
The cost of this outstanding performance is higher energy consumption and computational overhead.
We propose a new spiking-based graph transformer (SGHormer) to reduce memory and computational costs.
- Score: 32.30349324856102
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Transformers (GTs), with their powerful representation learning ability, have achieved great success in a wide range of graph tasks. However, the cost of this outstanding performance is higher energy consumption and computational overhead. The complex structure and the quadratic complexity of attention calculation in the vanilla transformer seriously hinder its scalability on large-scale graph data. Though existing methods have made strides in simplifying the combinations among blocks or the attention-learning paradigm to improve GTs' efficiency, energy-saving solutions that originate from biologically plausible structures are rarely taken into consideration when constructing a GT framework. To this end, we propose a new spiking-based graph transformer (SGHormer). It turns full-precision embeddings into sparse and binarized spikes to reduce memory and computational costs. The spiking graph self-attention and spiking rectify blocks in SGHormer explicitly capture global structure information and recover the expressive power of spiking embeddings, respectively. In experiments, SGHormer achieves performance comparable to other full-precision GTs with extremely low computational energy consumption. The results show that SGHormer makes remarkable progress in the field of low-energy GTs.
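The abstract's idea of turning full-precision embeddings into sparse, binarized spikes can be illustrated with a generic leaky integrate-and-fire (LIF) encoder. This is only a minimal sketch of the general technique, not the encoder used in SGHormer: the function names `lif_encode` and `firing_rate`, the hard reset, and the parameter values (`steps`, `threshold`, `decay`) are all assumptions chosen for illustration.

```python
def lif_encode(embedding, steps=4, threshold=1.0, decay=0.5):
    """Encode a real-valued vector as `steps` binary spike vectors.

    Illustrative LIF neuron with hard reset; hypothetical parameters,
    not taken from the SGHormer paper.
    """
    membrane = [0.0] * len(embedding)
    spikes = []
    for _ in range(steps):
        frame = []
        for i, x in enumerate(embedding):
            # Leaky integration: the old potential decays, the input is added.
            membrane[i] = decay * membrane[i] + x
            if membrane[i] >= threshold:
                frame.append(1)    # neuron fires a binary spike
                membrane[i] = 0.0  # hard reset after firing
            else:
                frame.append(0)
        spikes.append(frame)
    return spikes


def firing_rate(spikes):
    """Fraction of entries that are 1; a lower value means sparser spikes."""
    total = sum(len(frame) for frame in spikes)
    return sum(sum(frame) for frame in spikes) / total


# Example: a 4-dimensional embedding encoded over 4 time steps.
example_spikes = lif_encode([0.2, 0.6, 1.5, -0.3])
example_rate = firing_rate(example_spikes)
```

Because every entry is 0 or 1 and most entries are 0, downstream products with such spike vectors reduce to sparse additions, which is the usual argument for the energy savings of spiking models.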
Related papers
- SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs.
We show that multi-layer propagation can be reduced to one-layer propagation, with the same capability for representation learning.
It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z)
- A Scalable and Effective Alternative to Graph Transformers [19.018320937729264]
Graph Transformers (GTs) were introduced, utilizing self-attention mechanism to model pairwise node relationships.
GTs suffer from quadratic complexity w.r.t. the number of nodes in the graph, hindering their applicability to large graphs.
We present Graph-Enhanced Contextual Operator (GECO), a scalable and effective alternative to GTs.
arXiv Detail & Related papers (2024-06-17T19:57:34Z)
- Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-papers100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
- Exploring Sparsity in Graph Transformers [67.48149404841925]
Graph Transformers (GTs) have achieved impressive results on various graph-related tasks.
However, the huge computational cost of GTs hinders their deployment and application, especially in resource-constrained environments.
We propose a comprehensive Graph Transformer SParsification (GTSP) framework that helps to reduce the computational complexity of GTs.
arXiv Detail & Related papers (2023-12-09T06:21:44Z)
- SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z)
- Hierarchical Transformer for Scalable Graph Learning [22.462712609402324]
Graph Transformer has demonstrated state-of-the-art performance on benchmarks for graph representation learning.
The complexity of the global self-attention mechanism presents a challenge for full-batch training when applied to larger graphs.
We introduce the Hierarchical Scalable Graph Transformer (HSGT) as a solution to these challenges.
HSGT successfully scales the Transformer architecture to node representation learning tasks on large-scale graphs, while maintaining high performance.
arXiv Detail & Related papers (2023-05-04T14:23:22Z)
- What Dense Graph Do You Need for Self-Attention? [73.82686008622596]
We present Hypercube Transformer, a sparse Transformer that models token interactions in a hypercube and shows results comparable to or even better than the vanilla Transformer.
Experiments on tasks requiring various sequence lengths validate our graph function.
arXiv Detail & Related papers (2022-05-27T14:36:55Z)
- Do Transformers Really Perform Bad for Graph Representation? [62.68420868623308]
We present Graphormer, which is built upon the standard Transformer architecture.
Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model.
arXiv Detail & Related papers (2021-06-09T17:18:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.