Related papers: SGHormer: An Energy-Saving Graph Transformer Driven by Spikes

SGHormer: An Energy-Saving Graph Transformer Driven by Spikes

URL: http://arxiv.org/abs/2403.17656v1
Date: Tue, 26 Mar 2024 12:39:02 GMT
Title: SGHormer: An Energy-Saving Graph Transformer Driven by Spikes
Authors: Huizhe Zhang, Jintang Li, Liang Chen, Zibin Zheng,
Abstract summary: Graph Transformers (GTs) with powerful representation learning ability make a huge success in wide range of graph tasks. The costs behind outstanding performances of GTs are higher energy consumption and computational overhead. We propose a new spiking-based graph transformer (SGHormer) to reduce memory and computational costs.
Score: 32.30349324856102
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Graph Transformers (GTs) with powerful representation learning ability make a huge success in wide range of graph tasks. However, the costs behind outstanding performances of GTs are higher energy consumption and computational overhead. The complex structure and quadratic complexity during attention calculation in vanilla transformer seriously hinder its scalability on the large-scale graph data. Though existing methods have made strides in simplifying combinations among blocks or attention-learning paradigm to improve GTs' efficiency, a series of energy-saving solutions originated from biologically plausible structures are rarely taken into consideration when constructing GT framework. To this end, we propose a new spiking-based graph transformer (SGHormer). It turns full-precision embeddings into sparse and binarized spikes to reduce memory and computational costs. The spiking graph self-attention and spiking rectify blocks in SGHormer explicitly capture global structure information and recover the expressive power of spiking embeddings, respectively. In experiments, SGHormer achieves comparable performances to other full-precision GTs with extremely low computational energy consumption. The results show that SGHomer makes a remarkable progress in the field of low-energy GTs.

Related papers

DHIL-GT: Scalable Graph Transformer with Decoupled Hierarchy Labeling [10.556366638048384]
Graph Transformer (GT) has recently emerged as a promising neural network architecture for learning graph-structured data. We propose DHIL-GT, a scalable Graph Transformer that simplifies network learning by fully decoupling the graph computation to a separate stage in advance. DHIL-GT is efficient in terms of computational boost and mini-batch capability over existing scalable Graph Transformer designs on large-scale benchmarks.
arXiv Detail & Related papers (2024-12-06T02:59:01Z)
SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs. We show that one-layer propagation can be reduced to one-layer propagation, with the same capability for representation learning. It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z)
A Scalable and Effective Alternative to Graph Transformers [19.018320937729264]
Graph Transformers (GTs) were introduced, utilizing self-attention mechanism to model pairwise node relationships. GTs suffer from complexity w.r.t. the number of nodes in the graph, hindering their applicability to large graphs. We present Graph-Enhanced Contextual Operator (GECO), a scalable and effective alternative to GTs.
arXiv Detail & Related papers (2024-06-17T19:57:34Z)
Masked Graph Transformer for Large-Scale Recommendation [56.37903431721977]
We propose an efficient Masked Graph Transformer, named MGFormer, capable of capturing all-pair interactions among nodes with a linear complexity. Experimental results show the superior performance of our MGFormer, even with a single attention layer.
arXiv Detail & Related papers (2024-05-07T06:00:47Z)
Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints. A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism. We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
Exploring Sparsity in Graph Transformers [67.48149404841925]
Graph Transformers (GTs) have achieved impressive results on various graph-related tasks. However, the huge computational cost of GTs hinders their deployment and application, especially in resource-constrained environments. We propose a comprehensive textbfGraph textbfTransformer textbfSParsification (GTSP) framework that helps to reduce the computational complexity of GTs.
arXiv Detail & Related papers (2023-12-09T06:21:44Z)
SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks. We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model. We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z)
Hierarchical Transformer for Scalable Graph Learning [22.462712609402324]
Graph Transformer has demonstrated state-of-the-art performance on benchmarks for graph representation learning. The complexity of the global self-attention mechanism presents a challenge for full-batch training when applied to larger graphs. We introduce the Hierarchical Scalable Graph Transformer (HSGT) as a solution to these challenges. HSGT successfully scales the Transformer architecture to node representation learning tasks on large-scale graphs, while maintaining high performance.
arXiv Detail & Related papers (2023-05-04T14:23:22Z)
What Dense Graph Do You Need for Self-Attention? [73.82686008622596]
We present Hypercube Transformer, a sparse Transformer that models token interactions in a hypercube and shows comparable or even better results with vanilla Transformer. Experiments on tasks requiring various sequence lengths lay validation for our graph function well.
arXiv Detail & Related papers (2022-05-27T14:36:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.