Curve Your Attention: Mixed-Curvature Transformers for Graph
Representation Learning
- URL: http://arxiv.org/abs/2309.04082v1
- Date: Fri, 8 Sep 2023 02:44:37 GMT
- Title: Curve Your Attention: Mixed-Curvature Transformers for Graph
Representation Learning
- Authors: Sungjun Cho, Seunghyuk Cho, Sungwoo Park, Hankook Lee, Honglak Lee,
Moontae Lee
- Abstract summary: We propose a generalization of Transformers towards operating entirely on the product of constant curvature spaces.
We also provide a kernelized approach to non-Euclidean attention, which enables our model to run with time and memory cost linear in the number of nodes and edges.
- Score: 77.1421343649344
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world graphs naturally exhibit hierarchical or cyclical structures that
are unfit for the typical Euclidean space. While there exist graph neural
networks that leverage hyperbolic or spherical spaces to learn representations
that embed such structures more accurately, these methods are confined to the
message-passing paradigm, making the models vulnerable to side effects such as
oversmoothing and oversquashing. More recent work has proposed global
attention-based graph Transformers that can easily model long-range
interactions, but their extensions to non-Euclidean geometry remain unexplored.
To bridge this gap, we propose the Fully Product-Stereographic Transformer, a
generalization of Transformers that operates entirely on a product of constant
curvature spaces. When combined with tokenized graph Transformers, our model
can learn the curvature appropriate for the input graph in an end-to-end
fashion, without the need for additional tuning over different curvature
initializations. We also provide a kernelized approach to non-Euclidean
attention, which enables our model to run with time and memory cost linear in
the number of nodes and edges while respecting the underlying
geometry. Experiments on graph reconstruction and node classification
demonstrate the benefits of generalizing Transformers to the non-Euclidean
domain.
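To make the two ingredients of the abstract concrete, below is a minimal, illustrative sketch (not the authors' released code). It pairs a kappa-stereographic exponential/logarithmic map at the origin, with the curvature kappa as a learnable parameter, with a generic kernelized "linear" attention (the elu(.)+1 feature map of linear Transformers) applied in the tangent space. The module and function names (expmap0, logmap0, KernelizedStereographicAttention) are hypothetical, the sign-based branching is a simplification of the unified kappa-stereographic formulas, and the paper's actual non-Euclidean attention kernel may differ.

```python
# Illustrative sketch only, NOT the paper's implementation: a single-component
# constant-curvature space with learnable curvature, plus kernelized attention
# computed in the tangent space at the origin.
import torch
import torch.nn as nn
import torch.nn.functional as F


def tan_kappa(x, kappa, eps=1e-6):
    """Curvature-dependent tangent used by the exp map at the origin."""
    if kappa.abs() < eps:                      # (near-)Euclidean component
        return x
    if kappa < 0:                              # hyperbolic component
        s = (-kappa).sqrt()
        return torch.tanh(s * x) / s
    s = kappa.sqrt()                           # spherical component
    return torch.tan(s * x) / s


def arctan_kappa(x, kappa, eps=1e-6):
    """Inverse of tan_kappa, used by the log map at the origin."""
    if kappa.abs() < eps:
        return x
    if kappa < 0:
        s = (-kappa).sqrt()
        return torch.atanh((s * x).clamp(-1 + 1e-5, 1 - 1e-5)) / s
    s = kappa.sqrt()
    return torch.atan(s * x) / s


def expmap0(v, kappa):
    """Map tangent vectors at the origin onto the kappa-stereographic model."""
    n = v.norm(dim=-1, keepdim=True).clamp_min(1e-9)
    return tan_kappa(n, kappa) * v / n


def logmap0(y, kappa):
    """Map points on the manifold back to the tangent space at the origin."""
    n = y.norm(dim=-1, keepdim=True).clamp_min(1e-9)
    return arctan_kappa(n, kappa) * y / n


class KernelizedStereographicAttention(nn.Module):
    """One attention head on one constant-curvature component (hypothetical
    module name). Linear attention with the elu(.)+1 feature map gives
    O(N d^2) time and memory instead of O(N^2 d)."""

    def __init__(self, dim):
        super().__init__()
        self.kappa = nn.Parameter(torch.tensor(0.0))   # learnable curvature
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (N, dim) points on the manifold
        t = logmap0(x, self.kappa)             # work in the tangent space
        q, k, v = self.q(t), self.k(t), self.v(t)
        phi_q = F.elu(q) + 1                   # positive kernel features
        phi_k = F.elu(k) + 1
        kv = torch.einsum("nd,ne->de", phi_k, v)        # d x d summary, linear in N
        z = torch.einsum("nd,d->n", phi_q, phi_k.sum(0)).clamp_min(1e-9)
        out = torch.einsum("nd,de->ne", phi_q, kv) / z.unsqueeze(-1)
        return expmap0(out, self.kappa)        # project back onto the manifold


if __name__ == "__main__":
    x = torch.randn(128, 16) * 0.1             # toy node features near the origin
    print(KernelizedStereographicAttention(16)(x).shape)   # torch.Size([128, 16])
```

The linear-attention trick replaces the N x N attention matrix with a d x d summary of keys and values, which is what keeps the cost linear in the number of nodes, while the learnable curvature lets gradient descent move each component between hyperbolic (kappa < 0), Euclidean (kappa near 0), and spherical (kappa > 0) geometry.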
Related papers
- SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attention in Transformers on graphs.
We show that multi-layer attention can be reduced to one-layer propagation, with the same capability for representation learning.
It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z) - Scalable Graph Compressed Convolutions [68.85227170390864]
We propose a differentiable method that applies permutations to calibrate input graphs for Euclidean convolution.
Based on the graph calibration, we propose the Compressed Convolution Network (CoCN) for hierarchical graph representation learning.
arXiv Detail & Related papers (2024-07-26T03:14:13Z) - What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding [67.59552859593985]
Graph Transformers, which incorporate self-attention and positional encoding, have emerged as a powerful architecture for various graph learning tasks.
This paper introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised classification.
arXiv Detail & Related papers (2024-06-04T05:30:16Z) - Attending to Topological Spaces: The Cellular Transformer [37.84207797241944]
Topological Deep Learning seeks to enhance the predictive performance of neural network models by harnessing topological structures in input data.
We introduce the Cellular Transformer (CT), a novel architecture that generalizes graph-based transformers to cell complexes.
CT achieves state-of-the-art performance, and it does so without the need for more complex enhancements.
arXiv Detail & Related papers (2024-05-23T01:48:32Z) - SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a single layer of attention can deliver surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone opens a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z) - Deformable Graph Transformer [31.254872949603982]
We propose Deformable Graph Transformer (DGT) that performs sparse attention with dynamically sampled key and value pairs.
Experiments demonstrate that our novel graph Transformer consistently outperforms existing Transformer-based models.
arXiv Detail & Related papers (2022-06-29T00:23:25Z) - Rethinking Graph Transformers with Spectral Attention [13.068288784805901]
We present the Spectral Attention Network (SAN), which uses a learned positional encoding (LPE) to learn the position of each node in a given graph.
By leveraging the full spectrum of the Laplacian, our model is theoretically powerful in distinguishing graphs, and can better detect similar sub-structures from their resonance (a generic Laplacian positional-encoding sketch appears after this list).
Our model performs on par or better than state-of-the-art GNNs, and outperforms any attention-based model by a wide margin.
arXiv Detail & Related papers (2021-06-07T18:11:11Z) - Self-Supervised Graph Representation Learning via Topology
Transformations [61.870882736758624]
We present the Topology Transformation Equivariant Representation learning, a general paradigm of self-supervised learning for node representations of graph data.
In experiments, we apply the proposed model to the downstream node and graph classification tasks, and results show that the proposed method outperforms the state-of-the-art unsupervised approaches.
arXiv Detail & Related papers (2021-05-25T06:11:03Z)
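As referenced in the SAN entry above, the sketch below shows the generic ingredient behind Laplacian-based positional encodings: eigenvectors of the symmetric normalized Laplacian used as per-node positional features, with random sign flips because eigenvector signs are arbitrary. It is not SAN's actual LPE module, which processes eigenvalue/eigenvector pairs with its own learned attention network; the function name is illustrative.

```python
# Minimal sketch of Laplacian eigenvector positional encodings (assumed setup,
# not SAN's exact formulation).
import numpy as np


def laplacian_positional_encoding(adj: np.ndarray, k: int) -> np.ndarray:
    """Per-node features from the k lowest-frequency non-trivial eigenvectors
    of the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)        # eigenvalues in ascending order
    pe = eigvecs[:, 1:k + 1]                      # drop the trivial 0-eigenvector
    # (SAN also feeds the eigenvalues to its learned encoder; omitted here.)
    # Eigenvector signs are arbitrary, so flip them randomly during training to
    # keep the downstream model from latching onto one sign convention.
    signs = np.random.choice([-1.0, 1.0], size=pe.shape[1])
    return pe * signs                             # shape: (num_nodes, k)


if __name__ == "__main__":
    # 4-cycle graph: adjacent nodes receive nearby low-frequency coordinates.
    A = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
    print(laplacian_positional_encoding(A, k=2))
```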
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.