Related papers: Graph Transformers Dream of Electric Flow

URL: http://arxiv.org/abs/2410.16699v1
Date: Tue, 22 Oct 2024 05:11:45 GMT
Title: Graph Transformers Dream of Electric Flow
Authors: Xiang Cheng, Lawrence Carin, Suvrit Sra
Abstract summary: We show that the linear Transformer, when applied to graph data, can implement algorithms that solve canonical problems. We present explicit weight configurations for implementing each such graph algorithm, and we bound the errors of the constructed Transformers by the errors of the underlying algorithms.
Score: 72.06286909236827
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We show theoretically and empirically that the linear Transformer, when applied to graph data, can implement algorithms that solve canonical problems such as electric flow and eigenvector decomposition. The input to the Transformer is simply the graph incidence matrix; no other explicit positional encoding information is provided. We present explicit weight configurations for implementing each such graph algorithm, and we bound the errors of the constructed Transformers by the errors of the underlying algorithms. Our theoretical findings are corroborated by experiments on synthetic data. Additionally, on a real-world molecular regression task, we observe that the linear Transformer is capable of learning a more effective positional encoding than the default one based on Laplacian eigenvectors. Our work is an initial step towards elucidating the inner workings of the Transformer for graph data.

Related papers

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? [69.4145579827826]
We show fast convergence of gradient flow on the regression loss despite the non-convexity of the landscape. This is the first theoretical analysis for multi-layer Transformer in this setting.
arXiv Detail & Related papers (2024-10-10T18:29:05Z)
Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient training and inference algorithms, such as low-rank computation, have impressive performance for learning Transformer-based adaption. We analyze how magnitude-based pruning affects generalization while improving adaption. We conclude that proper magnitude-based pruning has a slight effect on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
How Well Can Transformers Emulate In-context Newton's Method? [46.08521978754298]
We study whether Transformers can perform higher order optimization methods, beyond the case of linear regression. We demonstrate the ability of even linear attention-only Transformers in implementing a single step of Newton's iteration for matrix inversion with merely two layers.
arXiv Detail & Related papers (2024-03-05T18:20:10Z)
Linear Transformers are Versatile In-Context Learners [19.988368693379087]
We prove that each layer of a linear transformer maintains a weight vector for an implicit linear regression problem. We also investigate the use of linear transformers in a challenging scenario where the training data is corrupted with different levels of noise. Remarkably, we demonstrate that for this problem linear transformers discover an intricate and highly effective optimization algorithm.
arXiv Detail & Related papers (2024-02-21T23:45:57Z)
How do Transformers perform In-Context Autoregressive Learning? [76.18489638049545]
We train a Transformer model on a simple next token prediction task. We show how a trained Transformer predicts the next token by first learning $W$ in-context, then applying a prediction mapping.
arXiv Detail & Related papers (2024-02-08T16:24:44Z)
Transformers learn to implement preconditioned gradient descent for in-context learning [41.74394657009037]
Several recent works demonstrate that transformers can implement algorithms like gradient descent. We ask: Can transformers learn to implement such algorithms by training over random problem instances? For a transformer with $L$ attention layers, we prove certain critical points of the training objective implement $L$ iterations of preconditioned gradient descent.
arXiv Detail & Related papers (2023-06-01T02:35:57Z)
Graph Inductive Biases in Transformers without Message Passing [47.238185813842996]
New Graph Inductive bias Transformer (GRIT) incorporates graph inductive biases without using message passing. GRIT achieves state-of-the-art empirical performance across a variety of graph datasets.
arXiv Detail & Related papers (2023-05-27T22:26:27Z)
Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations. We show how trained Transformers become mesa-optimizers, i.e., learn models by gradient descent in their forward pass.
arXiv Detail & Related papers (2022-12-15T09:21:21Z)
Pure Transformers are Powerful Graph Learners [51.36884247453605]
We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice. We prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers. Our method, coined Tokenized Graph Transformer (TokenGT), achieves significantly better results compared to GNN baselines and competitive results.
arXiv Detail & Related papers (2022-07-06T08:13:06Z)
What Dense Graph Do You Need for Self-Attention? [73.82686008622596]
We present Hypercube Transformer, a sparse Transformer that models token interactions in a hypercube and shows comparable or even better results than the vanilla Transformer. Experiments on tasks requiring various sequence lengths validate our graph function.
arXiv Detail & Related papers (2022-05-27T14:36:55Z)
A Generalization of Transformer Networks to Graphs [5.736353542430439]
We introduce a graph transformer with four new properties compared to the standard model. The architecture is extended to edge feature representation, which can be critical to tasks such as chemistry (bond type) or link prediction (entity relationship in knowledge graphs).
arXiv Detail & Related papers (2020-12-17T16:11:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.