Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers
- URL: http://arxiv.org/abs/2503.01805v1
- Date: Mon, 03 Mar 2025 18:33:58 GMT
- Title: Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers
- Authors: Gilad Yehudai, Clayton Sanford, Maya Bechler-Speicher, Orr Fischer, Ran Gilad-Bachrach, Amir Globerson
- Abstract summary: We show that with linear width, constant depth suffices for solving a host of graph-based problems. This suggests that a moderate increase in width can allow much shallower models, which are advantageous in terms of inference time. Our results demonstrate the complex and intriguing landscape of transformer implementations of graph-based algorithms.
- Score: 33.63507016806947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have revolutionized the field of machine learning. In particular, they can be used to solve complex algorithmic problems, including graph-based tasks. In such algorithmic tasks a key question is what is the minimal size of a transformer that can implement a task. Recent work has begun to explore this problem for graph-based tasks, showing that for sub-linear embedding dimension (i.e., model width) logarithmic depth suffices. However, an open question, which we address here, is what happens if width is allowed to grow linearly. Here we analyze this setting, and provide the surprising result that with linear width, constant depth suffices for solving a host of graph-based problems. This suggests that a moderate increase in width can allow much shallower models, which are advantageous in terms of inference time. For other problems, we show that quadratic width is required. Our results demonstrate the complex and intriguing landscape of transformer implementations of graph-based algorithms. We support our theoretical results with empirical evaluations.
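To make the width/depth terminology concrete, below is a minimal sketch of one plausible way a graph task can be presented to a transformer: one token per node and one per edge, with the embedding dimension playing the role of the model width. The encoding and function names are illustrative assumptions, not the construction from the paper.

```python
# Illustrative sketch only: a node/edge tokenization of a graph input.
# With d_model on the order of the number of nodes ("linear width"), every node
# can receive a near-orthogonal identifier, which is the regime analyzed here.
import torch

def graph_to_tokens(num_nodes, edges, d_model):
    """Embed a graph as a sequence of node tokens followed by edge tokens."""
    node_id = torch.nn.Embedding(num_nodes, d_model)         # one identifier per node
    tokens = [node_id(torch.tensor(v)) for v in range(num_nodes)]
    for u, v in edges:                                        # one token per edge
        tokens.append(node_id(torch.tensor(u)) + node_id(torch.tensor(v)))
    return torch.stack(tokens)                                # shape: (n + m, d_model)

seq = graph_to_tokens(4, [(0, 1), (1, 2), (2, 3)], d_model=8)
print(seq.shape)  # torch.Size([7, 8])
```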
Related papers
- Transformers Struggle to Learn to Search [32.231381064112085]
We use the foundational graph connectivity problem as a testbed to generate effectively limitless high-coverage data to train small transformers. We find that, when given the right training distribution, the transformer is able to learn to search. We also find that performing search in-context (i.e., chain-of-thought) does not resolve the model's inability to learn to search on larger graphs.
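As a hedged illustration of the connectivity testbed described above (not the paper's actual data pipeline), labeled training examples can be generated with a plain breadth-first search:

```python
# Minimal sketch: label (graph, source, target) triples by reachability.
from collections import deque
import random

def connected(adj, s, t):
    """Return True if t is reachable from s in the directed adjacency-list graph adj."""
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

def random_example(n=8, p=0.2):
    """Sample a random directed graph and a (source, target, label) query."""
    adj = {u: [v for v in range(n) if v != u and random.random() < p] for u in range(n)}
    s, t = random.sample(range(n), 2)
    return adj, s, t, connected(adj, s, t)
```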
arXiv Detail & Related papers (2024-12-06T01:29:24Z)
- Understanding Transformer Reasoning Capabilities via Graph Algorithms [25.08208816144745]
We study which transformer scaling regimes are able to perfectly solve different classes of algorithmic problems.
Our results show that transformers excel at many graph reasoning tasks, even outperforming specialized graph neural networks.
arXiv Detail & Related papers (2024-05-28T18:31:14Z)
- Simulation of Graph Algorithms with Looped Transformers [6.0465914748433915]
We study the ability of transformer networks to simulate algorithms on graphs from a theoretical perspective.
We prove by construction that this architecture can simulate individual algorithms such as Dijkstra's shortest path.
We show a Turing Completeness result with constant width when the extra attention heads are utilized.
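For reference, the sketch below is the classical Dijkstra routine that the looped-transformer construction is shown to simulate; it is the target algorithm itself, not the transformer construction.

```python
import heapq

def dijkstra(adj, source):
    """adj: {u: [(v, weight), ...]} with non-negative weights.
    Returns shortest distances from source."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                              # stale heap entry, skip
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):    # relax edge (u, v)
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```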
arXiv Detail & Related papers (2024-02-02T02:48:03Z)
- Symmetry-preserving graph attention network to solve routing problems at multiple resolutions [1.9304772860080408]
We introduce the first-ever completely equivariant model and training pipeline for solving routing problems.
Capturing the multiscale structure of the input graph is essential.
We therefore propose a multiresolution scheme combined with an Equivariant Graph Attention network (mEGAT) architecture.
arXiv Detail & Related papers (2023-10-24T06:22:20Z)
- NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
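For readers unfamiliar with the operator, the sketch below is the standard Gumbel-Softmax relaxation; NodeFormer's kernelized variant, which is what makes all-pair attention scalable, is not reproduced here.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0):
    """Draw a differentiable, approximately one-hot sample from the given logits."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    return F.softmax((logits + gumbel) / tau, dim=-1)

# Example: a soft, differentiable choice among 5 candidate neighbors.
weights = gumbel_softmax_sample(torch.randn(5), tau=0.5)
print(weights.sum())  # approximately 1.0
```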
arXiv Detail & Related papers (2023-06-14T09:21:15Z)
- What Dense Graph Do You Need for Self-Attention? [73.82686008622596]
We present Hypercube Transformer, a sparse Transformer that models token interactions in a hypercube and achieves results comparable to, or even better than, the vanilla Transformer.
Experiments on tasks requiring various sequence lengths validate our choice of graph function.
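A hedged sketch of the underlying sparsity pattern: in a d-dimensional hypercube, two vertices are adjacent iff their binary indices differ in exactly one bit, so each token interacts with only d others. The mapping from tokens to hypercube vertices below is an assumption for illustration, not the paper's exact construction.

```python
def hypercube_neighbors(i, d):
    """Indices that token i interacts with under a d-dimensional hypercube pattern."""
    return [i ^ (1 << b) for b in range(d)]   # flip one bit at a time

print(hypercube_neighbors(5, 4))  # [4, 7, 1, 13]: only d = log2(n) neighbors per token
```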
arXiv Detail & Related papers (2022-05-27T14:36:55Z)
- Neighbor2Seq: Deep Learning on Massive Graphs by Transforming Neighbors to Sequences [55.329402218608365]
We propose Neighbor2Seq to transform the hierarchical neighborhood of each node into a sequence.
We evaluate our method on a massive graph with more than 111 million nodes and 1.6 billion edges.
Results show that our proposed method is scalable to massive graphs and achieves superior performance across massive and medium-scale graphs.
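A minimal sketch of the idea, under the assumption that the hierarchical neighborhood means the node's neighbors grouped by hop distance; the paper's exact transform and feature handling are not reproduced here.

```python
from collections import deque

def neighborhood_to_sequence(adj, root, max_hops=2):
    """Return the nodes within max_hops of root, ordered by increasing hop distance."""
    seen, order = {root}, [root]
    frontier = deque([(root, 0)])
    while frontier:
        u, hop = frontier.popleft()
        if hop == max_hops:
            continue
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                order.append(v)                  # nodes appear hop by hop
                frontier.append((v, hop + 1))
    return order                                 # a sequence a sequence model can consume
```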
arXiv Detail & Related papers (2022-02-07T16:38:36Z)
- Graph Kernel Neural Networks [53.91024360329517]
We propose to use graph kernels, i.e. kernel functions that compute an inner product on graphs, to extend the standard convolution operator to the graph domain.
This allows us to define an entirely structural model that does not require computing the embedding of the input graph.
Our architecture allows plugging in any type of graph kernel and has the added benefit of providing some interpretability.
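As a toy illustration of the interface (far simpler than the kernels the paper supports, and only an assumption for exposition), a graph kernel is any function k(G1, G2) that returns an inner-product-style similarity between two graphs:

```python
import numpy as np

def degree_histogram(adj, max_degree=10):
    """Histogram of node degrees for a graph given as an adjacency-list dict."""
    hist = np.zeros(max_degree + 1)
    for _, nbrs in adj.items():
        hist[min(len(nbrs), max_degree)] += 1
    return hist

def degree_kernel(adj1, adj2):
    """A toy graph kernel: inner product of degree histograms."""
    return float(degree_histogram(adj1) @ degree_histogram(adj2))

g1 = {0: [1, 2], 1: [0], 2: [0]}   # star on 3 nodes
g2 = {0: [1], 1: [0, 2], 2: [1]}   # path on 3 nodes
print(degree_kernel(g1, g2))       # 5.0: identical degree profiles score highly
```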
arXiv Detail & Related papers (2021-12-14T14:48:08Z)
- Graphs for deep learning representations [1.0152838128195467]
We introduce a graph formalism based on recent advances in Graph Signal Processing (GSP).
Namely, we use graphs to represent the latent spaces of deep neural networks.
We showcase that this graph formalism allows us to address various questions, including ensuring robustness, reducing the number of arbitrary choices in the design of the learning process, improving generalization to small perturbations added to the inputs, and reducing computational complexity.
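A hedged sketch of the viewpoint (the specific graph constructions and GSP measures used in the paper are not reproduced): build a k-nearest-neighbor similarity graph over the latent representations of a batch of inputs.

```python
import numpy as np

def knn_latent_graph(latents, k=3):
    """latents: (n, d) array of hidden activations. Returns a symmetric 0/1 adjacency matrix."""
    n = latents.shape[0]
    dists = np.linalg.norm(latents[:, None, :] - latents[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # exclude self-loops
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in np.argsort(dists[i])[:k]:     # k closest latent vectors
            adj[i, j] = adj[j, i] = 1          # symmetrize
    return adj

adj = knn_latent_graph(np.random.randn(10, 64), k=3)
```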
arXiv Detail & Related papers (2020-12-14T11:51:23Z)
- Online Dense Subgraph Discovery via Blurred-Graph Feedback [87.9850024070244]
We introduce a novel learning problem for dense subgraph discovery.
We first propose an edge-time algorithm that obtains a nearly-optimal solution with high probability.
We then design a more scalable algorithm with a theoretical guarantee.
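For context, the sketch below is the classical offline baseline for this task, greedy peeling (Charikar's 2-approximation) for the densest subgraph; the paper's algorithms instead work online from noisy ("blurred") feedback and are not reproduced here.

```python
def densest_subgraph_peel(adj):
    """adj: {u: set(neighbors)} for an undirected graph.
    Returns the node set with the highest density (|E| / |V|) seen while peeling."""
    adj = {u: set(vs) for u, vs in adj.items()}           # work on a copy
    nodes = set(adj)
    num_edges = sum(len(vs) for vs in adj.values()) // 2
    best, best_density = set(nodes), num_edges / max(len(nodes), 1)
    while len(nodes) > 1:
        u = min(nodes, key=lambda x: len(adj[x]))         # peel a minimum-degree node
        num_edges -= len(adj[u])
        for v in adj[u]:
            adj[v].discard(u)
        nodes.discard(u)
        del adj[u]
        density = num_edges / len(nodes)
        if density > best_density:
            best, best_density = set(nodes), density
    return best
```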
arXiv Detail & Related papers (2020-06-24T11:37:33Z)