From Sequence to Structure: Uncovering Substructure Reasoning in Transformers
- URL: http://arxiv.org/abs/2507.10435v1
- Date: Fri, 11 Jul 2025 17:36:24 GMT
- Title: From Sequence to Structure: Uncovering Substructure Reasoning in Transformers
- Authors: Xinnan Dai, Kai Yang, Jay Revolinsky, Kai Guo, Aoran Wang, Bohang Zhang, Jiliang Tang
- Abstract summary: We show how a decoder-only Transformer architecture can understand underlying graph structures. We introduce the concept of thinking in substructures to efficiently extract complex composite patterns. Our findings offer new insight into how sequence-based Transformers perform the substructure extraction task over graph data.
- Score: 35.80526987002848
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies suggest that large language models (LLMs) can solve graph reasoning tasks. Notably, even when graph structures are embedded within textual descriptions, LLMs can still effectively answer related questions. This raises a fundamental question: how can a decoder-only Transformer architecture understand underlying graph structures? To address this, we start with the substructure extraction task, interpreting the inner mechanisms of the Transformer and analyzing the impact of the input queries. Specifically, through both empirical results and theoretical analysis, we present Induced Substructure Filtration (ISF), a perspective that captures how substructures are identified across the layers of a multi-layer Transformer. We further validate the ISF process in LLMs, revealing consistent internal dynamics across layers. Building on these insights, we explore the broader capabilities of Transformers in handling diverse graph types. Specifically, we introduce the concept of thinking in substructures to efficiently extract complex composite patterns, and we demonstrate that decoder-only Transformers can successfully extract substructures from attributed graphs, such as molecular graphs. Together, our findings offer new insight into how sequence-based Transformers perform the substructure extraction task over graph data.
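The abstract centers on the substructure extraction task: a graph is handed to a decoder-only Transformer as text, and the model must decide whether a target pattern (for example, a triangle) is present. As a minimal, illustrative sketch only, and not the paper's code, prompt format, or method, the snippet below serializes a small edge list into a textual query and computes the ground-truth label directly from the graph; the prompt wording and the triangle target are assumptions chosen for illustration.

```python
# Minimal sketch (assumptions: prompt format and triangle target are illustrative,
# not taken from the paper) of the substructure-extraction setup described above:
# serialize a graph as text for a sequence model, and compute the ground truth
# answer directly from the edge list.
from itertools import combinations

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]            # small example graph
edge_set = {frozenset(e) for e in edges}
nodes = sorted({v for e in edges for v in e})

# Textual description of the graph, posed as an LLM-style query.
prompt = (
    "The graph has edges: "
    + ", ".join(f"({u}, {v})" for u, v in edges)
    + ". Does it contain a triangle?"
)

def contains_triangle(nodes, edge_set):
    # Check every 3-node subset for all three pairwise edges.
    return any(
        {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edge_set
        for a, b, c in combinations(nodes, 3)
    )

print(prompt)
print("Ground truth:", contains_triangle(nodes, edge_set))  # True: nodes 0, 1, 2
```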
Related papers
- A Survey of Graph Transformers: Architectures, Theories and Applications [54.561539625830186]
  Recent studies have proposed diverse architectures, enhanced explainability, and practical applications for Graph Transformers. We categorize the architecture of Graph Transformers according to their strategies for processing structural information. We provide a summary of the practical applications where Graph Transformers have been utilized, such as molecule, protein, language, vision, traffic, brain and material data.
  arXiv Detail & Related papers (2025-02-23T10:55:19Z)
- Transformers Use Causal World Models in Maze-Solving Tasks [49.67445252528868]
  We identify World Models in transformers trained on maze-solving tasks. We find that it is easier to activate features than to suppress them. Positional encoding schemes appear to influence how World Models are structured within the model's residual stream.
  arXiv Detail & Related papers (2024-12-16T15:21:04Z)
- What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis [8.008567379796666]
  We provide a fundamental understanding of what distinguishes the Transformer from the other architectures. Our results suggest that various common architectural and optimization choices in Transformers can be traced back to their highly non-linear dependencies.
  arXiv Detail & Related papers (2024-10-14T18:15:02Z)
- How Transformers Learn Causal Structure with Gradient Descent [44.31729147722701]
  Self-attention allows transformers to encode causal structure. We introduce an in-context learning task that requires learning latent causal structure. We show that transformers trained on our in-context learning task are able to recover a wide variety of causal structures.
  arXiv Detail & Related papers (2024-02-22T17:47:03Z)
- Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings [60.698130703909804]
  Transformers generalize to novel compositions of structures and entities after being trained on a complex dataset. We propose SQ-Transformer, which explicitly encourages systematicity in the embeddings and attention layers. We show that SQ-Transformer achieves stronger compositional generalization than the vanilla Transformer on multiple low-complexity semantic parsing and machine translation datasets.
  arXiv Detail & Related papers (2024-02-09T15:53:15Z)
- Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models [9.56229382432426]
  This research aims to reverse engineer transformer models into human-readable representations that implement algorithmic functions. By applying circuit interpretability analysis, we identify a key sub-circuit in both GPT-2 Small and Llama-2-7B. We show that this sub-circuit has effects on various math-related prompts, such as on intervaled circuits, Spanish number word and months continuation, and natural language word problems.
  arXiv Detail & Related papers (2023-11-07T16:58:51Z)
- How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations [98.7450564309923]
  This paper takes initial steps on understanding in-context learning (ICL) in more complex scenarios, by studying learning with representations. We construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function. We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size.
  arXiv Detail & Related papers (2023-10-16T17:40:49Z)
- Relational Attention: Generalizing Transformers for Graph-Structured Tasks [0.8702432681310401]
  Transformers operate over sets of real-valued vectors representing task-specific entities and their attributes. But as set processors, transformers are at a disadvantage in reasoning over more general graph-structured data. We generalize transformer attention to consider and update edge vectors in each transformer layer.
  arXiv Detail & Related papers (2022-10-11T00:25:04Z)
- SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
  We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into the transformer for enhanced discriminative representation learning. The proposed two modules are lightweight and can be plugged into any transformer network and trained end-to-end easily. Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
  arXiv Detail & Related papers (2022-08-31T03:00:07Z)
- Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages [120.74406230847904]
  TP-Transformer augments the traditional Transformer architecture with an additional component to represent structure. The second method imbues structure at the data level by segmenting the data with morphological tokenization. We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
  arXiv Detail & Related papers (2022-08-11T22:42:24Z)
- Structure-Aware Transformer for Graph Representation Learning [7.4124458942877105]
  We show that node representations generated by the Transformer with positional encoding do not necessarily capture structural similarity between them. We propose the Structure-Aware Transformer, a class of simple and flexible graph transformers built upon a new self-attention mechanism. Our framework can leverage any existing GNN to extract the subgraph representation, and we show that it systematically improves performance relative to the base GNN model.
  arXiv Detail & Related papers (2022-02-07T09:53:39Z)
- Incorporating Residual and Normalization Layers into Analysis of Masked Language Models [29.828669678974983]
  We extend the scope of the analysis of Transformers from solely the attention patterns to the whole attention block. Our analysis of Transformer-based masked language models shows that the token-to-token interaction performed via attention has less impact on the intermediate representations than previously assumed.
  arXiv Detail & Related papers (2021-09-15T08:32:20Z)
- Do Syntax Trees Help Pre-trained Transformers Extract Information? [8.133145094593502]
  We study the utility of incorporating dependency trees into pre-trained transformers on information extraction tasks. We propose and investigate two distinct strategies for incorporating dependency structure. We find that their performance gains are highly contingent on the availability of human-annotated dependency parses.
  arXiv Detail & Related papers (2020-08-20T17:17:38Z)