On the Emergence and Test-Time Use of Structural Information in Large Language Models
- URL: http://arxiv.org/abs/2601.17869v1
- Date: Sun, 25 Jan 2026 15:02:25 GMT
- Title: On the Emergence and Test-Time Use of Structural Information in Large Language Models
- Authors: Michelle Chao Chen, Moritz Miller, Bernhard Schölkopf, Siyuan Guo
- Abstract summary: We study how language models learn abstract structures and utilize the learnt structural information at test-time. We empirically show that the emergence of learnt structural information correlates with performance on complex reasoning tasks.
- Score: 52.28603345019514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning structural information from observational data is central to producing new knowledge outside the training corpus. This holds for mechanistic understanding in scientific discovery as well as flexible test-time compositional generation. We thus study how language models learn abstract structures and utilize the learnt structural information at test-time. To ensure a controlled setup, we design a natural language dataset based on linguistic structural transformations. We empirically show that the emergence of learnt structural information correlates with performance on complex reasoning tasks, and that the ability to perform test-time compositional generation remains limited.
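The abstract does not spell out which structural transformations the dataset uses, so the following sketch is a hypothetical stand-in: two toy transformations (`reverse` and `mark_past`) are each seen in isolation during training, and their composition plays the role of the test-time compositional task.

```python
# Hypothetical sketch of a dataset built from linguistic structural
# transformations; the paper's actual transformations are not specified
# in the abstract, so `reverse` and `mark_past` are illustrative stand-ins.

def reverse(tokens):
    """Structural transformation A: reverse word order."""
    return tokens[::-1]

def mark_past(tokens):
    """Structural transformation B: append a tense marker (toy example)."""
    return tokens + ["[PAST]"]

def make_pair(sentence, transform):
    tokens = sentence.split()
    return sentence, " ".join(transform(tokens))

def compose(*transforms):
    """Composition of transformations: the test-time generalization target."""
    def composed(tokens):
        for t in transforms:
            tokens = t(tokens)
        return tokens
    return composed

corpus = ["the cat chased the dog", "a bird sang a song"]

# Training tasks: each transformation seen in isolation.
for s in corpus:
    print(make_pair(s, reverse))
    print(make_pair(s, mark_past))

# Test-time compositional task: transformations combined, never seen jointly.
for s in corpus:
    print(make_pair(s, compose(reverse, mark_past)))
```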
Related papers
- Deep networks learn to parse uniform-depth context-free languages from local statistics [12.183764229746926]
Understanding how the structure of language can be learned from sentences alone is a central question in both cognitive science and machine learning. We introduce a class of probabilistic context-free grammars (PCFGs) in which both the degree of ambiguity and the correlation structure across scales can be controlled. We propose a unifying framework where correlations at different scales lift local ambiguities, enabling the emergence of hierarchical representations of the data.
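A minimal sampler for a toy PCFG illustrates the setup; the grammar below is a generic example with a PP-attachment ambiguity, not the controlled family the paper constructs.

```python
import random

# Toy probabilistic context-free grammar (PCFG). Each nonterminal maps to
# (probability, right-hand side) alternatives; rule probabilities are one
# knob for controlling ambiguity, though the paper's family is richer.
PCFG = {
    "S":   [(1.0, ["NP", "VP"])],
    "NP":  [(0.7, ["Det", "N"]), (0.3, ["Det", "N", "PP"])],
    "VP":  [(0.6, ["V", "NP"]), (0.4, ["V", "NP", "PP"])],  # PP-attachment ambiguity
    "PP":  [(1.0, ["P", "NP"])],
    "Det": [(1.0, ["the"])],
    "N":   [(0.5, ["dog"]), (0.5, ["telescope"])],
    "V":   [(1.0, ["saw"])],
    "P":   [(1.0, ["with"])],
}

def sample(symbol, rng):
    """Recursively expand a symbol by sampling productions."""
    if symbol not in PCFG:          # terminal symbol
        return [symbol]
    r, acc = rng.random(), 0.0
    for p, rhs in PCFG[symbol]:
        acc += p
        if r <= acc:
            return [tok for s in rhs for tok in sample(s, rng)]
    return [tok for s in PCFG[symbol][-1][1] for tok in sample(s, rng)]

rng = random.Random(0)
for _ in range(3):
    print(" ".join(sample("S", rng)))
```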
arXiv Detail & Related papers (2026-01-31T17:35:06Z)
- Towards Improving Interpretability of Language Model Generation through a Structured Knowledge Discovery Approach [33.17711262799183]
We develop a task-agnostic structured knowledge hunter for knowledge-enhanced text generation tasks. Our model achieves high interpretability, enabling users to comprehend the model output generation process. We empirically demonstrate the effectiveness of our model in both internal knowledge-enhanced table-to-text generation on the RotoWireFG dataset and external knowledge-enhanced dialogue response generation on the KdConv dataset.
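The summary does not describe how the hunter scores candidate knowledge, so the sketch below substitutes plain lexical overlap as a stand-in scorer; only the overall shape (rank structured triples, keep the top-k) matches the description.

```python
# Generic stand-in for a structured knowledge selector. The paper's actual
# scoring model is not described in this summary, so lexical overlap is
# used here purely for illustration.

def score(triple, query_tokens):
    """Overlap between a (head, relation, tail) triple and the query."""
    words = set(" ".join(triple).lower().split())
    return len(words & query_tokens)

def hunt(knowledge, query, k=2):
    """Rank triples against the query and keep the top-k."""
    query_tokens = set(query.lower().split())
    ranked = sorted(knowledge, key=lambda t: score(t, query_tokens), reverse=True)
    return ranked[:k]

knowledge = [
    ("LeBron James", "points", "32"),
    ("LeBron James", "team", "Lakers"),
    ("Stephen Curry", "assists", "11"),
]
print(hunt(knowledge, "How many points did LeBron James score"))
```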
arXiv Detail & Related papers (2025-11-28T16:43:46Z)
- Finding Structure in Language Models [3.882018118763685]
This thesis asks whether language models possess a deep understanding of grammatical structure similar to that of humans.
We will develop novel interpretability techniques that enhance our understanding of the complex nature of large-scale language models.
arXiv Detail & Related papers (2024-11-25T14:37:24Z)
- Generative Hierarchical Materials Search [91.93125016916463]
We propose Generative Hierarchical Materials Search (GenMS) for controllable generation of crystal structures.
GenMS consists of (1) a language model that takes high-level natural language as input and generates intermediate textual information about a crystal, and (2) a diffusion model that takes this intermediate information as input and generates crystal structures.
GenMS additionally uses a graph neural network to predict properties (e.g., formation energy) from the generated crystal structures.
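A stubbed pipeline makes the described stages concrete; every function body below is a placeholder, since the actual system uses trained models (a language model, a structure generator, and a graph neural network) at each stage.

```python
# Skeleton of a GenMS-style pipeline. All stage bodies are placeholder
# stubs for illustration only, not the paper's models.

def language_model(prompt: str) -> str:
    """Stage 1 stub: high-level request -> intermediate textual description."""
    return f"chemical formula candidates for: {prompt}"

def structure_generator(description: str) -> list:
    """Stage 2 stub: intermediate text -> candidate crystal structures."""
    return [{"formula": "SiO2", "lattice": "hexagonal"}]

def gnn_property_predictor(structure: dict) -> float:
    """Stage 3 stub: predicted formation energy used for ranking candidates."""
    return -8.57  # placeholder value, not a real prediction

request = "a stable, non-toxic photovoltaic material"
candidates = structure_generator(language_model(request))
best = min(candidates, key=gnn_property_predictor)  # lowest energy wins
print(best)
```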
arXiv Detail & Related papers (2024-09-10T17:51:28Z)
- Language Evolution with Deep Learning [49.879239655532324]
Computational modeling plays an essential role in the study of language emergence.
It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language.
This chapter explores another class of computational models that have recently revolutionized the field of machine learning: deep learning models.
arXiv Detail & Related papers (2024-03-18T16:52:54Z)
- On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games [55.2480439325792]
In a sequential decision-making problem, the information structure is the description of how events in the system occurring at different points in time affect each other. Classical models of reinforcement learning, such as MDPs and POMDPs, assume a simple and highly regular information structure. By contrast, real-world sequential decision-making problems typically involve a complex and time-varying interdependence of system variables.
We formalize a novel reinforcement learning model which explicitly represents the information structure.
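One way to make "information structure" concrete is as a directed graph over time-indexed system variables; the minimal sketch below is an illustration of the idea, not the paper's formalism.

```python
# Minimal sketch: an information structure as a directed acyclic graph over
# time-indexed system variables, where an edge records that one variable
# affects (or is observed by) another at a later point.

edges = {
    ("state", 0):    [("state", 1), ("obs_A", 0)],
    ("obs_A", 0):    [("action_A", 0)],
    ("action_A", 0): [("state", 1)],
    ("state", 1):    [("obs_B", 1)],
    ("obs_B", 1):    [("action_B", 1)],
}

def ancestors(node, graph):
    """Everything upstream of `node`, i.e. all variables that can affect it."""
    parents = {src for src, dsts in graph.items() if node in dsts}
    closure = set(parents)
    for p in parents:
        closure |= ancestors(p, graph)
    return closure

# Which past variables can influence agent B's action at time 1?
print(sorted(ancestors(("action_B", 1), edges)))
```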
arXiv Detail & Related papers (2024-03-01T21:28:19Z)
- Punctuation Restoration Improves Structure Understanding Without Supervision [5.925894224649895]
We show that punctuation restoration as a learning objective improves performance on structure-related tasks, demonstrating that it is an effective objective for improving structure understanding without supervision.
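The objective is easy to set up because raw text supervises itself; a minimal sketch of constructing (stripped, original) training pairs:

```python
import re

# Building (input, target) pairs for punctuation restoration: the model sees
# text with punctuation and casing removed and learns to put them back.
# No labels beyond raw text are needed, which makes the objective unsupervised.

def strip_punctuation(text: str) -> str:
    return re.sub(r"[^\w\s]", "", text).lower()

corpus = [
    "Let's eat, grandma!",
    "The report, which was late, surprised no one.",
]

pairs = [(strip_punctuation(t), t) for t in corpus]
for source, target in pairs:
    print(f"{source!r} -> {target!r}")
```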
arXiv Detail & Related papers (2024-02-13T11:22:52Z)
- Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings and reasoning mechanisms is a significant challenge. We introduce a family of synthetic CFGs that produce hierarchical rules, capable of generating lengthy sentences. We demonstrate that generative models like GPT can accurately learn and reason over CFG-defined hierarchies and generate sentences based on them.
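To check whether a generated sentence respects a CFG-defined hierarchy, the standard tool is CYK recognition; the sketch below runs it on a tiny grammar in Chomsky normal form, far shallower than the synthetic grammars the paper studies.

```python
from itertools import product

# CYK recognition on a tiny grammar in Chomsky normal form: the membership
# check that validates whether a string belongs to a CFG's language.

binary = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
lexical = {"the": "Det", "dog": "N", "cat": "N", "saw": "V"}

def cyk_accepts(tokens, start="S"):
    n = len(tokens)
    # table[i][j] holds nonterminals deriving tokens[i..j].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][i].add(lexical[tok])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between the two children
                for b, c in product(table[i][k], table[k + 1][j]):
                    if (b, c) in binary:
                        table[i][j].add(binary[(b, c)])
    return start in table[0][n - 1]

print(cyk_accepts("the dog saw the cat".split()))  # True
print(cyk_accepts("dog the saw cat the".split()))  # False
```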
arXiv Detail & Related papers (2023-05-23T04:28:16Z)
- Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources [87.26486246513063]
Chain-of-knowledge (CoK) is a framework that augments large language models with grounding information from heterogeneous sources.
CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation.
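A skeleton of the three named stages, with every body a placeholder stub; the real framework prompts an LLM and queries heterogeneous knowledge sources such as tables, knowledge graphs, and text corpora.

```python
# Skeleton of the three CoK stages. Every function body is a stub for
# illustration; the actual framework calls an LLM and external sources.

def reasoning_preparation(question: str) -> list:
    """Stage 1 stub: draft a rationale and flag claims needing grounding."""
    return [f"claim to verify: {question}"]

def dynamic_knowledge_adapting(claims: list) -> list:
    """Stage 2 stub: retrieve evidence per claim and correct the rationale."""
    return [f"{c} -- supported by retrieved evidence" for c in claims]

def answer_consolidation(question: str, grounded: list) -> str:
    """Stage 3 stub: produce the final answer from the corrected rationale."""
    return f"answer to {question!r}, grounded in {len(grounded)} fact(s)"

q = "Which year did the author of 'Dune' die?"
print(answer_consolidation(q, dynamic_knowledge_adapting(reasoning_preparation(q))))
```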
arXiv Detail & Related papers (2023-05-22T17:34:23Z)
- Unifying Structure Reasoning and Language Model Pre-training for Complex Reasoning [26.811507121199323]
This paper proposes a unified learning framework that combines explicit structure reasoning and language pre-training to endow pre-trained language models (PLMs) with structure reasoning skills.
It first identifies several elementary structures within contexts to construct structured queries and performs step-by-step reasoning along the queries to identify the answer entity.
Experimental results on four datasets demonstrate that the proposed model achieves significant improvements in complex reasoning tasks involving diverse structures.
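The step-by-step traversal idea can be shown over an explicit triple store; the paper identifies such structures inside free-text contexts, whereas the triples below are given directly to keep the sketch short.

```python
# Toy illustration of step-by-step reasoning along a structured query:
# a two-hop query is answered by chaining relation lookups over triples.

triples = {
    ("Inception", "directed_by"): "Christopher Nolan",
    ("Christopher Nolan", "born_in"): "London",
}

def answer(start, relations):
    """Follow each relation in turn from the start entity."""
    entity = start
    for rel in relations:
        entity = triples[(entity, rel)]
    return entity

# "Where was the director of Inception born?"
print(answer("Inception", ["directed_by", "born_in"]))  # London
```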
arXiv Detail & Related papers (2023-01-21T08:18:11Z)
- Discrete Latent Structure in Neural Networks [32.41642110537956]
This text explores three broad strategies for learning with discrete latent structure.
We show how most approaches consist of the same small set of fundamental building blocks but use them differently, leading to substantially different applicability and properties.
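One widely used building block of this kind is the continuous relaxation of a discrete choice; a NumPy sketch of Gumbel-softmax sampling (forward pass only) shows it, though the text itself covers a broader family of strategies.

```python
import numpy as np

# Continuous relaxation of a discrete choice: Gumbel-softmax sampling.
# Forward pass only; in training, the softmax keeps the sample
# differentiable with respect to the logits.

def gumbel_softmax(logits, temperature, rng):
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1) noise
    y = np.exp((logits + gumbel) / temperature)
    return y / y.sum()

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, 0.1])
# High temperature: soft, nearly uniform; low temperature: near one-hot.
for temp in (5.0, 0.1):
    print(temp, gumbel_softmax(logits, temp, rng).round(3))
```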
arXiv Detail & Related papers (2023-01-18T12:30:44Z)
- DeepStruct: Pretraining of Language Models for Structure Prediction [64.84144849119554]
We pretrain language models on a collection of task-agnostic corpora to generate structures from text.
Our structure pretraining enables zero-shot transfer of the knowledge models acquire about structure tasks.
We show that a 10B parameter language model transfers non-trivially to most tasks and obtains state-of-the-art performance on 21 of 28 datasets.
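Structure pretraining of this kind casts structure prediction as text-to-text generation; the serialization format below is an assumption for illustration, not the paper's exact scheme.

```python
# Sketch of a text-to-structure training example: the model reads a sentence
# and emits linearized (head, relation, tail) triples as plain text. The
# exact serialization here is an assumption, not DeepStruct's format.

def linearize(triples):
    return " ".join(f"( {h} ; {r} ; {t} )" for h, r, t in triples)

example = {
    "input": "Marie Curie was born in Warsaw.",
    "target": linearize([("Marie Curie", "place of birth", "Warsaw")]),
}
print(example["input"])
print(example["target"])  # ( Marie Curie ; place of birth ; Warsaw )
```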
arXiv Detail & Related papers (2022-05-21T00:58:22Z)
- Modelling Compositionality and Structure Dependence in Natural Language [0.12183405753834563]
Drawing on linguistics and set theory, the first half of this thesis presents a formalisation of these ideas.
We see how cognitive systems that process language must satisfy certain functional constraints.
Using advances in word-embedding techniques, a model of relational learning is simulated.
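The classic relational-arithmetic construction shows the idea; the 2-D vectors below are hand-picked so the offset is exact, whereas real embeddings are learned and only approximate it.

```python
import numpy as np

# Relational learning with word embeddings, shown with toy vectors: the
# male-to-female relation is a constant offset, so king - man + woman
# lands on queen. These 2-D vectors are hand-picked purely to make the
# arithmetic visible; real embeddings are learned from data.

emb = {
    "king":  np.array([0.9, 0.9]),
    "queen": np.array([0.9, 0.1]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
}

target = emb["king"] - emb["man"] + emb["woman"]
nearest = min(emb, key=lambda w: np.linalg.norm(emb[w] - target))
print(nearest)  # queen
```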
arXiv Detail & Related papers (2020-11-22T17:28:50Z)