Unraveling Syntax: How Language Models Learn Context-Free Grammars
- URL: http://arxiv.org/abs/2510.02524v1
- Date: Thu, 02 Oct 2025 19:52:19 GMT
- Title: Unraveling Syntax: How Language Models Learn Context-Free Grammars
- Authors: Laura Ying Schulz, Daniel Mitropolsky, Tomaso Poggio
- Abstract summary: We study the learning dynamics of small models trained on synthetic languages generated from context-free grammars. We find that, unlike children, who first master simple substructures before progressing to more complex constructions, transformers reduce loss across all subgrammars in parallel.
- Score: 1.0465074236788003
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce a new framework for understanding how language models acquire syntax. While large models achieve impressive results, little is known about their learning dynamics. Our approach starts with the observation that most domains of interest, such as natural language syntax, programming languages, and arithmetic problems, are captured by probabilistic context-free grammars (PCFGs). We study the learning dynamics of small models trained on synthetic languages generated from PCFGs, enabling precise control over grammar complexity, recursion depth, and subgrammar structure. We prove several general, recursive formulae for the training loss and Kullback-Leibler divergence over the subgrammar structure of a PCFG. Empirically, we find that unlike children, who first master simple substructures before progressing to more complex constructions, transformers reduce loss across all subgrammars in parallel. We further show that subgrammar pretraining can improve the final loss for smaller models, and that pretrained models develop internal representations more aligned with the grammar's substructure. Finally, we demonstrate that models struggle with deeper recursive structures (a limitation even of large language models), revealing fundamental challenges in how neural networks represent hierarchical syntax. Overall, our work initiates the study of the learning dynamics of transformers on PCFGs as a versatile testbed for probing learning in language models, opening a research direction with many open questions.
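The abstract's central ingredient is training data sampled from a PCFG with controllable recursion depth and subgrammar structure. As a rough illustration only (not the paper's actual grammars, code, or hyperparameters), the sketch below samples strings top-down from a toy PCFG with a recursion-depth cap; all nonterminal names, rules, and probabilities are invented for the example.

```python
# Minimal sketch of sampling synthetic strings from a toy PCFG.
# Everything here (symbols, rules, probabilities) is an illustrative assumption,
# not the grammar used in the paper.
import random

PCFG = {
    # nonterminal: list of (probability, right-hand side)
    "S":    [(0.7, ["NP", "VP"]), (0.3, ["S", "Conj", "S"])],  # recursive rule S -> S Conj S
    "NP":   [(0.6, ["Det", "N"]), (0.4, ["Det", "N", "PP"])],
    "VP":   [(0.8, ["V", "NP"]), (0.2, ["V"])],
    "PP":   [(1.0, ["P", "NP"])],
    "Det":  [(1.0, ["the"])],
    "N":    [(0.5, ["cat"]), (0.5, ["dog"])],
    "V":    [(0.5, ["sees"]), (0.5, ["chases"])],
    "P":    [(1.0, ["near"])],
    "Conj": [(1.0, ["and"])],
}

def sample(symbol="S", max_depth=12, depth=0):
    """Expand `symbol` top-down; cap recursion depth so every string is finite."""
    if symbol not in PCFG:                      # terminal symbol: emit it
        return [symbol]
    if depth >= max_depth:                      # past the cap, take the most probable rule
        rhs = max(PCFG[symbol], key=lambda pr: pr[0])[1]
    else:                                       # otherwise sample a rule by its probability
        probs, rules = zip(*PCFG[symbol])
        rhs = random.choices(rules, weights=probs, k=1)[0]
    out = []
    for s in rhs:
        out.extend(sample(s, max_depth, depth + 1))
    return out

if __name__ == "__main__":
    for _ in range(3):
        print(" ".join(sample()))
```

In a setup like this, a subgrammar can be read off by treating a nonterminal such as NP or VP as the start symbol, which is one plausible way to interpret the paper's notion of subgrammar structure and subgrammar pretraining; the depth cap is likewise one simple way to control the recursion depth the abstract refers to.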
Related papers
- Sequences of Logits Reveal the Low Rank Structure of Language Models [22.148282143726835]
We introduce an approach to study the low-dimensional structure of language models at a model-agnostic level. We first empirically demonstrate that a wide range of modern language models exhibit low-rank structure. We then show that this low-rank structure can be leveraged for generation.
arXiv Detail & Related papers (2025-10-28T20:55:58Z) - Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
We train and evaluate neural networks directly as binary classifiers of strings. We provide results on a variety of languages across the Chomsky hierarchy for three neural architectures. Our contributions will facilitate theoretically sound empirical testing of language recognition claims in future work.
arXiv Detail & Related papers (2024-11-11T16:33:25Z) - In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z) - Transformer-Based Models Are Not Yet Perfect At Learning to Emulate
Structural Recursion [14.739369424331478]
We introduce a general framework that nicely connects the abstract concepts of structural recursion in the programming language domain to sequence modeling problems and learned models' behavior.
With our framework as a powerful conceptual tool, we identify different issues under various set-ups.
arXiv Detail & Related papers (2024-01-23T18:07:38Z) - Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings and reasoning mechanisms is a significant challenge. We introduce a family of synthetic CFGs that produce hierarchical rules and are capable of generating lengthy sentences. We demonstrate that generative models like GPT can accurately learn and reason over CFG-defined hierarchies and generate sentences based on them.
arXiv Detail & Related papers (2023-05-23T04:28:16Z) - Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves the new state-of-the-art on all the structured prediction tasks we looked at.
arXiv Detail & Related papers (2022-10-26T13:27:26Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - SLM: Learning a Discourse Language Representation with Sentence
Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)