Language Generation: Complexity Barriers and Implications for Learning
- URL: http://arxiv.org/abs/2511.05759v1
- Date: Fri, 07 Nov 2025 23:06:48 GMT
- Title: Language Generation: Complexity Barriers and Implications for Learning
- Authors: Marcelo Arenas, Pablo Barceló, Luis Cofré, Alexander Kozachinskiy,
- Abstract summary: We show that even for simple and well-studied language families the number of examples required for successful generation can be extraordinarily large. These results reveal a substantial gap between theoretical possibility and efficient learnability.
- Score: 51.449718747429756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kleinberg and Mullainathan showed that, in principle, language generation is always possible: with sufficiently many positive examples, a learner can eventually produce sentences indistinguishable from those of a target language. However, the existence of such a guarantee does not speak to its practical feasibility. In this work, we show that even for simple and well-studied language families -- such as regular and context-free languages -- the number of examples required for successful generation can be extraordinarily large, and in some cases not bounded by any computable function. These results reveal a substantial gap between theoretical possibility and efficient learnability. They suggest that explaining the empirical success of modern language models requires a refined perspective -- one that takes into account structural properties of natural language that make effective generation possible in practice.
Related papers
- Safe Language Generation in the Limit [31.198980760468434]
We formalize the tasks of safe language identification and generation. We prove that safe language identification is impossible, and that safe language generation is at least as hard as (vanilla) language identification, which is also impossible.
arXiv Detail & Related papers (2026-01-13T15:25:44Z)
- Image, Word and Thought: A More Challenging Language Task for the Iterated Learning Model [1.7205106391379026]
Iterated learning model simulates transmission of language from generation to generation. Agents in this model are able to learn and transmit a language that is expressive.
arXiv Detail & Related papers (2026-01-06T10:53:00Z)
- Large language models are not about language [0.0]
Human language is underpinned by a mind-internal computational system that generates hierarchical thought structures. The language system grows with minimal external input and can readily distinguish between real language and impossible languages.
arXiv Detail & Related papers (2025-12-15T15:36:42Z)
- Can Language Models Learn Typologically Implausible Languages? [62.823015163987996]
Grammatical features across human languages show intriguing correlations often attributed to learning biases in humans. We discuss how language models (LMs) allow us to better determine the role of domain-general learning biases in language universals. We test LMs on an array of highly naturalistic but counterfactual versions of the English (head-initial) and Japanese (head-final) languages.
arXiv Detail & Related papers (2025-02-17T20:40:01Z)
- Randomly Sampled Language Reasoning Problems Elucidate Limitations of In-Context Learning [9.75748930802634]
We study the power of in-context-learning to improve machine learning performance. We consider an extremely simple domain: next token prediction on simple language tasks. We find that LLMs uniformly underperform n-gram models on this task.
arXiv Detail & Related papers (2025-01-06T07:57:51Z)
- Exploring Facets of Language Generation in the Limit [10.18252143035175]
We show that every countable language collection has a generator which has the stronger property of non-uniform generation in the limit. We formalize the tension between validity and breadth in the generation algorithm of [KM24] by introducing a definition of exhaustive generation. We also provide a precise characterization of the language collections for which exhaustive generation is possible.
arXiv Detail & Related papers (2024-11-22T22:13:40Z)
- Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency [0.11510009152620666]
We argue that claims regarding linguistic capabilities of Large Language Models (LLMs) are based on at least two unfounded assumptions.
Language completeness assumes that a distinct and complete thing such as 'a natural language' exists.
The assumption of data completeness relies on the belief that a language can be quantified and wholly captured by data.
arXiv Detail & Related papers (2024-07-11T18:06:01Z)
- The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments [57.273662221547056]
In this study, we investigate an unintuitive novel driver of cross-lingual generalisation: language imbalance.
We observe that the existence of a predominant language during training boosts the performance of less frequent languages.
As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, yet whether language imbalance causes cross-lingual generalisation there is not conclusive.
arXiv Detail & Related papers (2024-04-11T17:58:05Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning [69.1137074774244]
Leveraging language interactions effectively requires addressing limitations in the two most common approaches to language grounding.
We introduce the idea of neural abstructions: a set of constraints on the inference procedure of a label-conditioned generative model.
We show that with this method a user population is able to build a semantic modification for an open-ended house task in Minecraft.
arXiv Detail & Related papers (2021-07-20T07:01:15Z)
- Constrained Language Models Yield Few-Shot Semantic Parsers [73.50960967598654]
We explore the use of large pretrained language models as few-shot semantic parsers.
The goal in semantic parsing is to generate a structured meaning representation given a natural language input.
We use language models to paraphrase inputs into a controlled sublanguage resembling English that can be automatically mapped to a target meaning representation.
arXiv Detail & Related papers (2021-04-18T08:13:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.