On Union-Closedness of Language Generation
- URL: http://arxiv.org/abs/2506.18642v1
- Date: Mon, 23 Jun 2025 13:42:25 GMT
- Title: On Union-Closedness of Language Generation
- Authors: Steve Hanneke, Amin Karbasi, Anay Mehrotra, Grigoris Velegkas
- Abstract summary: We investigate language generation in the limit, a model introduced by Kleinberg and Mullainathan and extended by Li, Raman, and Tewari. Our results resolve two open questions of Li et al. by proving that finite unions of generatable or non-uniformly generatable classes need not be generatable. Our approach utilizes carefully constructed classes along with a novel diagonalization argument that could be of independent interest in the growing area of language generation.
- Score: 48.36356615217017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate language generation in the limit - a model by Kleinberg and Mullainathan [NeurIPS 2024] and extended by Li, Raman, and Tewari [COLT 2025]. While Kleinberg and Mullainathan proved generation is possible for all countable collections, Li et al. defined a hierarchy of generation notions (uniform, non-uniform, and generatable) and explored their feasibility for uncountable collections. Our first set of results resolves two open questions of Li et al. by proving that finite unions of generatable or non-uniformly generatable classes need not be generatable. These follow from a stronger result: there is a non-uniformly generatable class and a uniformly generatable class whose union is non-generatable. This adds to the aspects along which language generation in the limit is different from traditional tasks in statistical learning theory like classification, which are closed under finite unions. In particular, it implies that given two generators for different collections, one cannot combine them to obtain a single "more powerful" generator, prohibiting this notion of boosting. Our construction also addresses a third open question of Li et al. on whether there are uncountable classes that are non-uniformly generatable and do not satisfy the eventually unbounded closure (EUC) condition introduced by Li, Raman, and Tewari. Our approach utilizes carefully constructed classes along with a novel diagonalization argument that could be of independent interest in the growing area of language generation.
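To fix intuition before the related papers below, here is a hedged sketch of the generation-in-the-limit model and of the paper's separation result; the notation is ours and may differ from the authors' formal definitions.

% Generation in the limit (after Kleinberg and Mullainathan; notation ours).
% An adversary fixes a target language K in a collection C and enumerates its
% strings x_1, x_2, ...; after each round the generator outputs one string.
\[
\mathcal{G} \text{ generates from } \mathcal{C} \text{ in the limit iff for every } K \in \mathcal{C} \text{ and every enumeration of } K:
\]
\[
\exists t^* \ \forall t \ge t^* : \quad \mathcal{G}(x_1, \dots, x_t) \in K \setminus \{x_1, \dots, x_t\}.
\]
% The union result then says: there exist collections C_1 (uniformly generatable)
% and C_2 (non-uniformly generatable) such that C_1 \cup C_2 is not generatable,
% so generatability is not closed under finite unions.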
Related papers
- Language Generation in the Limit: Noise, Loss, and Feedback [10.280148603465697]
We show that a finite union of uniformly generatable collections is generatable in the limit, and ask if the same is true for non-uniform generation. We show the equivalence of these models for uniform and non-uniform generation, and provide a characterization of non-uniform noisy generation.
arXiv Detail & Related papers (2025-07-21T07:18:04Z)
- Representative Language Generation [4.601683217376771]
"representative generation" is extended to address diversity and bias concerns in generative models.<n>We demonstrate feasibility for countably infinite hypothesis classes and collections of groups under certain conditions.<n>Our findings provide a rigorous foundation for developing more diverse and representative generative models.
arXiv Detail & Related papers (2025-05-27T23:02:54Z)
- Characterizations of Language Generation With Breadth [16.30681257128492]
We study language generation in the limit, introduced by Kleinberg and Mullainathan [KM24]. KM24 proposed an algorithm that generates strings from any countable language collection in the limit. We show that generation with exact breadth is characterized by Angluin's condition for identification (restated below).
arXiv Detail & Related papers (2024-12-24T16:24:43Z)
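Since the entry above invokes Angluin's condition, here is its standard telltale form (our paraphrase; the paper's exact formulation may differ):

% Angluin's condition: a countable collection C is identifiable in the limit
% iff every L in C has a finite "telltale" set T_L such that
\[
T_L \subseteq L \quad \text{and} \quad \forall L' \in \mathcal{C} : \; T_L \subseteq L' \implies L' \not\subsetneq L.
\]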
- Exploring Facets of Language Generation in the Limit [10.18252143035175]
We show that every countable language collection has a generator which has the stronger property of non-uniform generation in the limit. We formalize the tension between validity and breadth in the generation algorithm of [KM24] by introducing a definition of exhaustive generation. We also provide a precise characterization of the language collections for which exhaustive generation is possible.
arXiv Detail & Related papers (2024-11-22T22:13:40Z)
- Generation through the lens of learning theory [18.355039522639565]
We study generation through the lens of statistical learning theory. We define two paradigms, which we call "uniform" and "non-uniform" generation, and provide a characterization of which hypothesis classes are uniformly and non-uniformly generatable (both notions are sketched below).
arXiv Detail & Related papers (2024-10-17T16:14:49Z)
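As a rough guide to the terminology in the entry above, here is one common way to formalize the uniform/non-uniform distinction (a sketch in our notation, not the paper's verbatim definitions):

% Uniform generation: a single sample bound d* works for every target language.
\[
\exists d^* \ \forall K \in \mathcal{C} : \text{ after } d^* \text{ distinct strings of } K, \text{ the generator outputs only unseen strings of } K.
\]
% Non-uniform generation: the bound may depend on the target language.
\[
\forall K \in \mathcal{C} \ \exists d_K : \text{ after } d_K \text{ distinct strings of } K, \text{ the generator outputs only unseen strings of } K.
\]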
- Unlocking Anticipatory Text Generation: A Constrained Approach for Large Language Models Decoding [75.06872859716049]
Large Language Models (LLMs) have demonstrated a powerful ability for text generation.
However, undesired behaviors such as toxicity or hallucinations can manifest.
We propose formalizing text generation as a future-constrained generation problem.
arXiv Detail & Related papers (2023-12-11T06:35:33Z)
- Unextendibility, uncompletability, and many-copy indistinguishable ensembles [49.1574468325115]
We show that the complement of any bipartite pure entangled state is spanned by product states which form a nonorthogonal unextendible product basis (nUPB) of maximum cardinality. We also report a class of multipartite many-copy indistinguishable ensembles for which the local indistinguishability property increases with decreasing number of mixed states.
arXiv Detail & Related papers (2023-03-30T16:16:41Z)
- Cross-Lingual GenQA: A Language-Agnostic Generative Question Answering Approach for Open-Domain Question Answering [76.99585451345702]
Open-Retrieval Generative Question Answering (GenQA) has been shown to deliver high-quality, natural-sounding answers in English.
We present the first generalization of the GenQA approach to the multilingual setting.
arXiv Detail & Related papers (2021-10-14T04:36:29Z)
- Language learnability in the limit for general metrics: a Gold-Angluin result [91.3755431537592]
We use Niyogi's extended version of a theorem by Blum and Blum (1975) on the existence of locking data sets to prove a necessary condition for learnability in the limit of any family of languages in any given metric.
When the language family is further assumed to contain all finite languages, the same condition also becomes sufficient for learnability in the limit (the locking-set notion is recalled below).
arXiv Detail & Related papers (2021-03-24T13:11:09Z)
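For context on the locking-set technique cited above, here is the classical Blum and Blum (1975) statement (our paraphrase; the paper works with Niyogi's extended, metric version):

% Locking sequence: if learner M identifies L in the limit, there is a finite
% sequence sigma of elements of L on which M's hypothesis indexes L and
% never changes on any further data drawn from L:
\[
M(\sigma) \text{ indexes } L \quad \text{and} \quad \forall \tau \text{ drawn from } L : \; M(\sigma \cdot \tau) = M(\sigma).
\]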
- Conditional Hybrid GAN for Sequence Generation [56.67961004064029]
We propose a novel conditional hybrid GAN (C-Hybrid-GAN) to address the difficulty GANs face in generating discrete-valued sequences.
We exploit the Gumbel-Softmax technique to approximate the distribution of discrete-valued sequences (see the sketch below).
We demonstrate that the proposed C-Hybrid-GAN outperforms the existing methods in context-conditioned discrete-valued sequence generation.
arXiv Detail & Related papers (2020-09-18T03:52:55Z)
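The C-Hybrid-GAN entry above relies on the Gumbel-Softmax trick; below is a minimal, generic NumPy sketch of that trick (our illustration, not the paper's code; the function name and parameters are ours):

import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    # Relaxed (differentiable-in-spirit) sample from a categorical distribution.
    # logits: unnormalized log-probabilities, shape (..., num_classes).
    # tau: temperature; as tau -> 0 the output approaches a one-hot sample.
    rng = rng or np.random.default_rng()
    u = rng.uniform(low=1e-10, high=1.0, size=np.shape(logits))
    g = -np.log(-np.log(u))                      # Gumbel(0, 1) noise
    y = (np.asarray(logits) + g) / tau           # perturb logits, then scale
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    return y / y.sum(axis=-1, keepdims=True)     # softmax -> relaxed one-hot

# Example: a soft "choice" over a 4-token vocabulary
sample = gumbel_softmax(np.log([0.1, 0.2, 0.3, 0.4]), tau=0.5)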
- GeDi: Generative Discriminator Guided Sequence Generation [53.15651536569169]
We propose GeDi as an efficient method for using smaller LMs as generative discriminators to guide generation from large LMs.
We find that GeDi gives stronger controllability than the state of the art method while also achieving generation speeds more than 30 times faster.
arXiv Detail & Related papers (2020-09-14T17:45:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.