Safe Language Generation in the Limit
- URL: http://arxiv.org/abs/2601.08648v1
- Date: Tue, 13 Jan 2026 15:25:44 GMT
- Title: Safe Language Generation in the Limit
- Authors: Antonios Anastasopoulos, Giuseppe Ateniese, Evgenios M. Kornaropoulos,
- Abstract summary: We formalize the tasks of safe language identification and generation. We prove that safe language identification is impossible, and that safe language generation is at least as hard as (vanilla) language identification, which is also impossible.
- Score: 31.198980760468434
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent results in learning a language in the limit have shown that, although language identification is impossible, language generation is tractable. As this foundational area expands, we need to consider the implications of language generation in real-world settings. This work offers the first theoretical treatment of safe language generation. Building on the computational paradigm of learning in the limit, we formalize the tasks of safe language identification and generation. We prove that under this model, safe language identification is impossible, and that safe language generation is at least as hard as (vanilla) language identification, which is also impossible. Last, we discuss several intractable and tractable cases.
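To make the learning-in-the-limit paradigm concrete, here is an illustrative sketch (not the paper's construction; all names and the toy language family are hypothetical) of Gold-style identification by enumeration over a countable candidate list, together with a naive generator that emits an unseen string from the first candidate consistent with the sample:

```python
# Illustrative sketch of learning in the limit, in the style of Gold's
# model. The language family and all function names are hypothetical,
# chosen only to make the identification/generation distinction visible.

def make_family(max_k=5):
    """A countable candidate list: L_k = {'a' * n : 1 <= n <= k},
    plus one infinite language L_inf of all nonempty 'a'-strings."""
    family = []
    for k in range(1, max_k + 1):
        family.append(("L_%d" % k,
                       lambda s, k=k: set(s) <= {"a"} and 1 <= len(s) <= k))
    family.append(("L_inf", lambda s: set(s) <= {"a"} and len(s) >= 1))
    return family

def identify(family, sample):
    """Identification by enumeration: guess the first candidate
    consistent with every string observed so far."""
    for name, member in family:
        if all(member(s) for s in sample):
            return name
    return None

def generate(family, sample, horizon=10):
    """Naive generation: output some unseen string belonging to the
    first consistent candidate that still has one (up to `horizon`)."""
    for name, member in family:
        if all(member(s) for s in sample):
            for n in range(1, horizon + 1):
                s = "a" * n
                if member(s) and s not in sample:
                    return s
    return None

family = make_family()
sample = ["a", "aa", "aaa"]
print(identify(family, sample))  # guesses L_3, which may be wrong
print(generate(family, sample))  # emits an unseen candidate string
```

Note the asymmetry the abstract alludes to: on this sample the identifier commits to `L_3` even though the true language could be any of `L_3`, `L_4`, `L_5`, or `L_inf`, whereas the generator only needs to produce *some* valid unseen string, a strictly weaker requirement.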
Related papers
- Large language models are not about language [0.0]
Human language is underpinned by a mind-internal computational system that generates hierarchical thought structures. The language system grows with minimal external input and can readily distinguish between real language and impossible languages.
arXiv Detail & Related papers (2025-12-15T15:36:42Z)
- Language Generation: Complexity Barriers and Implications for Learning [51.449718747429756]
We show that even for simple and well-studied language families the number of examples required for successful generation can be extraordinarily large. These results reveal a substantial gap between theoretical possibility and efficient learnability.
arXiv Detail & Related papers (2025-11-07T23:06:48Z)
- MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety [56.77103365251923]
Large Language Models (LLMs) are susceptible to adversarial attacks such as jailbreaking. This vulnerability is exacerbated in multilingual settings, where multilingual safety-aligned data is often limited. We introduce a multilingual guardrail with reasoning for prompt classification.
arXiv Detail & Related papers (2025-04-21T17:15:06Z)
- Density Measures for Language Generation [2.2872032473279065]
We study the trade-off between validity and breadth of language generation algorithms. Existing algorithms for language generation in the limit produce output sets that can have zero density in the true language. We show, however, that there is an algorithm for language generation in the limit whose outputs have strictly positive density in $K$.
arXiv Detail & Related papers (2025-04-19T18:08:18Z)
- Exploring Facets of Language Generation in the Limit [10.18252143035175]
We show that every countable language collection has a generator which has the stronger property of non-uniform generation in the limit. We formalize the tension between validity and breadth in the generation algorithm of [KM24] by introducing a definition of exhaustive generation. We also provide a precise characterization of the language collections for which exhaustive generation is possible.
arXiv Detail & Related papers (2024-11-22T22:13:40Z)
- Language Generation in the Limit [0.7787343335258782]
We show that there is an agent that is able to generate in the limit for every countable list of candidate languages.
This contrasts dramatically with negative results due to Gold and Angluin in a well-studied model of language learning.
arXiv Detail & Related papers (2024-04-10T05:53:25Z)
- Unlocking Anticipatory Text Generation: A Constrained Approach for Large Language Models Decoding [75.06872859716049]
Large Language Models (LLMs) have demonstrated a powerful ability for text generation.
However, undesired behaviors such as toxicity or hallucinations can still manifest.
We propose formalizing text generation as a future-constrained generation problem.
arXiv Detail & Related papers (2023-12-11T06:35:33Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages [76.93265104421559]
We benchmark code generation from natural language commands extending beyond English.
We annotated a total of 896 NL-code pairs in three languages: Spanish, Japanese, and Russian.
While the difficulties vary across these three languages, all systems lag significantly behind their English counterparts.
arXiv Detail & Related papers (2022-03-16T04:21:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.