From Formal Language Theory to Statistical Learning: Finite Observability of Subregular Languages
- URL: http://arxiv.org/abs/2509.22598v1
- Date: Fri, 26 Sep 2025 17:17:15 GMT
- Title: From Formal Language Theory to Statistical Learning: Finite Observability of Subregular Languages
- Authors: Katsuhiko Hayashi, Hidetaka Kamigaito,
- Abstract summary: We prove that all standard subregular language classes are linearly separable when represented by their deciding predicates. Synthetic experiments confirm perfect separability under noise-free conditions, while real-data experiments on English morphology show that learned features align with well-known linguistic constraints.
- Score: 34.42559541958844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We prove that all standard subregular language classes are linearly separable when represented by their deciding predicates. This establishes finite observability and guarantees learnability with simple linear models. Synthetic experiments confirm perfect separability under noise-free conditions, while real-data experiments on English morphology show that learned features align with well-known linguistic constraints. These results demonstrate that the subregular hierarchy provides a rigorous and interpretable foundation for modeling natural language structure. Our code used in real-data experiments is available at https://github.com/UTokyo-HayashiLab/subregular.
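The linear-separability claim in the abstract can be illustrated with a toy case. The sketch below is not taken from the paper or its repository: it assumes a hypothetical strictly 2-local language over {a, b} that forbids the factor "bb", uses bigram-presence indicators as the deciding predicates, and uses a hand-set weight vector rather than a learned one. All names (`FORBIDDEN`, `features`, `linear_accepts`) are illustrative.

```python
from itertools import product

# Assumption: a toy strictly 2-local (SL-2) language over {a, b}
# whose only forbidden factor is "bb". Not the paper's data or code.
ALPHABET = "ab"
FORBIDDEN = {"bb"}

# The predicates: one indicator per bigram over the alphabet plus the
# boundary marker "#".
SYMBOLS = ALPHABET + "#"
BIGRAMS = ["".join(pair) for pair in product(SYMBOLS, repeat=2)]


def features(word):
    """Binary vector: 1 if the bigram occurs in #word#, else 0."""
    padded = "#" + word + "#"
    present = {padded[i:i + 2] for i in range(len(padded) - 1)}
    return [1 if gram in present else 0 for gram in BIGRAMS]


def in_language(word):
    """Reference membership test: no forbidden factor in #word#."""
    padded = "#" + word + "#"
    return not any(factor in padded for factor in FORBIDDEN)


# Hand-set separator: weight -1 on each forbidden bigram, 0 elsewhere,
# bias +0.5. A word scores 0.5 if it is in the language, -0.5 otherwise.
WEIGHTS = [-1.0 if gram in FORBIDDEN else 0.0 for gram in BIGRAMS]
BIAS = 0.5


def linear_accepts(word):
    score = sum(w * x for w, x in zip(WEIGHTS, features(word))) + BIAS
    return score > 0


def all_words(max_len):
    """Enumerate every word over ALPHABET up to length max_len."""
    for n in range(max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            yield "".join(chars)


# Compare the linear rule with the reference test on all short words.
mismatches = [w for w in all_words(8) if linear_accepts(w) != in_language(w)]
print(len(mismatches))  # prints 0
```

In this toy setting a single hyperplane over the predicate vector decides membership exactly, which is the kind of separability the abstract describes; the paper's general result covers the other subregular classes with their own predicates.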
Related papers
- Unnatural Languages Are Not Bugs but Features for LLMs [92.8332103170009]
Large Language Models (LLMs) have been observed to process non-human-readable text sequences, such as jailbreak prompts. We present a systematic investigation challenging this perception, demonstrating that unnatural languages contain latent features usable by models.
arXiv Detail & Related papers (2025-03-02T12:10:17Z)
- Can Language Models Learn Typologically Implausible Languages? [62.823015163987996]
Grammatical features across human languages show intriguing correlations often attributed to learning biases in humans. We discuss how language models (LMs) allow us to better determine the role of domain-general learning biases in language universals. We test LMs on an array of highly naturalistic but counterfactual versions of the English (head-initial) and Japanese (head-final) languages.
arXiv Detail & Related papers (2025-02-17T20:40:01Z)
- Slaves to the Law of Large Numbers: An Asymptotic Equipartition Property for Perplexity in Generative Language Models [0.0]
We show that the logarithmic perplexity of any large text generated by a language model must converge to the average entropy of its token distributions. This defines a "typical set" that all long synthetic texts generated by a language model must belong to.
arXiv Detail & Related papers (2024-05-22T16:23:40Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- A Neural Model for Regular Grammar Induction [8.873449722727026]
We treat grammars as a model of computation and propose a novel neural approach to induction of regular grammars from positive and negative examples.
Our model is fully explainable, its intermediate results are directly interpretable as partial parses, and it can be used to learn arbitrary regular grammars when provided with sufficient data.
arXiv Detail & Related papers (2022-09-23T14:53:23Z)
- Does BERT really agree ? Fine-grained Analysis of Lexical Dependence on a Syntactic Task [70.29624135819884]
We study the extent to which BERT is able to perform lexically-independent subject-verb number agreement (NA) on targeted syntactic templates.
Our results on nonce sentences suggest that the model generalizes well for simple templates, but fails to perform lexically-independent syntactic generalization when as little as one attractor is present.
arXiv Detail & Related papers (2022-04-14T11:33:15Z)
- Uncovering More Shallow Heuristics: Probing the Natural Language Inference Capacities of Transformer-Based Pre-Trained Language Models Using Syllogistic Patterns [9.031827448667086]
We explore the shallow heuristics used by transformer-based pre-trained language models (PLMs) that are fine-tuned for natural language inference (NLI).
We find evidence that the models rely heavily on certain shallow heuristics, picking up on symmetries and asymmetries between premise and hypothesis.
arXiv Detail & Related papers (2022-01-19T14:15:41Z)
- Learning Symbolic Rules for Reasoning in Quasi-Natural Language [74.96601852906328]
We build a rule-based system that can reason with natural language input but without the manual construction of rules.
We propose MetaQNL, a "Quasi-Natural" language that can express both formal logic and natural language sentences.
Our approach achieves state-of-the-art accuracy on multiple reasoning benchmarks.
arXiv Detail & Related papers (2021-11-23T17:49:00Z)
- Exploring Transitivity in Neural NLI Models through Veridicality [39.845425535943534]
We focus on the transitivity of inference relations, a fundamental property for systematically drawing inferences.
A model capturing transitivity can compose basic inference patterns and draw new inferences.
We find that current NLI models do not perform consistently well on transitivity inference tasks.
arXiv Detail & Related papers (2021-01-26T11:18:35Z)
- Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models [27.91397366776451]
Training LSTMs on latent structure (MIDI music or Java code) improves test performance on natural language.
Experiments on transfer between natural languages controlling for vocabulary overlap show that zero-shot performance on a test language is highly correlated with typological similarity to the training language.
arXiv Detail & Related papers (2020-04-30T06:24:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.