Language learnability in the limit for general metrics: a Gold-Angluin result
- URL: http://arxiv.org/abs/2103.13166v1
- Date: Wed, 24 Mar 2021 13:11:09 GMT
- Title: Language learnability in the limit for general metrics: a Gold-Angluin result
- Authors: Fernando C. Alves
- Abstract summary: We use Niyogi's extended version of a theorem by Blum and Blum (1975) on the existence of locking data sets to prove a necessary condition for learnability in the limit of any family of languages in any given metric.
When the language family is further assumed to contain all finite languages, the same condition also becomes sufficient for learnability in the limit.
- Score: 91.3755431537592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In his pioneering work in the field of Inductive Inference, Gold (1967)
proved that a set containing all finite languages and at least one infinite
language over the same fixed alphabet is not learnable in the exact sense.
Within the same framework, Angluin (1980) provided a complete characterization
for the learnability of language families. Mathematically, the concept of exact
learning in that classical setting can be seen as the use of a particular type
of metric for learning in the limit. In this short research note we use
Niyogi's extended version of a theorem by Blum and Blum (1975) on the existence
of locking data sets to prove a necessary condition for learnability in the
limit of any family of languages in any given metric. This recovers Gold's
theorem as a special case. Moreover, when the language family is further
assumed to contain all finite languages, the same condition also becomes
sufficient for learnability in the limit.
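To fix ideas, here is a minimal formalization of learning in the limit relative to a metric, in the spirit of the abstract above; the notation ($\mathcal{L}$ for the family, $d$ for the metric, $A$ for the learner) is ours and sketches the standard setup rather than the paper's own statement.

A family $\mathcal{L}$ of languages is learnable in the limit in a metric $d$ if there exists a learner $A$, mapping finite data sequences to hypotheses, such that for every $L \in \mathcal{L}$ and every text $t = (t_1, t_2, \dots)$ enumerating $L$,
$$\lim_{n \to \infty} d\big(A(t_1, \dots, t_n),\, L\big) = 0.$$
Gold's exact setting is recovered by taking $d$ to be the discrete metric ($d(L, L') = 1$ if $L \neq L'$, and $0$ otherwise), under which convergence forces the learner to output $L$ itself from some finite stage onward.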
Related papers
- Regular language quantum states [0.5499796332553706]
We introduce regular language states, a family of quantum many-body states.
They are built from a special class of formal languages known as regular languages.
We exploit the theory of tensor networks to find an efficient criterion to determine when regular languages are shift-invariant.
arXiv Detail & Related papers (2024-07-24T21:09:22Z)
- Language Generation in the Limit [0.7787343335258782]
We show that there is an agent that is able to generate in the limit for every countable list of candidate languages.
This contrasts dramatically with negative results due to Gold and Angluin in a well-studied model of language learning.
arXiv Detail & Related papers (2024-04-10T05:53:25Z)
- A Measure-Theoretic Characterization of Tight Language Models [105.16477132329416]
In some pathological cases, probability mass can "leak" onto the set of infinite sequences.
This paper offers a measure-theoretic treatment of language modeling.
We prove that many popular language model families are in fact tight, meaning that they will not leak in this sense.
arXiv Detail & Related papers (2022-12-20T18:17:11Z)
- Learning Symbolic Rules for Reasoning in Quasi-Natural Language [74.96601852906328]
We build a rule-based system that can reason with natural language input but without the manual construction of rules.
We propose MetaQNL, a "Quasi-Natural" language that can express both formal logic and natural language sentences.
Our approach achieves state-of-the-art accuracy on multiple reasoning benchmarks.
arXiv Detail & Related papers (2021-11-23T17:49:00Z)
- Provable Limitations of Acquiring Meaning from Ungrounded Form: What will Future Language Models Understand? [87.20342701232869]
We investigate the abilities of ungrounded systems to acquire meaning.
We study whether assertions enable a system to emulate representations preserving semantic relations like equivalence.
We find that assertions enable semantic emulation if all expressions in the language are referentially transparent.
However, if the language uses non-transparent patterns like variable binding, we show that emulation can become an uncomputable problem.
arXiv Detail & Related papers (2021-04-22T01:00:17Z)
- Deciphering Undersegmented Ancient Scripts Using Phonetic Prior [31.707254394215283]
Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges: the scripts are not fully segmented into words, and their closest known relative is not determined.
We propose a model that handles both of these challenges by building on rich linguistic constraints.
We evaluate the model on both deciphered languages (Gothic, Ugaritic) and an undeciphered one (Iberian).
arXiv Detail & Related papers (2020-10-21T15:03:52Z)
- Maps for Learning Indexable Classes [1.2728819383164875]
We study learning of indexed families from positive data where a learner can freely choose a hypothesis space.
We are interested in various restrictions on learning, such as consistency, conservativeness or set-drivenness.
arXiv Detail & Related papers (2020-10-15T09:34:07Z)
- RNNs can generate bounded hierarchical languages with optimal memory [113.73133308478612]
We show that RNNs can efficiently generate bounded hierarchical languages that reflect the scaffolding of natural language syntax.
We introduce Dyck-($k$,$m$), the language of well-nested brackets (of $k$ types) and $m$-bounded nesting depth.
We prove that an RNN with $O(m \log k)$ hidden units suffices, an exponential reduction in memory, by an explicit construction (see the counting sketch after this list).
arXiv Detail & Related papers (2020-10-15T04:42:29Z)
- Learning Languages in the Limit from Positive Information with Finitely Many Memory Changes [0.0]
We show that non-U-shapedness is not restrictive, while conservativeness and (strong) monotonicity are.
We also give an example of a non-semantic restriction (strongly non-U-shapedness) where the two settings differ.
arXiv Detail & Related papers (2020-10-09T19:59:49Z)
- Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
We show that some features from various linguistic types can be predicted reliably.
arXiv Detail & Related papers (2020-04-30T21:00:53Z)
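As referenced in the Dyck-($k$,$m$) entry above, here is a back-of-the-envelope counting sketch of where the $O(m \log k)$ memory figure comes from; this is the standard information-theoretic floor, not the paper's explicit RNN construction.

A generator of Dyck-($k$,$m$) must remember the stack of currently open bracket types: a string of length at most $m$ over a $k$-letter alphabet. The number of distinct reachable stack contents is
$$\sum_{i=0}^{m} k^i = \frac{k^{m+1} - 1}{k - 1} \le 2k^m \quad (k \ge 2),$$
and distinguishing them requires roughly $\log_2(k^m) = m \log_2 k$ bits, i.e. $O(m \log k)$ memory; the paper's contribution is an explicit RNN meeting this bound with $O(m \log k)$ hidden units.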