Learning Languages with Decidable Hypotheses
- URL: http://arxiv.org/abs/2011.09866v1
- Date: Thu, 15 Oct 2020 09:27:47 GMT
- Title: Learning Languages with Decidable Hypotheses
- Authors: Julian Berger, Maximilian Böther, Vanja Doskoč, Jonathan Gadea Harder, Nicolas Klodt, Timo Kötzing, Winfried Lötzsch, Jannik Peters, Leon Schiller, Lars Seifert, Armin Wells, Simon Wietheger
- Abstract summary: In language learning in the limit, the most common type of hypothesis is to give an enumerator for a language.
This so-called $W$-index allows for naming arbitrary computably enumerable languages.
We use a different system which allows for naming arbitrary decidable languages, namely programs for characteristic functions (called $C$-indices).
- Score: 1.2728819383164875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In language learning in the limit, the most common type of hypothesis is to
give an enumerator for a language. This so-called $W$-index allows for naming
arbitrary computably enumerable languages, with the drawback that even the
membership problem is undecidable. In this paper we use a different system
which allows for naming arbitrary decidable languages, namely programs for
characteristic functions (called $C$-indices). These indices have the drawback
that it is now not decidable whether a given hypothesis is even a legal
$C$-index.
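To make the contrast concrete, here is a minimal Python sketch (our illustration, not from the paper) modelling a $W$-index as an enumerator and a $C$-index as a characteristic-function program for the same language; all names in it are invented for the example.

```python
from itertools import count

def w_index_evens():
    """A W-index: an enumerator for a language (here, the even numbers).

    An enumerator only promises to eventually list every member; to test
    whether x belongs, we can only wait for x to show up. For a general
    computably enumerable language this makes membership undecidable,
    because there is no point at which we may safely give up waiting.
    """
    for n in count():
        yield 2 * n

def c_index_evens(x: int) -> bool:
    """A C-index: a program for the characteristic function of the language.

    Membership is decided in finite time. The drawback from the abstract:
    for an arbitrary program it is undecidable whether it is total, i.e.
    whether it is a legal C-index at all.
    """
    return x % 2 == 0

# Membership via the C-index is a single call:
assert c_index_evens(4) and not c_index_evens(7)

# Membership via the W-index can be confirmed but never refuted:
def confirm_membership(make_enumerator, x, budget=1000):
    """Search the enumeration for x; inconclusive once the budget runs out."""
    gen = make_enumerator()
    for _ in range(budget):
        if next(gen) == x:
            return True
    return None  # inconclusive -- x may still appear later

assert confirm_membership(w_index_evens, 4) is True
assert confirm_membership(w_index_evens, 7) is None  # never appears, but we cannot know
```

The asymmetry in `confirm_membership` is exactly the drawback the abstract attributes to $W$-indices: non-membership can never be established in finite time.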
In this first analysis of learning with $C$-indices, we give a structured
account of the learning power of various restrictions employing $C$-indices,
also when compared with $W$-indices. We establish a hierarchy of learning power
depending on whether $C$-indices are required (a) on all outputs; (b) only on
outputs relevant for the class to be learned; and (c) only in the limit as
final, correct hypotheses. Furthermore, all these settings are weaker than
learning with $W$-indices (even when restricted to classes of computable
languages). We analyze all these questions also in relation to the mode of data
presentation.
Finally, we also ask about the relation of semantic versus syntactic
convergence and derive the map of pairwise relations for these two kinds of
convergence coupled with various forms of data presentation.
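As a toy illustration of the strictest setting (a), where every hypothesis the learner ever outputs must be a legal $C$-index, the following sketch (again our own, with deciders modelled as total Python predicates) learns the class of finite languages in the limit: on any presentation of the language, its conjectures eventually stabilize on a single correct $C$-index.

```python
def learner(data_seen):
    """Map the finite set of data seen so far to a C-index for it.

    The conjecture is a pair (name, decider): the name makes syntactic
    convergence checkable, and the decider is total, so every output is
    a legal C-index -- the strictest of the three settings.
    """
    frozen = frozenset(data_seen)
    return frozen, (lambda x, s=frozen: x in s)

# A presentation (with repetitions) of the finite language {1, 3, 5}.
presentation = [1, 3, 3, 5, 1, 5, 5, 3]

seen, hypotheses = set(), []
for datum in presentation:
    seen.add(datum)
    hypotheses.append(learner(seen))

# Once every element has appeared, the conjecture never changes again:
# syntactic (and hence semantic) convergence to a correct C-index.
final_name, final_decider = hypotheses[-1]
assert final_name == hypotheses[-2][0] == frozenset({1, 3, 5})
assert final_decider(3) and not final_decider(2)
```

For this naive learner, semantic and syntactic convergence coincide; the abstract's final question is precisely when the two notions come apart.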
Related papers
- $O_2$ is a multiple context-free grammar: an implementation-, formalisation-friendly proof [0.0]
Classifying languages according to the expressiveness of grammars able to generate them is a fundamental problem in computational linguistics.
This paper systematically analyses the existing proofs from the computational and the proof-theoretical points of view, studying whether each proof can lead to a verified (i.e., checked by a proof assistant) algorithm for parsing balanced languages via MCFGs.
arXiv Detail & Related papers (2024-05-15T14:51:11Z)
- How do Large Language Models Handle Multilingualism? [81.15060972112563]
This study explores how large language models (LLMs) handle multilingualism.
In the initial layers, LLMs understand the query, converting multilingual inputs into English for task-solving.
In the intermediate layers, they employ English for thinking and incorporate multilingual knowledge with self-attention and feed-forward structures.
arXiv Detail & Related papers (2024-02-29T02:55:26Z)
- LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers [60.009969929857704]
Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society.
In this work, we reformulate such tasks as modular neurosymbolic programming, which we call LINC.
We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
arXiv Detail & Related papers (2023-10-23T17:58:40Z)
- Agnostic Multi-Group Active Learning [24.97598179536084]
We consider a variant of the multi-group learning problem from the perspective of active learning, where the learner is endowed with the power to decide which examples are labeled from each distribution in the collection.
Our main challenge is that standard active learning techniques such as disagreement-based active learning do not directly apply to the multi-group learning objective.
We modify existing algorithms to provide a consistent active learning algorithm for an agnostic formulation of multi-group learning.
arXiv Detail & Related papers (2023-06-02T21:24:13Z)
- Generalized Funnelling: Ensemble Learning and Heterogeneous Document Embeddings for Cross-Lingual Text Classification [78.83284164605473]
Funnelling (Fun) is a recently proposed method for cross-lingual text classification.
We describe Generalized Funnelling (gFun) as a generalization of Fun.
We show that gFun substantially improves over Fun and over state-of-the-art baselines.
arXiv Detail & Related papers (2021-09-17T23:33:04Z)
- Agnostic learning with unknown utilities [70.14742836006042]
In many real-world problems, the utility of a decision depends on the underlying context $x$ and decision $y$.
We study this as agnostic learning with unknown utilities.
We show that estimating the utilities of only the sampled points $S$ suffices to learn a decision function which generalizes well.
arXiv Detail & Related papers (2021-04-17T08:22:04Z)
- Safety Synthesis Sans Specification [5.874782446136914]
We argue that synthesis without a complete specification is a natural question in many situations in hardware and software verification.
We devise a learning algorithm for this problem and show that its time and query complexity is bounded in terms of the rank of the target language, its incompatibility measure, and the maximal length of a given counterexample.
arXiv Detail & Related papers (2020-11-15T21:13:17Z)
- RNNs can generate bounded hierarchical languages with optimal memory [113.73133308478612]
We show that RNNs can efficiently generate bounded hierarchical languages that reflect the scaffolding of natural language syntax.
We introduce Dyck-($k$,$m$), the language of well-nested brackets of $k$ types with nesting depth at most $m$.
We prove that an RNN with $O(m \log k)$ hidden units suffices, an exponential reduction in memory, by an explicit construction; a membership-checker sketch follows this entry.
arXiv Detail & Related papers (2020-10-15T04:42:29Z)
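As a companion to the Dyck-($k$,$m$) entry above, here is a minimal membership checker (our sketch, not the paper's construction; it fixes $k \le 4$ ASCII bracket types for readability). The depth check makes the memory intuition explicit: a stack of at most $m$ cells, each naming one of $k$ types, is $O(m \log k)$ bits of state.

```python
def in_dyck(s: str, k: int, m: int) -> bool:
    """Decide membership in Dyck-(k, m): well-nested brackets of k types
    with nesting depth at most m."""
    opens = "([{<"[:k]    # the k opening bracket types (k <= 4 here)
    closes = ")]}>"[:k]   # the matching closing bracket types
    stack = []
    for ch in s:
        if ch in opens:
            stack.append(ch)
            if len(stack) > m:      # nesting depth would exceed the bound m
                return False
        elif ch in closes:
            if not stack or opens.index(stack.pop()) != closes.index(ch):
                return False        # unopened or mismatched bracket type
        else:
            return False            # character outside the alphabet
    return not stack                # every bracket must be closed

assert in_dyck("([()])", k=2, m=3)
assert not in_dyck("([()])", k=2, m=2)  # depth 3 exceeds the bound m = 2
assert not in_dyck("(]", k=2, m=2)      # mismatched bracket types
```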
- On the Modularity of Hypernetworks [103.1147622394852]
We show that, for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and of an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z)