Related papers: On the Intersection of Context-Free and Regular Languages

On the Intersection of Context-Free and Regular Languages

URL: http://arxiv.org/abs/2209.06809v2
Date: Thu, 18 May 2023 09:57:42 GMT
Title: On the Intersection of Context-Free and Regular Languages
Authors: Clemente Pasti, Andreas Opedal, Tiago Pimentel, Tim Vieira, Jason Eisner, Ryan Cotterell
Abstract summary: We generalize the Bar-Hillel construction to handle finite-state automatons with $varepsilon$-arcs. We prove that our construction leads to a grammar that encodes the structure of both the input automaton and grammar while retaining the size of the original construction.
Score: 71.61206349427509
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Bar-Hillel construction is a classic result in formal language theory. It shows, by a simple construction, that the intersection of a context-free language and a regular language is itself context-free. In the construction, the regular language is specified by a finite-state automaton. However, neither the original construction (Bar-Hillel et al., 1961) nor its weighted extension (Nederhof and Satta, 2003) can handle finite-state automata with $\varepsilon$-arcs. While it is possible to remove $\varepsilon$-arcs from a finite-state automaton efficiently without modifying the language, such an operation modifies the automaton's set of paths. We give a construction that generalizes the Bar-Hillel in the case where the desired automaton has $\varepsilon$-arcs, and further prove that our generalized construction leads to a grammar that encodes the structure of both the input automaton and grammar while retaining the asymptotic size of the original construction.

Related papers

Bootstrapping Fuzzers for Compilers of Low-Resource Language Dialects Using Language Models [13.843101157271034]
We present a dialect-agnostic and dialect-effective grammar-based and coverage-guided fuzzing approach for compilers.<n>We built this approach into a tool, Germinator.<n>When evaluated on six MLIR projects spanning 91 dialects, Germinator generated seeds improve line coverage by 10-120% over grammar-based baselines.
arXiv Detail & Related papers (2025-12-05T17:00:47Z)
Formal Languages and TQFTs with Defects [0.0]
A construction that assigns a Boolean 1D TQFT with defects to a finite state automaton was recently developed by Gustafson, Im, Kaldawy, Khovanov, and Lihn. We show that the construction is functorial with respect to the category of finite state automata with transducers as morphisms.
arXiv Detail & Related papers (2024-12-12T19:07:31Z)
Efficient Semiring-Weighted Earley Parsing [71.77514972816347]
This paper provides a reference description, in the form of a deduction system, of Earley's (1970) context-free parsing algorithm with various speed-ups. Our presentation includes a known worst-case runtime improvement from Earley's $O (N3|GR|)$, which is unworkable for the large grammars that arise in natural language processing. We treat the generalization to semiring-weighted deduction, preprocessing the grammar like Stolcke (1995) to eliminate deduction cycles, and further generalize Stolcke's method to compute the weights of sentence prefixes.
arXiv Detail & Related papers (2023-07-06T13:33:59Z)
Memory Augmented Large Language Models are Computationally Universal [44.64529266193095]
We show that transformer-based large language models are computationally universal when augmented with an external memory. We establish that an existing large language model, Flan-U-PaLM 540B, can be combined with an associative read-write memory to exactly simulate the execution of a universal Turing machine.
arXiv Detail & Related papers (2023-01-10T02:37:44Z)
On (co-lex) Ordering Automata [2.800608984818919]
We show that a canonical, minimum-width, partially-ordered automaton accepting a language L can be exhibited. Using H we prove that the width of the language can be effectively computed from the minimum automaton recognizing the language.
arXiv Detail & Related papers (2021-06-04T07:41:58Z)
Hierarchical Poset Decoding for Compositional Generalization in Language [52.13611501363484]
We formalize human language understanding as a structured prediction task where the output is a partially ordered set (poset) Current encoder-decoder architectures do not take the poset structure of semantics into account properly. We propose a novel hierarchical poset decoding paradigm for compositional generalization in language.
arXiv Detail & Related papers (2020-10-15T14:34:26Z)
RNNs can generate bounded hierarchical languages with optimal memory [113.73133308478612]
We show that RNNs can efficiently generate bounded hierarchical languages that reflect the scaffolding of natural language syntax. We introduce Dyck-($k$,$m$), the language of well-nested brackets (of $k$ types) and $m$-bounded nesting depth. We prove that an RNN with $O(m log k)$ hidden units suffices, an exponential reduction in memory, by an explicit construction.
arXiv Detail & Related papers (2020-10-15T04:42:29Z)
Automatic Extraction of Rules Governing Morphological Agreement [103.78033184221373]
We develop an automated framework for extracting a first-pass grammatical specification from raw text. We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages. We apply our framework to all languages included in the Universal Dependencies project, with promising results.
arXiv Detail & Related papers (2020-10-02T18:31:45Z)
Ambiguity Hierarchy of Regular Infinite Tree Languages [77.34726150561087]
A language is k-ambiguous (respectively, boundedly, finitely, ambiguous) if it is accepted by a k-ambiguous automaton. An automaton is boundedly ambiguous if it is k-ambiguous for some $k in mathbbN$.
arXiv Detail & Related papers (2020-09-07T10:08:24Z)
Named Entity Extraction with Finite State Transducers [0.0]
We describe a named entity tagging system that requires minimal linguistic knowledge. The system is based on the ideas of the Brill's tagger which makes it really simple.
arXiv Detail & Related papers (2020-06-20T11:09:04Z)
On the Power of Unambiguity in B\"uchi Complementation [0.0]
We exploit the power of emphunambiguity for the complementation problem of B"uchi automata. We show how to use this type of reduced run DAGs as a emphunified tool to optimize emphboth rank-based and slice-based complementation constructions.
arXiv Detail & Related papers (2020-05-18T22:51:34Z)
Incremental Monoidal Grammars [2.685668802278155]
We define formal grammars in terms of free monoidal categories, along with a functor from the category of formal grammars to the category of automata. This allows us to link the categorical viewpoint on natural language to the standard machine learning notion of probabilistic language model.
arXiv Detail & Related papers (2020-01-02T22:37:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.