Fast Deterministic Black-box Context-free Grammar Inference
- URL: http://arxiv.org/abs/2308.06163v2
- Date: Wed, 17 Jan 2024 01:09:34 GMT
- Title: Fast Deterministic Black-box Context-free Grammar Inference
- Authors: Mohammad Rifat Arefin, Suraj Shetiya, Zili Wang and Christoph Csallner
- Abstract summary: State-of-the-art approach generalizes grammar rules starting from flat parse trees.
We observe that many of Arvada's generalizations violate common language concept nesting rules.
The resulting TreeVada yielded faster runtime and higher-quality grammars in an empirical comparison.
- Score: 7.637155559284357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Black-box context-free grammar inference is a hard problem, as in many
practical settings the inference procedure only has access to a limited number of example programs.
The state-of-the-art approach Arvada heuristically generalizes grammar rules
starting from flat parse trees and is non-deterministic to explore different
generalization sequences. We observe that many of Arvada's generalization steps
violate common language concept nesting rules. We thus propose to pre-structure
input programs along these nesting rules, apply learnt rules recursively, and
make black-box context-free grammar inference deterministic. The resulting
TreeVada yielded faster runtime and higher-quality grammars in an empirical
comparison. The TreeVada source code, scripts, evaluation parameters, and
training data are open-source and publicly available
(https://doi.org/10.6084/m9.figshare.23907738).
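To make the pre-structuring idea concrete, below is a minimal Python sketch. It is not the TreeVada implementation; it assumes, purely for illustration, that the common nesting rules are matched bracket pairs and quoted string literals, and it folds each such region of an example program into a nested sub-tree of an otherwise flat parse tree, the kind of skeleton on which rule generalization could then operate.

```python
# Minimal sketch (NOT the actual TreeVada code): pre-structure an example
# program along assumed nesting rules -- matched brackets and quoted
# strings -- producing a nested token tree instead of a flat sequence.

BRACKETS = {"(": ")", "[": "]", "{": "}"}   # assumed bracket nesting rules
QUOTES = {'"', "'"}                          # assumed quoting nesting rules


def pre_structure(text):
    """Return a nested list of tokens: characters stay at their nesting
    level, and each bracketed or quoted region becomes a sub-list."""
    tree, stack = [], []   # stack holds the parent lists of open brackets
    current = tree
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in BRACKETS:                           # open a nesting level
            child = [ch]
            current.append(child)
            stack.append(current)
            current = child
        elif stack and ch == BRACKETS.get(current[0]):
            current.append(ch)                       # close current level
            current = stack.pop()
        elif ch in QUOTES:                           # fold a quoted literal
            j = text.find(ch, i + 1)
            j = len(text) - 1 if j == -1 else j
            current.append([text[i : j + 1]])
            i = j
        else:
            current.append(ch)                       # ordinary character
        i += 1
    return tree


if __name__ == "__main__":
    # Each nested sub-list marks a boundary that later generalization
    # steps would refine but not split, keeping concept nesting intact.
    print(pre_structure('while (x < len(s)) { print("ok"); }'))
```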
Related papers
- Incremental Context-free Grammar Inference in Black Box Settings [17.601446198181048]
Black-box context-free grammar inference is a significant challenge in many practical settings.
We propose a novel method that segments example strings into smaller units and incrementally infers the grammar.
Our approach, named Kedavra, has demonstrated superior grammar quality (enhanced precision and recall), faster runtime, and improved readability through empirical comparison.
arXiv Detail & Related papers (2024-08-29T17:00:38Z) - Sparse Logistic Regression with High-order Features for Automatic Grammar Rule Extraction from Treebanks [6.390468088226495]
We propose a new method to extract and explore significant fine-grained grammar patterns from treebanks.
We extract descriptions and rules across different languages for two linguistic phenomena, agreement and word order.
Our method captures both well-known and less well-known significant grammar rules in Spanish, French, and Wolof.
arXiv Detail & Related papers (2024-03-26T09:39:53Z) - Compositional Program Generation for Few-Shot Systematic Generalization [59.57656559816271]
This study presents a neuro-symbolic architecture called the Compositional Program Generator (CPG).
CPG has three key features: modularity, composition, and abstraction, in the form of grammar rules.
It achieves perfect generalization on both the SCAN and COGS benchmarks using just 14 examples for SCAN and 22 examples for COGS.
arXiv Detail & Related papers (2023-09-28T14:33:20Z) - ChatRule: Mining Logical Rules with Large Language Models for Knowledge
Graph Reasoning [107.61997887260056]
We propose a novel framework, ChatRule, unleashing the power of large language models for mining logical rules over knowledge graphs.
Specifically, the framework is initiated with an LLM-based rule generator, leveraging both the semantic and structural information of KGs.
To refine the generated rules, a rule ranking module estimates the rule quality by incorporating facts from existing KGs.
arXiv Detail & Related papers (2023-09-04T11:38:02Z) - Efficient Semiring-Weighted Earley Parsing [71.77514972816347]
This paper provides a reference description, in the form of a deduction system, of Earley's (1970) context-free parsing algorithm with various speed-ups.
Our presentation includes a known worst-case runtime improvement from Earley's $O(N^3|G||R|)$, which is unworkable for the large grammars that arise in natural language processing.
We treat the generalization to semiring-weighted deduction, preprocessing the grammar like Stolcke (1995) to eliminate deduction cycles, and further generalize Stolcke's method to compute the weights of sentence prefixes.
arXiv Detail & Related papers (2023-07-06T13:33:59Z) - On Parsing as Tagging [66.31276017088477]
We show how to reduce tetratagging, a state-of-the-art constituency tagger, to shift-reduce parsing.
We empirically evaluate our taxonomy of tagging pipelines with different choices of linearizers, learners, and decoders.
arXiv Detail & Related papers (2022-11-14T13:37:07Z) - Learning of Structurally Unambiguous Probabilistic Grammars [7.347989843033034]
We show that the learned CMTA can be converted into a probabilistic grammar, yielding a complete algorithm for learning a structurally unambiguous probabilistic context-free grammar.
arXiv Detail & Related papers (2022-03-17T17:01:51Z) - Learning grammar with a divide-and-concur neural network [4.111899441919164]
We implement a divide-and-concur iterative projection approach to context-free grammar inference.
Our method requires a relatively small number of discrete parameters, making the inferred grammar directly interpretable.
arXiv Detail & Related papers (2022-01-18T22:42:43Z) - VLGrammar: Grounded Grammar Induction of Vision and Language [86.88273769411428]
We study grounded grammar induction of vision and language in a joint learning framework.
We present VLGrammar, a method that uses compound probabilistic context-free grammars (compound PCFGs) to induce the language grammar and the image grammar simultaneously.
arXiv Detail & Related papers (2021-03-24T04:05:08Z) - Learning of Structurally Unambiguous Probabilistic Grammars [7.347989843033034]
We restrict attention to structurally unambiguous weighted context-free grammars (SUWCFG).
We show that SUWCFG can be represented using co-linear multiplicity tree automata (CMTA), and provide a learning algorithm that learns CMTAs.
We demonstrate the usefulness of our algorithm in learning PCFGs over genomic data.
arXiv Detail & Related papers (2020-11-15T08:07:04Z) - Automatic Extraction of Rules Governing Morphological Agreement [103.78033184221373]
We develop an automated framework for extracting a first-pass grammatical specification from raw text.
We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages.
We apply our framework to all languages included in the Universal Dependencies project, with promising results.
arXiv Detail & Related papers (2020-10-02T18:31:45Z)