Learning of Structurally Unambiguous Probabilistic Grammars
- URL: http://arxiv.org/abs/2011.07472v2
- Date: Tue, 9 Mar 2021 18:32:52 GMT
- Title: Learning of Structurally Unambiguous Probabilistic Grammars
- Authors: Dolav Nitay, Dana Fisman, Michal Ziv-Ukelson
- Abstract summary: We restrict attention to structurally unambiguous weighted context-free grammars (SUWCFG).
We show that SUWCFG can be represented using co-linear multiplicity tree automata (CMTA), and provide a learning algorithm that learns CMTAs.
We demonstrate the usefulness of our algorithm in learning PCFGs over genomic data.
- Score: 7.347989843033034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The problem of identifying a probabilistic context-free grammar has two
aspects: the first is determining the grammar's topology (the rules of the
grammar) and the second is estimating probabilistic weights for each rule.
Given the hardness results for learning context-free grammars in general, and
probabilistic grammars in particular, most of the literature has concentrated
on the second problem. In this work we address the first problem. We restrict
attention to structurally unambiguous weighted context-free grammars (SUWCFG)
and provide a query learning algorithm for structurally unambiguous
probabilistic context-free grammars (SUPCFG). We show that SUWCFG can be
represented using co-linear multiplicity tree automata (CMTA), and provide a
polynomial learning algorithm that learns CMTAs. We show that the learned CMTA
can be converted into a probabilistic grammar, thus providing a complete
algorithm for learning a structurally unambiguous probabilistic context-free
grammar (both the grammar topology and the probabilistic weights) using
structured membership queries and structured equivalence queries. We
demonstrate the usefulness of our algorithm in learning PCFGs over genomic
data.
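The query-learning setup described in the abstract can be made concrete with a small sketch. The following Python snippet is a minimal illustration of the outer learning loop only, not the paper's CMTA algorithm: `smq` and `seq` are hypothetical oracle names standing in for structured membership and structured equivalence queries, the target is a toy structurally unambiguous PCFG, and a plain lookup table stands in for the observation table and CMTA extraction that the actual learner performs.

```python
from fractions import Fraction

# Toy structurally unambiguous PCFG:  S -> a S (2/3) | b (1/3).
# Derivation trees are nested tuples: ('S', 'b') or ('S', 'a', subtree).

def smq(tree):
    """Structured membership query: the probability the target grammar
    assigns to a given derivation tree."""
    if tree == ('S', 'b'):
        return Fraction(1, 3)
    if len(tree) == 3 and tree[:2] == ('S', 'a'):
        return Fraction(2, 3) * smq(tree[2])
    return Fraction(0)

def seq(hypothesis, max_depth=6):
    """Structured equivalence query, approximated here by exhaustively
    checking every derivation tree up to a fixed depth.  Returns a
    counterexample tree, or None if the hypothesis agrees with the target."""
    def trees(depth):
        yield ('S', 'b')
        if depth > 1:
            for t in trees(depth - 1):
                yield ('S', 'a', t)
    for t in trees(max_depth):
        if hypothesis(t) != smq(t):
            return t
    return None

def learn():
    """Angluin-style loop: query the weights of sample trees, propose a
    hypothesis, and refine it with counterexamples until SEQ accepts."""
    table, rows = {}, [('S', 'b')]
    while True:
        for t in rows:
            table[t] = smq(t)                      # fill the table via SMQs
        hypothesis = lambda t: table.get(t, Fraction(0))
        cex = seq(hypothesis)
        if cex is None:
            return table                           # SEQ accepted the hypothesis
        rows.append(cex)                           # refine with the counterexample

if __name__ == "__main__":
    for tree, p in learn().items():
        print(tree, "->", p)
```

In the full algorithm, the object learned from such queries is a CMTA, which is then converted into a probabilistic grammar (both topology and weights); the lookup table above only mimics the outer query loop.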
Related papers
- Detecting and explaining (in)equivalence of context-free grammars [0.6282171844772422]
We propose a scalable framework for deciding, proving, and explaining (in)equivalence of context-free grammars.
We present an implementation of the framework and evaluate it on large data sets collected within educational support systems.
arXiv Detail & Related papers (2024-07-25T17:36:18Z)
- Sparse Logistic Regression with High-order Features for Automatic Grammar Rule Extraction from Treebanks [6.390468088226495]
We propose a new method to extract and explore significant fine-grained grammar patterns from treebanks.
We extract descriptions and rules across different languages for two linguistic phenomena, agreement and word order.
Our method captures both well-known and less well-known significant grammar rules in Spanish, French, and Wolof.
arXiv Detail & Related papers (2024-03-26T09:39:53Z)
- Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge.
We introduce a family of synthetic CFGs that produce hierarchical rules, capable of generating lengthy sentences.
We demonstrate that generative models like GPT can accurately learn this CFG language and generate sentences based on it.
arXiv Detail & Related papers (2023-05-23T04:28:16Z)
- Logical Entity Representation in Knowledge-Graphs for Differentiable Rule Learning [71.05093203007357]
We propose Logical Entity RePresentation (LERP) to encode contextual information of entities in the knowledge graph.
A LERP is designed as a vector of probabilistic logical functions on the entity's neighboring sub-graph.
Our model outperforms other rule learning methods in knowledge graph completion and is comparable or even superior to state-of-the-art black-box methods.
arXiv Detail & Related papers (2023-05-22T05:59:22Z)
- Unsupervised Discontinuous Constituency Parsing with Mildly Context-Sensitive Grammars [14.256041558454786]
We study grammar induction with mildly context-sensitive grammars for unsupervised discontinuous parsing.
Our approach fixes the rule structure in advance and focuses on parameter learning with maximum likelihood.
Experiments on German and Dutch show that our approach is able to induce linguistically meaningful trees with continuous and discontinuous structures.
arXiv Detail & Related papers (2022-12-18T18:10:45Z)
- Learning of Structurally Unambiguous Probabilistic Grammars [7.347989843033034]
We show that a CMTA can be converted into a probabilistic grammar.
Together with the CMTA learner, this yields a complete algorithm for learning a structurally unambiguous probabilistic context-free grammar.
arXiv Detail & Related papers (2022-03-17T17:01:51Z)
- A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction [83.14159143179269]
Grammatical Error Correction (GEC) is the task of detecting and correcting grammatical errors in sentences.
We propose a syntax-guided GEC model (SG-GEC) which adopts the graph attention mechanism to utilize the syntactic knowledge of dependency trees.
We evaluate our model on public benchmarks of the GEC task, and it achieves competitive results.
arXiv Detail & Related papers (2021-11-05T07:07:48Z)
- Dependency Induction Through the Lens of Visual Perception [81.91502968815746]
We propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars.
Our experiments show that the proposed extension outperforms the current state-of-the-art visually grounded models in constituency parsing even with a smaller grammar size.
arXiv Detail & Related papers (2021-09-20T18:40:37Z)
- VLGrammar: Grounded Grammar Induction of Vision and Language [86.88273769411428]
We study grounded grammar induction of vision and language in a joint learning framework.
We present VLGrammar, a method that uses compound probabilistic context-free grammars (compound PCFGs) to induce the language grammar and the image grammar simultaneously.
arXiv Detail & Related papers (2021-03-24T04:05:08Z)
- Probabilistic Grammars for Equation Discovery [0.0]
We propose the use of probabilistic context-free grammars in equation discovery.
Probabilistic grammars can be used to elegantly and flexibly formulate the parsimony principle.
arXiv Detail & Related papers (2020-12-01T11:59:19Z)
- A Constraint-Based Algorithm for the Structural Learning of Continuous-Time Bayesian Networks [70.88503833248159]
We propose the first constraint-based algorithm for learning the structure of continuous-time Bayesian networks.
We discuss the different statistical tests and the underlying hypotheses used by our proposal to establish conditional independence.
arXiv Detail & Related papers (2020-07-07T07:34:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.