Learning of Structurally Unambiguous Probabilistic Grammars
- URL: http://arxiv.org/abs/2011.07472v2
- Date: Tue, 9 Mar 2021 18:32:52 GMT
- Title: Learning of Structurally Unambiguous Probabilistic Grammars
- Authors: Dolav Nitay, Dana Fisman, Michal Ziv-Ukelson
- Abstract summary: We restrict attention to structurally unambiguous weighted context-free grammars (SUWCFG).
We show that SUWCFG can be represented using co-linear multiplicity tree automata (CMTA), and provide a learning algorithm that learns CMTAs.
We demonstrate the usefulness of our algorithm in learning PCFGs over genomic data.
- Score: 7.347989843033034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The problem of identifying a probabilistic context-free grammar has two
aspects: the first is determining the grammar's topology (the rules of the
grammar) and the second is estimating probabilistic weights for each rule.
Given the hardness results for learning context-free grammars in general, and
probabilistic grammars in particular, most of the literature has concentrated
on the second problem. In this work we address the first problem. We restrict
attention to structurally unambiguous weighted context-free grammars (SUWCFG)
and provide a query learning algorithm for structurally unambiguous
probabilistic context-free grammars (SUPCFG). We show that SUWCFG can be
represented using co-linear multiplicity tree automata (CMTA), and provide a
polynomial learning algorithm that learns CMTAs. We show that the learned CMTA
can be converted into a probabilistic grammar, thus providing a complete
algorithm for learning a structurally unambiguous probabilistic context-free
grammar (both the grammar topology and the probabilistic weights) using
structured membership queries and structured equivalence queries. We
demonstrate the usefulness of our algorithm in learning PCFGs over genomic
data.
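The query-learning setup described in the abstract can be made concrete with a small sketch. The following Python snippet is a minimal illustration of the outer learning loop only, not the paper's CMTA algorithm: `smq` and `seq` are hypothetical oracle names standing in for structured membership and structured equivalence queries, the target is a toy structurally unambiguous PCFG, and a plain lookup table stands in for the observation table and CMTA extraction that the actual learner performs.

```python
from fractions import Fraction

# Toy structurally unambiguous PCFG:  S -> a S (2/3) | b (1/3).
# Derivation trees are nested tuples: ('S', 'b') or ('S', 'a', subtree).

def smq(tree):
    """Structured membership query: the probability the target grammar
    assigns to a given derivation tree."""
    if tree == ('S', 'b'):
        return Fraction(1, 3)
    if len(tree) == 3 and tree[:2] == ('S', 'a'):
        return Fraction(2, 3) * smq(tree[2])
    return Fraction(0)

def seq(hypothesis, max_depth=6):
    """Structured equivalence query, approximated here by exhaustively
    checking every derivation tree up to a fixed depth.  Returns a
    counterexample tree, or None if the hypothesis agrees with the target."""
    def trees(depth):
        yield ('S', 'b')
        if depth > 1:
            for t in trees(depth - 1):
                yield ('S', 'a', t)
    for t in trees(max_depth):
        if hypothesis(t) != smq(t):
            return t
    return None

def learn():
    """Angluin-style loop: query the weights of sample trees, propose a
    hypothesis, and refine it with counterexamples until SEQ accepts."""
    table, rows = {}, [('S', 'b')]
    while True:
        for t in rows:
            table[t] = smq(t)                      # fill the table via SMQs
        hypothesis = lambda t: table.get(t, Fraction(0))
        cex = seq(hypothesis)
        if cex is None:
            return table                           # SEQ accepted the hypothesis
        rows.append(cex)                           # refine with the counterexample

if __name__ == "__main__":
    for tree, p in learn().items():
        print(tree, "->", p)
```

In the full algorithm, the object learned from such queries is a CMTA, which is then converted into a probabilistic grammar (both topology and weights); the lookup table above only mimics the outer query loop.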
Related papers
- Detecting and explaining (in)equivalence of context-free grammars [0.6282171844772422]
We propose a scalable framework for deciding, proving, and explaining (in)equivalence of context-free grammars.
We present an implementation of the framework and evaluate it on large data sets collected within educational support systems.
arXiv Detail & Related papers (2024-07-25T17:36:18Z)
- Sparse Logistic Regression with High-order Features for Automatic Grammar Rule Extraction from Treebanks [6.390468088226495]
We propose a new method to extract and explore significant fine-grained grammar patterns from treebanks.
We extract descriptions and rules across different languages for two linguistic phenomena, agreement and word order.
Our method captures both well-known and less well-known significant grammar rules in Spanish, French, and Wolof.
arXiv Detail & Related papers (2024-03-26T09:39:53Z)
- Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge.
We introduce a family of synthetic CFGs that produce hierarchical rules, capable of generating lengthy sentences.
We demonstrate that generative models like GPT can accurately learn this CFG language and generate sentences based on it.
arXiv Detail & Related papers (2023-05-23T04:28:16Z)
- Logical Entity Representation in Knowledge-Graphs for Differentiable Rule Learning [71.05093203007357]
We propose Logical Entity RePresentation (LERP) to encode contextual information of entities in the knowledge graph.
A LERP is designed as a vector of probabilistic logical functions on the entity's neighboring sub-graph.
Our model outperforms other rule learning methods in knowledge graph completion and is comparable or even superior to state-of-the-art black-box methods.
arXiv Detail & Related papers (2023-05-22T05:59:22Z)
- Unsupervised Discontinuous Constituency Parsing with Mildly Context-Sensitive Grammars [14.256041558454786]
We study grammar induction with mildly context-sensitive grammars for unsupervised discontinuous parsing.
Our approach fixes the rule structure in advance and focuses on parameter learning with maximum likelihood.
Experiments on German and Dutch show that our approach is able to induce linguistically meaningful trees with continuous and discontinuous structures.
arXiv Detail & Related papers (2022-12-18T18:10:45Z)
- Learning of Structurally Unambiguous Probabilistic Grammars [7.347989843033034]
We show that a CMTA can be converted into a probabilistic grammar.
Together with the CMTA learner, this yields a complete algorithm for learning a structurally unambiguous probabilistic context-free grammar.
arXiv Detail & Related papers (2022-03-17T17:01:51Z)
- A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction [83.14159143179269]
Grammatical Error Correction (GEC) is the task of detecting and correcting grammatical errors in sentences.
We propose a syntax-guided GEC model (SG-GEC) which adopts the graph attention mechanism to utilize the syntactic knowledge of dependency trees.
We evaluate our model on public benchmarks of the GEC task, and it achieves competitive results.
arXiv Detail & Related papers (2021-11-05T07:07:48Z)
- Dependency Induction Through the Lens of Visual Perception [81.91502968815746]
We propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars.
Our experiments show that the proposed extension outperforms the current state-of-the-art visually grounded models in constituency parsing even with a smaller grammar size.
arXiv Detail & Related papers (2021-09-20T18:40:37Z)
- VLGrammar: Grounded Grammar Induction of Vision and Language [86.88273769411428]
We study grounded grammar induction of vision and language in a joint learning framework.
We present VLGrammar, a method that uses compound probabilistic context-free grammars (compound PCFGs) to induce the language grammar and the image grammar simultaneously.
arXiv Detail & Related papers (2021-03-24T04:05:08Z)
- Probabilistic Grammars for Equation Discovery [0.0]
We propose the use of probabilistic context-free grammars in equation discovery.
Probabilistic grammars can be used to elegantly and flexibly formulate the parsimony principle.
arXiv Detail & Related papers (2020-12-01T11:59:19Z)
- A Constraint-Based Algorithm for the Structural Learning of Continuous-Time Bayesian Networks [70.88503833248159]
We propose the first constraint-based algorithm for learning the structure of continuous-time Bayesian networks.
We discuss the different statistical tests and the underlying hypotheses used by our proposal to establish conditional independence.
arXiv Detail & Related papers (2020-07-07T07:34:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.