Discovering Textual Structures: Generative Grammar Induction using
Template Trees
- URL: http://arxiv.org/abs/2009.04530v1
- Date: Wed, 9 Sep 2020 19:31:04 GMT
- Title: Discovering Textual Structures: Generative Grammar Induction using
Template Trees
- Authors: Thomas Winters, Luc De Raedt
- Abstract summary: We introduce a novel grammar induction algorithm for learning interpretable grammars for generative purposes, called Gitta.
By using existing human-created grammars, we found that the algorithm can reasonably approximate these grammars using only a few examples.
- Score: 17.37350034483191
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language generation provides designers with methods for automatically
generating text, e.g. for creating summaries, chatbots and game content. In
practice, text generators are often either learned and hard to interpret, or
created by hand using techniques such as grammars and templates. In this paper,
we introduce a novel grammar induction algorithm for learning interpretable
grammars for generative purposes, called Gitta. We also introduce the novel
notion of template trees to discover latent templates in corpora to derive
these generative grammars. By using existing human-created grammars, we found
that the algorithm can reasonably approximate these grammars using only a few
examples. These results indicate that Gitta could be used to automatically
learn interpretable and easily modifiable grammars, and thus provide a stepping
stone for human-machine co-creation of generative models.
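The template-tree idea admits a small illustration. The sketch below is a simplified approximation, not Gitta's actual algorithm: it assumes equal-length examples and merges only a single pair, whereas Gitta builds full template trees over a corpus.

```python
# Minimal sketch of the template-tree idea: merge similar examples so that
# shared tokens stay fixed and differing tokens become slots, then read
# production rules off the learned slots. Illustrative, not Gitta itself.

def merge_into_template(a, b):
    """Merge two equal-length token lists; differing positions become slots."""
    assert len(a) == len(b), "simplification: equal-length examples only"
    template, slots = [], []
    for x, y in zip(a, b):
        if x == y:
            template.append(x)
        else:
            slots.append({x, y})
            template.append(f"<S{len(slots) - 1}>")
    return template, slots

examples = ["I like my cat".split(), "I like my dog".split()]
template, slots = merge_into_template(*examples)
print(" ".join(template))              # I like my <S0>
for i, values in enumerate(slots):
    for v in sorted(values):
        print(f"<S{i}> -> {v}")        # <S0> -> cat, <S0> -> dog
```

Read bottom-up, the template plus its slot fillers already form a tiny generative grammar, which is the stepping stone the abstract describes.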
Related papers
- Explicit Syntactic Guidance for Neural Text Generation (2023-06-20) [45.60838824233036]
Generative Grammar suggests that humans generate natural language texts by learning language grammar.
We propose a syntax-guided generation schema that generates the output sequence top-down, guided by a constituency parse tree.
Experiments on paraphrase generation and machine translation show that the proposed method outperforms autoregressive baselines.
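As a loose illustration of the top-down, syntax-guided idea in the entry above, the toy sketch below fixes a syntactic skeleton (as a parse tree would) before choosing any words; the skeleton, lexicon, and random word choice are invented stand-ins for the paper's neural decoder.

```python
import random

# Toy sketch of syntax-guided generation: structure is decided first, words
# are filled in afterwards, top-down and left-to-right. All data is invented.
SKELETON = ["DT", "NN", "VBD", "DT", "NN"]   # e.g. tags read off a parse tree
LEXICON = {
    "DT": ["the", "a"],
    "NN": ["cat", "story"],
    "VBD": ["chased", "retold"],
}

tokens = [random.choice(LEXICON[tag]) for tag in SKELETON]
print(" ".join(tokens))   # e.g. "the cat retold a story"
```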
- Physics of Language Models: Part 1, Learning Hierarchical Language Structures (2023-05-23) [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge.
We introduce a family of synthetic context-free grammars (CFGs) with hierarchical rules that can generate long sentences.
We demonstrate that generative models like GPT can accurately learn this CFG language and generate sentences based on it.
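A minimal sketch of sampling from such a synthetic CFG; the toy grammar below is an invented stand-in for the paper's grammar family.

```python
import random

# Hedged sketch of CFG sampling: recursive rules (NP inside PP inside NP)
# let short rule sets generate long, hierarchically structured strings.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "PP"]],
    "PP": [["near", "NP"]],
    "VP": [["V", "NP"]],
    "N":  [["cat"], ["dog"], ["house"]],
    "V":  [["chased"], ["saw"]],
}

def sample(symbol):
    if symbol not in GRAMMAR:               # terminal token
        return [symbol]
    rule = random.choice(GRAMMAR[symbol])   # pick a production uniformly
    return [tok for child in rule for tok in sample(child)]

print(" ".join(sample("S")))  # e.g. "the cat near the house saw the dog"
```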
- Classifiers are Better Experts for Controllable Text Generation (2022-05-15) [63.17266060165098]
We show that the proposed method significantly outperforms recent PPLM, GeDi, and DExperts on perplexity (PPL) and on sentiment accuracy as measured by an external classifier of the generated texts.
At the same time, it is easier to implement and tune, and has significantly fewer restrictions and requirements.
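A hedged sketch of the general recipe this line of work builds on: re-weighting a language model's next-token distribution by an external attribute classifier, p(token | attribute) proportional to p(token) * p(attribute | token). The vocabulary and probabilities are toy assumptions.

```python
import numpy as np

# Toy classifier-guided decoding step: boost tokens the attribute classifier
# likes, suppress the rest, then renormalize into a distribution.
vocab = ["great", "terrible", "okay"]
lm_probs = np.array([0.5, 0.3, 0.2])    # base LM next-token probabilities
clf_pos = np.array([0.90, 0.05, 0.60])  # P(positive sentiment | token)

guided = lm_probs * clf_pos
guided /= guided.sum()                  # renormalize
print(dict(zip(vocab, guided.round(3))))  # "great" boosted, "terrible" suppressed
```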
- Learning grammar with a divide-and-concur neural network (2022-01-18) [4.111899441919164]
We implement a divide-and-concur iterative projection approach to context-free grammar inference.
Our method requires a relatively small number of discrete parameters, making the inferred grammar directly interpretable.
- How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN (2021-11-18) [63.79300884115027]
Current language models can generate high-quality text.
Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions?
We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
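One way to make the novelty question concrete is to measure how many generated n-grams never occur in the training data. The sketch below illustrates that idea in simplified form; it is not RAVEN's actual implementation, and the texts are invented.

```python
# Toy novelty analysis: the share of generated n-grams absent from training.
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

train = "the cat sat on the mat".split()
generated = "the cat sat on the rug".split()

for n in (2, 3):
    gen = ngrams(generated, n)
    novel = gen - ngrams(train, n)
    print(f"{n}-gram novelty: {len(novel)}/{len(gen)}")  # 1/5, then 1/4
```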
- Shape Inference and Grammar Induction for Example-based Procedural Generation (2021-09-21) [12.789308303237277]
We propose SIGI, a novel method for inferring shapes and inducing a shape grammar from grid-based 3D building examples.
Applied to Minecraft buildings, we show how the shape grammar can be used to automatically generate new buildings in a similar style.
- GrammarTagger: A Multilingual, Minimally-Supervised Grammar Profiler for Language Education (2021-04-07) [7.517366022163375]
We present GrammarTagger, an open-source grammar profiler which, given an input text, identifies grammatical features useful for language education.
The model architecture enables it to learn from a small number of texts annotated with spans and their labels.
We also build Octanove Learn, a search engine of language learning materials indexed by their reading difficulty and grammatical features.
- VLGrammar: Grounded Grammar Induction of Vision and Language (2021-03-24) [86.88273769411428]
We study grounded grammar induction of vision and language in a joint learning framework.
We present VLGrammar, a method that uses compound probabilistic context-free grammars (compound PCFGs) to induce the language grammar and the image grammar simultaneously.
- Automatic Extraction of Rules Governing Morphological Agreement (2020-10-02) [103.78033184221373]
We develop an automated framework for extracting a first-pass grammatical specification from raw text.
We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages.
We apply our framework to all languages included in the Universal Dependencies project, with promising results.
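A minimal sketch of the kind of first-pass agreement statistic such a framework might compute from a Universal Dependencies treebank; the choice of relation ("amod") and feature ("Number") is an illustrative assumption, not the paper's method.

```python
# Toy agreement check over CoNLL-U lines: how often does an adjectival
# modifier share the Number feature with its noun head? Hand-rolled parsing;
# field indices follow the standard CoNLL-U column order.
def feats(field):
    return dict(kv.split("=") for kv in field.split("|")) if field != "_" else {}

def number_agreement(conllu_lines):
    rows = [f for f in (l.split("\t") for l in conllu_lines) if f[0].isdigit()]
    by_id = {r[0]: r for r in rows}          # token ID -> row
    agree = total = 0
    for r in rows:
        if r[7] == "amod" and r[6] in by_id:  # modifier -> its head
            m, h = feats(r[5]), feats(by_id[r[6]][5])
            if "Number" in m and "Number" in h:
                total += 1
                agree += m["Number"] == h["Number"]
    return agree, total
```

Run over the lines of a .conllu file, this yields a raw agreement count that a rule extractor could then threshold into a grammatical specification.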
- Traduction des Grammaires Catégorielles de Lambek dans les Grammaires Catégorielles Abstraites (Translation of Lambek Categorial Grammars into Abstract Categorial Grammars) (2020-01-23) [0.0]
This internship report demonstrates that every Lambek Grammar can be expressed, not entirely but efficiently, in Abstract Categorial Grammars (ACGs).
The main idea is to transform the type-rewriting system of Lambek Grammars into that of Context-Free Grammars (CFGs) by erasing the introduction and elimination rules and generating enough axioms for the cut rule to suffice.
Although the underlying algorithm was not fully implemented, this proof provides another argument in favour of the relevance of ACGs in Natural Language Processing.