Recursive Top-Down Production for Sentence Generation with Latent Trees
- URL: http://arxiv.org/abs/2010.04704v1
- Date: Fri, 9 Oct 2020 17:47:16 GMT
- Title: Recursive Top-Down Production for Sentence Generation with Latent Trees
- Authors: Shawn Tan and Yikang Shen and Timothy J. O'Donnell and Alessandro
Sordoni and Aaron Courville
- Abstract summary: We model the recursive production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
- Score: 77.56794870399288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We model the recursive production property of context-free grammars for
natural and synthetic languages. To this end, we present a dynamic programming
algorithm that marginalises over latent binary tree structures with $N$ leaves,
allowing us to compute the likelihood of a sequence of $N$ tokens under a
latent tree model, which we maximise to train a recursive neural function. We
demonstrate performance on two synthetic tasks: SCAN (Lake and Baroni, 2017),
where it outperforms previous models on the LENGTH split, and English question
formation (McCoy et al., 2020), where it performs comparably to decoders with
the ground-truth tree structure. We also present experimental results on
German-English translation on the Multi30k dataset (Elliott et al., 2016), and
qualitatively analyse the induced tree structures our model learns for the SCAN
tasks and the German-English translation task.
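To make the marginalisation concrete, below is a minimal CKY-style inside-algorithm sketch that sums over all binary tree structures with $N$ leaves in O(N^3) chart updates. It is not the paper's recursive top-down production model itself: the `leaf_scores` array and the constant `compose_score` are hypothetical stand-ins for the scores that the paper's recursive neural function would assign to leaves and internal nodes.

```python
# A minimal CKY-style inside algorithm that marginalises over all binary
# tree structures with N leaves.  `leaf_scores` and `compose_score` are
# hypothetical stand-ins for a learned recursive neural function.
import numpy as np
from scipy.special import logsumexp

def latent_tree_log_likelihood(leaf_scores, compose_score=0.0):
    """Log of the sum, over every binary tree with N leaves, of the
    product of its leaf probabilities and per-node composition scores."""
    n = len(leaf_scores)
    # chart[i, j] = log marginal score of the span covering tokens i..j-1
    chart = np.full((n, n + 1), -np.inf)
    for i in range(n):
        chart[i, i + 1] = leaf_scores[i]
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            # marginalise over the split point chosen by the span's root
            splits = [chart[i, k] + chart[k, j] + compose_score
                      for k in range(i + 1, j)]
            chart[i, j] = logsumexp(splits)
    return chart[0, n]

# With four equiprobable tokens the marginal sums over Catalan(3) = 5 trees.
print(latent_tree_log_likelihood(np.log(np.full(4, 0.25))))
```

In the paper this log marginal likelihood is the quantity maximised during training; the sketch only illustrates the shape of the chart recursion.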
Related papers
- Recursive Speculative Decoding: Accelerating LLM Inference via Sampling
Without Replacement [11.91629418177851]
Speculative decoding is an inference-acceleration method for large language models.
Recent works have advanced this method by establishing a draft-token tree.
We present Recursive Speculative Decoding (RSD), a novel tree-based method that samples draft tokens without replacement.
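The summary does not spell out how RSD draws its draft tokens, so the snippet below only illustrates one standard recipe for sampling without replacement from a categorical distribution, the Gumbel-top-k trick; treat the connection to RSD's actual sampler as an assumption.

```python
# Generic sketch: Gumbel-top-k sampling without replacement from a
# categorical distribution.  Not necessarily the exact sampler used by RSD.
import numpy as np

def sample_without_replacement(log_probs, k, rng=None):
    """Return k distinct indices, distributed as k sequential draws
    without replacement from softmax(log_probs)."""
    rng = rng or np.random.default_rng()
    gumbel = rng.gumbel(size=log_probs.shape)   # G_i ~ Gumbel(0, 1)
    perturbed = log_probs + gumbel              # perturbed log-probabilities
    return np.argsort(-perturbed)[:k]           # keep the k largest

draft = sample_without_replacement(np.log([0.5, 0.3, 0.15, 0.05]), k=2)
print(draft)   # e.g. [0 1]
```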
arXiv Detail & Related papers (2024-02-21T22:57:49Z)
- Differentiable Tree Operations Promote Compositional Generalization [106.59434079287661]
The Differentiable Tree Machine (DTM) architecture integrates an interpreter with external memory and an agent that learns to sequentially select tree operations.
DTM achieves 100% accuracy, while existing baselines such as Transformer, Tree Transformer, LSTM, and Tree2Tree LSTM achieve less than 30%.
arXiv Detail & Related papers (2023-06-01T14:46:34Z)
- Structure-Unified M-Tree Coding Solver for Math Word Problem [57.825176412485504]
In previous work, models designed to account for the binary tree structure of mathematical expressions on the output side have achieved better performance.
In this paper, we propose the Structure-Unified M-Tree Coding Solver (SUMC-Solver), which applies a tree with any M branches (an M-tree) to unify the output structures.
Experimental results on the widely used MAWPS and Math23K datasets have demonstrated that SUMC-Solver not only outperforms several state-of-the-art models but also performs much better under low-resource conditions.
arXiv Detail & Related papers (2022-10-22T12:20:36Z)
- Learning Tree Structures from Leaves For Particle Decay Reconstruction [0.0]
We present a neural approach to reconstructing rooted tree graphs describing hierarchical interactions, using a novel representation we term the Lowest Common Ancestor Generations (LCAG) matrix.
We are able to correctly predict the LCAG purely from leaf features for a maximum tree depth of 8, in 92.5% of cases for trees with up to 6 leaves and in 59.7% of cases for trees with up to 10 leaves, in our simulated dataset.
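As an illustration of the representation, the sketch below builds an LCAG-style matrix from child-to-parent pointers, using one plausible convention: entry (i, j) counts how many generations sit between leaf i and the lowest common ancestor of leaves i and j. The tree encoding and the exact generation-counting convention are assumptions for illustration and may differ from the paper's definition.

```python
# Illustrative LCAG-style matrix from a tree given as child -> parent pointers.
# Assumed convention: entry (i, j) is the number of generations from leaf i
# up to the lowest common ancestor of leaves i and j (0 on the diagonal).

def ancestor_chain(node, parent):
    """Ancestors of `node`, nearest first, up to the root."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def lcag_matrix(leaves, parent):
    n = len(leaves)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        chain_i = ancestor_chain(leaves[i], parent)
        for j in range(n):
            if i == j:
                continue
            others = set(ancestor_chain(leaves[j], parent))
            # the first shared ancestor walking up from leaf i is the LCA
            matrix[i][j] = next(g for g, a in enumerate(chain_i, 1) if a in others)
    return matrix

# Tiny decay-like tree: root -> (x, y), x -> (a, b), y -> (c, d).
parent = {"a": "x", "b": "x", "c": "y", "d": "y", "x": "root", "y": "root"}
print(lcag_matrix(["a", "b", "c", "d"], parent))
# [[0, 1, 2, 2], [1, 0, 2, 2], [2, 2, 0, 1], [2, 2, 1, 0]]
```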
arXiv Detail & Related papers (2022-08-31T15:36:47Z)
- FastTrees: Parallel Latent Tree-Induction for Faster Sequence Encoding [8.004425059996963]
This paper proposes FASTTREES, a new general-purpose neural module for fast sequence encoding.
Our work explores the notion of parallel tree induction, i.e., imbuing our model with hierarchical inductive biases in a parallelizable, non-autoregressive fashion.
We show that the FASTTREES module can be applied to enhance Transformer models, achieving performance gains on three sequence tasks.
arXiv Detail & Related papers (2021-11-28T03:08:06Z)
- R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling [36.61173494449218]
This paper proposes a model based on differentiable CKY-style binary trees to emulate the composition process.
We extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes.
To scale up our approach, we also introduce an efficient pruned tree induction algorithm to enable encoding in just a linear number of composition steps.
arXiv Detail & Related papers (2021-07-02T11:00:46Z)
- Can RNNs learn Recursive Nested Subject-Verb Agreements? [4.094098809740732]
Language processing requires the ability to extract nested tree structures.
Recent advances in Recurrent Neural Networks (RNNs) have achieved near-human performance on some language tasks.
arXiv Detail & Related papers (2021-01-06T20:47:02Z)
- Constructing Taxonomies from Pretrained Language Models [52.53846972667636]
We present a method for constructing taxonomic trees (e.g., WordNet) using pretrained language models.
Our approach is composed of two modules, one that predicts parenthood relations and another that reconciles those predictions into trees.
We train our model on subtrees sampled from WordNet, and test on non-overlapping WordNet subtrees.
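The summary does not say how the reconciliation module works internally; a common way to turn pairwise parenthood scores into a single tree is a maximum spanning arborescence (Chu-Liu/Edmonds), so the sketch below uses that as a stand-in. The `parent_scores` input and the use of networkx's `maximum_spanning_arborescence` are assumptions for illustration, not the paper's actual module.

```python
# Generic reconciliation sketch: pick the highest-scoring tree over pairwise
# parent -> child scores via a maximum spanning arborescence.  An assumed
# stand-in, not necessarily the paper's reconciliation module.
import networkx as nx

def reconcile(parent_scores, root):
    """parent_scores maps (parent, child) pairs to predicted scores."""
    graph = nx.DiGraph()
    for (parent, child), score in parent_scores.items():
        if child != root:                 # the root node keeps no parent
            graph.add_edge(parent, child, weight=score)
    return nx.maximum_spanning_arborescence(graph)

scores = {("animal", "dog"): 0.9, ("animal", "cat"): 0.8,
          ("dog", "cat"): 0.3, ("cat", "dog"): 0.2}
tree = reconcile(scores, root="animal")
print(sorted(tree.edges()))   # [('animal', 'cat'), ('animal', 'dog')]
```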
arXiv Detail & Related papers (2020-10-24T07:16:21Z)
- Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach [78.77265671634454]
We make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground-truth parse trees in a form called "syntactic distances".
Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
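For a concrete picture of the auxiliary target, the sketch below converts a binary parse tree into one common form of "syntactic distances": the value at each boundary between adjacent words is the height of the smallest constituent spanning that boundary, so ranking the distances recovers the tree. The exact distance definition used in the paper may differ; the nested-tuple tree format is only for illustration.

```python
# Convert a binary tree (nested 2-tuples, string leaves) into N-1 syntactic
# distances: the value at each word boundary is the height of the smallest
# constituent spanning it, so sorting boundaries by distance recovers the tree.

def tree_to_distances(tree):
    """Return (leaves, distances, height) for the given binary tree."""
    if isinstance(tree, str):
        return [tree], [], 0
    left_leaves, left_dists, left_h = tree_to_distances(tree[0])
    right_leaves, right_dists, right_h = tree_to_distances(tree[1])
    height = max(left_h, right_h) + 1
    # this constituent governs the boundary between its two children
    return (left_leaves + right_leaves,
            left_dists + [height] + right_dists,
            height)

leaves, dists, _ = tree_to_distances((("the", "cat"), ("sat", "down")))
print(leaves)  # ['the', 'cat', 'sat', 'down']
print(dists)   # [1, 2, 1] -- the highest distance marks the top-level split
```

A multi-task model would regress on such targets alongside the usual next-word prediction loss.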
arXiv Detail & Related papers (2020-05-12T15:35:00Z)
- Tree-structured Attention with Hierarchical Accumulation [103.47584968330325]
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)