Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for
Grammar Induction and Text Representation
- URL: http://arxiv.org/abs/2203.00281v1
- Date: Tue, 1 Mar 2022 07:54:44 GMT
- Title: Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for
Grammar Induction and Text Representation
- Authors: Xiang Hu, Haitao Mi, Liang Li, Gerard de Melo
- Abstract summary: We propose a model-based pruning method, which also enables parallel encoding during inference.
Our experiments show that our Fast-R2D2 improves performance significantly in grammar induction and achieves competitive results in downstream classification tasks.
- Score: 41.51966652141165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, CKY-based models have shown great potential in unsupervised grammar
induction thanks to their human-like encoding paradigm, which runs recursively
and hierarchically but requires $O(n^3)$ time complexity. The Recursive
Transformer based on Differentiable Trees (R2D2) makes it possible to scale to
large language model pre-training even with a complex tree encoder by introducing
a heuristic pruning method. However, the rule-based pruning approach suffers
from local optima and slow inference. In this paper, we fix both issues with a
unified method. We propose to use a top-down parser as a model-based pruning
method, which also enables parallel encoding during inference. Specifically,
our parser casts parsing as a split-point scoring task: it first scores all
split points for a given sentence, and then recursively splits a span into two
by picking the split point with the highest score in the current span. The
reverse order of the splits is taken as the pruning order in the R2D2 encoder.
Besides the bi-directional language model loss, we also optimize the parser by
minimizing the KL divergence between the tree probabilities produced by the
parser and by R2D2. Our experiments show that Fast-R2D2 improves performance
significantly in grammar induction and achieves competitive results in
downstream classification tasks.
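The following is a minimal sketch (not the authors' released code) of the top-down split-point parsing described in the abstract: all split points of a sentence are scored once, each span is then recursively split at its highest-scoring point, and the reverse of the resulting split order gives the bottom-up composition (pruning) order fed to the R2D2 encoder. The scoring function here is a hypothetical stand-in; in Fast-R2D2 the scores come from the learned top-down parser.

```python
# Sketch of top-down split-point parsing; split scores are assumed given.
from typing import List


def top_down_splits(scores: List[float]) -> List[int]:
    """Recursively split the sentence using per-position split scores.

    `scores[i]` is the score of splitting between token i and i+1,
    so a sentence of n tokens has n - 1 candidate split points.
    Returns split-point indices in the (top-down) order they are chosen.
    """
    order: List[int] = []

    def split(left: int, right: int) -> None:
        # Spans covering a single token need no further splitting.
        if right - left <= 1:
            return
        # Pick the highest-scoring split point inside [left, right).
        best = max(range(left, right - 1), key=lambda i: scores[i])
        order.append(best)
        split(left, best + 1)   # left sub-span: tokens [left, best]
        split(best + 1, right)  # right sub-span: tokens [best + 1, right)

    split(0, len(scores) + 1)
    return order


# Example with 5 tokens (4 split points); the scores are hypothetical.
split_scores = [0.1, 0.9, 0.3, 0.7]
top_down = top_down_splits(split_scores)   # -> [1, 0, 3, 2]
bottom_up = list(reversed(top_down))       # composition / pruning order
print(top_down, bottom_up)
```

Because each span is encoded only along this single pruned path, the number of composition steps stays linear in sentence length rather than the cubic cost of a full CKY chart.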
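The parser's auxiliary objective can be illustrated in the same spirit. The sketch below shows only the shape of the loss: a KL divergence between a span's split-point distribution under the parser and a target distribution derived from R2D2's tree probabilities. How R2D2's span-level probabilities are obtained, and the direction of the KL term, are assumptions for illustration, not details confirmed by the abstract.

```python
# Sketch of the KL term between parser and R2D2 split distributions.
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same split points."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Hypothetical numbers for one span: parser logits over its split points
# and the (renormalized) probabilities R2D2 assigns to the same splits.
parser_logits = [0.2, 1.5, -0.3]
r2d2_split_probs = [0.10, 0.75, 0.15]

parser_probs = softmax(parser_logits)
loss = kl_divergence(r2d2_split_probs, parser_probs)  # pushes the parser toward R2D2
print(f"KL term for this span: {loss:.4f}")
```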
Related papers
- Structured Dialogue Discourse Parsing [79.37200787463917]
Discourse parsing aims to uncover the internal structure of a multi-participant conversation.
We propose a principled method that improves upon previous work from two perspectives: encoding and decoding.
Experiments show that our method achieves new state-of-the-art, surpassing the previous model by 2.3 on STAC and 1.5 on Molweni.
arXiv Detail & Related papers (2023-06-26T22:51:01Z) - Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative parser based on a bracketing grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one by one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z) - Improving Top-K Decoding for Non-Autoregressive Semantic Parsing via
Intent Conditioning [11.307865386100993]
We propose a novel NAR semantic parser that introduces intent conditioning on the decoder.
As the top-level intent governs the syntax and semantics of a parse, the intent conditioning allows the model to better control beam search.
We evaluate the proposed NAR parser on the conversational semantic parsing datasets TOP and TOPv2.
arXiv Detail & Related papers (2022-04-14T04:06:39Z) - Inducing Transformer's Compositional Generalization Ability via
Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z) - R2D2: Recursive Transformer based on Differentiable Tree for
Interpretable Hierarchical Language Modeling [36.61173494449218]
This paper proposes a model based on differentiable CKY style binary trees to emulate the composition process.
We extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes.
To scale up our approach, we also introduce an efficient pruned tree induction algorithm to enable encoding in just a linear number of composition steps.
arXiv Detail & Related papers (2021-07-02T11:00:46Z) - Recursive Tree Grammar Autoencoders [3.791857415239352]
We propose a novel autoencoder approach that encodes trees via a bottom-up parser and decodes trees via a tree grammar.
We show experimentally that our proposed method improves the autoencoding error, training time, and optimization score on four benchmark datasets.
arXiv Detail & Related papers (2020-12-03T17:37:25Z) - Recursive Top-Down Production for Sentence Generation with Latent Trees [77.56794870399288]
We model the production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
arXiv Detail & Related papers (2020-10-09T17:47:16Z) - Learning Binary Decision Trees by Argmin Differentiation [34.9154848754842]
We learn binary decision trees that partition data for some downstream task.
We do so by relaxing a mixed-integer program for the discrete parameters.
We derive customized algorithms to efficiently compute the forward and backward passes.
arXiv Detail & Related papers (2020-10-09T15:11:28Z) - Span-based Semantic Parsing for Compositional Generalization [53.24255235340056]
SpanBasedSP predicts a span tree over an input utterance, explicitly encoding how partial programs compose over spans in the input.
On GeoQuery, SCAN and CLOSURE, SpanBasedSP performs similarly to strong seq2seq baselines on random splits, but dramatically improves performance compared to baselines on splits that require compositional generalization.
arXiv Detail & Related papers (2020-09-13T16:42:18Z)