R2D2: Recursive Transformer based on Differentiable Tree for
Interpretable Hierarchical Language Modeling
- URL: http://arxiv.org/abs/2107.00967v1
- Date: Fri, 2 Jul 2021 11:00:46 GMT
- Title: R2D2: Recursive Transformer based on Differentiable Tree for
Interpretable Hierarchical Language Modeling
- Authors: Xiang Hu, Haitao Mi, Zujie Wen, Yafang Wang, Yi Su, Jing Zheng, Gerard
de Melo
- Abstract summary: This paper proposes a model based on differentiable CKY style binary trees to emulate the composition process.
We extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes.
To scale up our approach, we also introduce an efficient pruned tree induction algorithm to enable encoding in just a linear number of composition steps.
- Score: 36.61173494449218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human language understanding operates at multiple levels of granularity
(e.g., words, phrases, and sentences) with increasing levels of abstraction
that can be hierarchically combined. However, existing deep models with stacked
layers do not explicitly model any sort of hierarchical process. This paper
proposes a recursive Transformer model based on differentiable CKY style binary
trees to emulate the composition process. We extend the bidirectional language
model pre-training objective to this architecture, attempting to predict each
word given its left and right abstraction nodes. To scale up our approach, we
also introduce an efficient pruned tree induction algorithm to enable encoding
in just a linear number of composition steps. Experimental results on language
modeling and unsupervised parsing show the effectiveness of our approach.
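As a rough, hedged illustration of the differentiable CKY-style composition described above (not the authors' implementation), the sketch below fills a chart in which every span's vector is a softly weighted mixture over its possible split points, so the choice of tree structure stays differentiable. The names `compose`, `split_scorer`, and `d_model` are placeholder assumptions; R2D2 itself uses a recursive Transformer cell for composition and a pruned chart.

```python
# Minimal sketch of differentiable CKY-style composition (assumed names, toy modules).
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model = 64
# stand-ins for the paper's recursive Transformer composition cell and split scorer
compose = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Tanh())
split_scorer = nn.Linear(d_model, 1)

def chart_encode(word_vecs):
    """word_vecs: (n, d_model) leaf embeddings -> dict {(i, j): vector for span [i, j)}."""
    n = word_vecs.size(0)
    chart = {(i, i + 1): word_vecs[i] for i in range(n)}
    for width in range(2, n + 1):                      # build wider spans bottom-up, CKY style
        for i in range(n - width + 1):
            j = i + width
            # one candidate representation per split point k of span (i, j)
            candidates = torch.stack(
                [compose(torch.cat([chart[(i, k)], chart[(k, j)]])) for k in range(i + 1, j)]
            )
            weights = F.softmax(split_scorer(candidates).squeeze(-1), dim=0)  # soft split choice
            chart[(i, j)] = (weights.unsqueeze(-1) * candidates).sum(dim=0)
    return chart

spans = chart_encode(torch.randn(5, d_model))
print(spans[(0, 5)].shape)  # torch.Size([64]) -- root representation of the whole sentence
```

Filling the full chart this way takes a cubic number of composition steps in the sentence length; the pruned tree induction algorithm mentioned in the abstract is what brings this down to a linear number of steps.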
Related papers
- Integrating Hierarchical Semantic into Iterative Generation Model for Entailment Tree Explanation [7.5496857647335585]
We propose an architecture that integrates the Hierarchical Semantics of sentences under the Controller-Generator framework (HiSCG) to explain answers.
The proposed method achieves comparable performance on all three settings of the EntailmentBank dataset.
arXiv Detail & Related papers (2024-09-26T11:46:58Z)
- An Expression Tree Decoding Strategy for Mathematical Equation Generation [24.131972875875952]
Existing approaches can be broadly categorized into token-level and expression-level generation.
Expression-level methods generate each expression one by one.
Each expression represents a solving step, and there naturally exist parallel or dependent relations between these steps.
We integrate tree structure into the expression-level generation and advocate an expression tree decoding strategy.
arXiv Detail & Related papers (2023-10-14T17:00:28Z)
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative parser based on a bracketing grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- Characterizing Intrinsic Compositionality in Transformers with Tree Projections [72.45375959893218]
Neural models like transformers can route information arbitrarily between different parts of their input.
We show that transformers for three different tasks become more treelike over the course of training.
These trees are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
arXiv Detail & Related papers (2022-11-02T17:10:07Z)
- Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation [41.51966652141165]
We propose a model-based pruning method, which also enables parallel encoding during inference.
Our experiments show that Fast-R2D2 significantly improves performance in grammar induction and achieves competitive results in downstream classification tasks.
arXiv Detail & Related papers (2022-03-01T07:54:44Z)
- Recursive Top-Down Production for Sentence Generation with Latent Trees [77.56794870399288]
We model the production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
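As a generic, hedged sketch of what "marginalising over latent binary tree structures" with a dynamic program can look like (a textbook inside-style recursion, not the paper's actual model), the example below sums a toy log-score over every binary bracketing of $N$ leaves in cubic rather than exponential time. The scoring functions are placeholder assumptions.

```python
# Inside-style dynamic program that marginalises over all binary trees (toy scores).
import math
from functools import lru_cache

def log_marginal(leaf_scores, span_score):
    """leaf_scores[i]: log score of leaf i; span_score(i, j): log score of span [i, j)."""
    n = len(leaf_scores)

    @lru_cache(maxsize=None)
    def inside(i, j):
        if j - i == 1:
            return leaf_scores[i]
        # sum over every split point k of the two subtrees' marginals
        split_logs = [inside(i, k) + inside(k, j) for k in range(i + 1, j)]
        return span_score(i, j) + math.log(sum(math.exp(s) for s in split_logs))

    return inside(0, n)

print(log_marginal([0.0] * 4, lambda i, j: 0.0))  # log(5): there are 5 binary trees over 4 leaves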
arXiv Detail & Related papers (2020-10-09T17:47:16Z)
- Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach [78.77265671634454]
We make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground-truth parse trees in a form called "syntactic distances".
Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
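For readers unfamiliar with the term, the sketch below illustrates the usual notion of "syntactic distances" (a common construction from this line of work, not code from the paper): each pair of adjacent words is assigned the height of the tree node that splits them, so the distance sequence encodes a binary parse tree. The function name and tree encoding are illustrative assumptions.

```python
# Hedged sketch: derive syntactic distances from a binary tree of nested pairs.
def syntactic_distances(tree):
    """tree: a word (str) or a pair (left_subtree, right_subtree).
    Returns (words, distances, height), with len(distances) == len(words) - 1."""
    if isinstance(tree, str):            # leaf: a single word
        return [tree], [], 0
    left, right = tree
    l_words, l_dists, l_h = syntactic_distances(left)
    r_words, r_dists, r_h = syntactic_distances(right)
    height = max(l_h, r_h) + 1
    # adjacent words straddling the two subtrees are split at this node's height
    return l_words + r_words, l_dists + [height] + r_dists, height

words, dists, _ = syntactic_distances((("the", "cat"), ("sat", "down")))
print(words)  # ['the', 'cat', 'sat', 'down']
print(dists)  # [1, 2, 1] -- the largest distance separates "the cat" from "sat down"
```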
arXiv Detail & Related papers (2020-05-12T15:35:00Z)
- Tree-structured Attention with Hierarchical Accumulation [103.47584968330325]
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.