R2D2: Recursive Transformer based on Differentiable Tree for
Interpretable Hierarchical Language Modeling
- URL: http://arxiv.org/abs/2107.00967v1
- Date: Fri, 2 Jul 2021 11:00:46 GMT
- Title: R2D2: Recursive Transformer based on Differentiable Tree for
Interpretable Hierarchical Language Modeling
- Authors: Xiang Hu, Haitao Mi, Zujie Wen, Yafang Wang, Yi Su, Jing Zheng, Gerard
de Melo
- Abstract summary: This paper proposes a model based on differentiable CKY style binary trees to emulate the composition process.
We extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes.
To scale up our approach, we also introduce an efficient pruned tree induction algorithm to enable encoding in just a linear number of composition steps.
- Score: 36.61173494449218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human language understanding operates at multiple levels of granularity
(e.g., words, phrases, and sentences) with increasing levels of abstraction
that can be hierarchically combined. However, existing deep models with stacked
layers do not explicitly model any sort of hierarchical process. This paper
proposes a recursive Transformer model based on differentiable CKY style binary
trees to emulate the composition process. We extend the bidirectional language
model pre-training objective to this architecture, attempting to predict each
word given its left and right abstraction nodes. To scale up our approach, we
also introduce an efficient pruned tree induction algorithm to enable encoding
in just a linear number of composition steps. Experimental results on language
modeling and unsupervised parsing show the effectiveness of our approach.
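As a rough, hedged illustration of the differentiable CKY-style composition described above (not the authors' implementation), the sketch below fills a chart in which every span's vector is a softly weighted mixture over its possible split points, so the choice of tree structure stays differentiable. The names `compose`, `split_scorer`, and `d_model` are placeholder assumptions; R2D2 itself uses a recursive Transformer cell for composition and a pruned chart.

```python
# Minimal sketch of differentiable CKY-style composition (assumed names, toy modules).
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model = 64
# stand-ins for the paper's recursive Transformer composition cell and split scorer
compose = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Tanh())
split_scorer = nn.Linear(d_model, 1)

def chart_encode(word_vecs):
    """word_vecs: (n, d_model) leaf embeddings -> dict {(i, j): vector for span [i, j)}."""
    n = word_vecs.size(0)
    chart = {(i, i + 1): word_vecs[i] for i in range(n)}
    for width in range(2, n + 1):                      # build wider spans bottom-up, CKY style
        for i in range(n - width + 1):
            j = i + width
            # one candidate representation per split point k of span (i, j)
            candidates = torch.stack(
                [compose(torch.cat([chart[(i, k)], chart[(k, j)]])) for k in range(i + 1, j)]
            )
            weights = F.softmax(split_scorer(candidates).squeeze(-1), dim=0)  # soft split choice
            chart[(i, j)] = (weights.unsqueeze(-1) * candidates).sum(dim=0)
    return chart

spans = chart_encode(torch.randn(5, d_model))
print(spans[(0, 5)].shape)  # torch.Size([64]) -- root representation of the whole sentence
```

Filling the full chart this way takes a cubic number of composition steps in the sentence length; the pruned tree induction algorithm mentioned in the abstract is what brings this down to a linear number of steps.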
Related papers
- Integrating Hierarchical Semantic into Iterative Generation Model for Entailment Tree Explanation [7.5496857647335585]
We propose an architecture that integrates the Hierarchical Semantics of sentences under the Controller-Generator framework (HiSCG) to explain answers.
The proposed method achieves comparable performance on all three settings of the EntailmentBank dataset.
arXiv Detail & Related papers (2024-09-26T11:46:58Z)
- An Expression Tree Decoding Strategy for Mathematical Equation Generation [24.131972875875952]
Existing approaches can be broadly categorized into token-level and expression-level generation.
Expression-level methods generate each expression one by one.
Each expression represents a solving step, and there naturally exist parallel or dependent relations between these steps.
We integrate tree structure into the expression-level generation and advocate an expression tree decoding strategy.
arXiv Detail & Related papers (2023-10-14T17:00:28Z)
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative parser based on a bracketing grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- Characterizing Intrinsic Compositionality in Transformers with Tree Projections [72.45375959893218]
Neural models like transformers can route information arbitrarily between different parts of their input.
We show that transformers for three different tasks become more treelike over the course of training.
These trees are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
arXiv Detail & Related papers (2022-11-02T17:10:07Z)
- Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation [41.51966652141165]
We propose a model-based pruning method, which also enables parallel encoding during inference.
Our experiments show that Fast-R2D2 significantly improves performance in grammar induction and achieves competitive results in downstream classification tasks.
arXiv Detail & Related papers (2022-03-01T07:54:44Z)
- Recursive Top-Down Production for Sentence Generation with Latent Trees [77.56794870399288]
We model the production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
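As a generic, hedged sketch of what "marginalising over latent binary tree structures" with a dynamic program can look like (a textbook inside-style recursion, not the paper's actual model), the example below sums a toy log-score over every binary bracketing of $N$ leaves in cubic rather than exponential time. The scoring functions are placeholder assumptions.

```python
# Inside-style dynamic program that marginalises over all binary trees (toy scores).
import math
from functools import lru_cache

def log_marginal(leaf_scores, span_score):
    """leaf_scores[i]: log score of leaf i; span_score(i, j): log score of span [i, j)."""
    n = len(leaf_scores)

    @lru_cache(maxsize=None)
    def inside(i, j):
        if j - i == 1:
            return leaf_scores[i]
        # sum over every split point k of the two subtrees' marginals
        split_logs = [inside(i, k) + inside(k, j) for k in range(i + 1, j)]
        return span_score(i, j) + math.log(sum(math.exp(s) for s in split_logs))

    return inside(0, n)

print(log_marginal([0.0] * 4, lambda i, j: 0.0))  # log(5): there are 5 binary trees over 4 leaves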
arXiv Detail & Related papers (2020-10-09T17:47:16Z)
- Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach [78.77265671634454]
We make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground-truth parse trees in a form called "syntactic distances".
Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
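For readers unfamiliar with the term, the sketch below illustrates the usual notion of "syntactic distances" (a common construction from this line of work, not code from the paper): each pair of adjacent words is assigned the height of the tree node that splits them, so the distance sequence encodes a binary parse tree. The function name and tree encoding are illustrative assumptions.

```python
# Hedged sketch: derive syntactic distances from a binary tree of nested pairs.
def syntactic_distances(tree):
    """tree: a word (str) or a pair (left_subtree, right_subtree).
    Returns (words, distances, height), with len(distances) == len(words) - 1."""
    if isinstance(tree, str):            # leaf: a single word
        return [tree], [], 0
    left, right = tree
    l_words, l_dists, l_h = syntactic_distances(left)
    r_words, r_dists, r_h = syntactic_distances(right)
    height = max(l_h, r_h) + 1
    # adjacent words straddling the two subtrees are split at this node's height
    return l_words + r_words, l_dists + [height] + r_dists, height

words, dists, _ = syntactic_distances((("the", "cat"), ("sat", "down")))
print(words)  # ['the', 'cat', 'sat', 'down']
print(dists)  # [1, 2, 1] -- the largest distance separates "the cat" from "sat down"
```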
arXiv Detail & Related papers (2020-05-12T15:35:00Z)
- Tree-structured Attention with Hierarchical Accumulation [103.47584968330325]
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.