Differentiable Tree Operations Promote Compositional Generalization
- URL: http://arxiv.org/abs/2306.00751v1
- Date: Thu, 1 Jun 2023 14:46:34 GMT
- Title: Differentiable Tree Operations Promote Compositional Generalization
- Authors: Paul Soulos, Edward Hu, Kate McCurdy, Yunmo Chen, Roland Fernandez,
Paul Smolensky, Jianfeng Gao
- Abstract summary: The Differentiable Tree Machine (DTM) architecture integrates a differentiable tree interpreter with an external memory and an agent that learns to sequentially select tree operations.
On out-of-distribution compositional generalization tests, DTM achieves 100% accuracy while existing baselines such as Transformer, Tree Transformer, LSTM, and Tree2Tree LSTM achieve less than 30%.
- Score: 106.59434079287661
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the context of structure-to-structure transformation tasks, learning
sequences of discrete symbolic operations poses significant challenges due to
their non-differentiability. To facilitate the learning of these symbolic
sequences, we introduce a differentiable tree interpreter that compiles
high-level symbolic tree operations into subsymbolic matrix operations on
tensors. We present a novel Differentiable Tree Machine (DTM) architecture that
integrates our interpreter with an external memory and an agent that learns to
sequentially select tree operations to execute the target transformation in an
end-to-end manner. With respect to out-of-distribution compositional
generalization on synthetic semantic parsing and language generation tasks, DTM
achieves 100% accuracy while existing baselines such as Transformer, Tree Transformer,
LSTM, and Tree2Tree LSTM achieve less than 30%. DTM remains highly
interpretable in addition to its perfect performance.
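Why the compilation works: with trees embedded as Tensor Product Representations, the Lisp operations car (left subtree), cdr (right subtree), and cons (attach two subtrees under a new root) each become a fixed linear map on the role dimension, so a weighted blend of operations stays differentiable. Below is a minimal numpy sketch of this idea using one-hot role vectors over heap-indexed positions; it illustrates the principle rather than the paper's implementation, and the names and constants (child_matrix, D, N, F) are ours.

```python
import numpy as np

D = 4                 # maximum tree depth; positions laid out as a binary heap
N = 2**D - 1          # number of tree positions (1-indexed heap: root = 1)
F = 8                 # filler (symbol embedding) dimension; trees are [N, F] arrays

def child_matrix(bit):
    """0/1 matrix W with W[new, old] = 1, relabeling the subtree under the
    root's left (bit=0) or right (bit=1) child so it becomes the whole tree."""
    W = np.zeros((N, N))
    for i in range(1, N + 1):
        top = 1 << (i.bit_length() - 1)   # leading power of two of position i
        old = i + top * (1 + bit)         # insert the child bit after the leading 1
        if old <= N:                      # positions past max depth are dropped
            W[i - 1, old - 1] = 1.0
    return W

W_car, W_cdr = child_matrix(0), child_matrix(1)

def car(T): return W_car @ T              # extract the left subtree
def cdr(T): return W_cdr @ T              # extract the right subtree

def cons(left, right, root_filler):
    """Build a tree with symbol root_filler at the root and the given children."""
    T = W_car.T @ left + W_cdr.T @ right  # the transposes re-embed the subtrees
    T[0] = root_filler                    # place the root symbol at position 1
    return T
```

An agent step can then emit a distribution w over operations and return w[0] * car(T) + w[1] * cdr(T) + w[2] * cons(A, B, f), which is differentiable end to end and collapses to a discrete program as w saturates.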
Related papers
- Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations [75.14793516745374]
We propose to strengthen the structural inductive bias of a Transformer by intermediate pre-training.
Our experiments confirm that this helps with few-shot learning of syntactic tasks such as chunking.
Our analysis shows that the intermediate pre-training leads to attention heads that keep track of which syntactic transformation needs to be applied to which token.
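A toy sketch of that recipe: generate bracketed trees, apply a named transformation, and pre-train a seq2seq model on (transformation, input) -> output pairs. Everything here (the grammar, the two transformations, the serialization) is invented for illustration; the paper's corpus and transformations differ.

```python
import random

def leaf():  return random.choice(["a", "b", "c", "d"])
def tree(d): return leaf() if d == 0 or random.random() < 0.3 else (tree(d - 1), tree(d - 1))

def swap(t): return (t[1], t[0]) if isinstance(t, tuple) else t   # mirror the root
def left(t): return t[0] if isinstance(t, tuple) else t           # keep left subtree
TRANSFORMS = {"SWAP": swap, "LEFT": left}

def show(t): return t if isinstance(t, str) else f"( {show(t[0])} {show(t[1])} )"

def sample_pair(depth=3):
    """One pre-training example: source = transformation name + input tree,
    target = the transformed tree, both as bracketed strings."""
    name = random.choice(list(TRANSFORMS))
    t = tree(depth)
    return f"{name} {show(t)}", show(TRANSFORMS[name](t))
```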
arXiv Detail & Related papers (2024-07-05T14:29:44Z)
- Terminating Differentiable Tree Experts [77.2443883991608]
We propose a neuro-symbolic Differentiable Tree Machine that learns tree operations using a combination of transformers and Tensor Product Representations.
We first replace the series of distinct transformer layers used at every step with a mixture of experts, keeping the parameter count constant regardless of the number of steps.
We additionally propose a new termination algorithm that lets the model decide automatically how many steps to take.
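The paper's exact termination rule is not reproduced here; the sketch below shows the general adaptive-halting pattern (in the spirit of Adaptive Computation Time) that such an algorithm instantiates: a learned halting probability per step, with intermediate states mixed by their halting weights. step_fn and halt_fn are placeholders.

```python
import numpy as np

def run_with_halting(step_fn, halt_fn, state, max_steps=16, eps=1e-2):
    """Run up to max_steps tree-operation steps; halt_fn emits a halting
    probability in (0, 1) after each one. The output is the halting-weighted
    mixture of intermediate states, so the step count is chosen by the model."""
    remainder, output = 1.0, np.zeros_like(state)
    for _ in range(max_steps):
        state = step_fn(state)              # one expert / tree-operation step
        p = halt_fn(state)
        if remainder * (1 - p) < eps:       # nearly all probability mass spent
            output += remainder * state
            break
        output += remainder * p * state
        remainder *= 1 - p
    else:
        output += remainder * state         # assign leftover mass to the last state
    return output
```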
arXiv Detail & Related papers (2024-07-02T08:45:38Z)
- Tree-Planted Transformers: Unidirectional Transformer Language Models with Implicit Syntactic Supervision [4.665860995185884]
We propose a new method dubbed tree-planting.
Instead of explicitly generating syntactic structures, we "plant" trees into attention weights of unidirectional Transformer LMs.
Tree-Planted Transformers inherit the training efficiency of syntactic language models (SLMs) without changing the inference efficiency of their underlying Transformer LMs.
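One way to realize this is an auxiliary loss that pushes a designated attention head toward a target distribution derived from distances on the parse tree, so syntactically close tokens attract attention. The sketch below is our hedged reading of that idea, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def tree_planting_loss(attn, tree_dist, mask):
    """attn:      [T, T] attention of one head (causal; rows sum to 1)
    tree_dist: [T, T] pairwise token distances on the parse tree
    mask:      [T, T] 1 for visible (preceding) positions, 0 elsewhere
    Nearer tree neighbors get higher target attention via softmax(-distance)."""
    neg_inf = torch.full_like(tree_dist, -1e9)
    target = torch.softmax(torch.where(mask.bool(), -tree_dist, neg_inf), dim=-1)
    return F.kl_div(attn.clamp_min(1e-9).log(), target, reduction="batchmean")
```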
arXiv Detail & Related papers (2024-02-20T03:37:24Z)
- Characterizing Intrinsic Compositionality in Transformers with Tree Projections [72.45375959893218]
Neural models like transformers can route information arbitrarily between different parts of their input.
We show that transformers for three different tasks become more treelike over the course of training.
These trees are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
arXiv Detail & Related papers (2022-11-02T17:10:07Z)
- FastTrees: Parallel Latent Tree-Induction for Faster Sequence Encoding [8.004425059996963]
This paper proposes FASTTREES, a new general-purpose neural module for fast sequence encoding.
Our work explores the notion of parallel tree induction, i.e., imbuing our model with hierarchical inductive biases in a parallelizable, non-autoregressive fashion.
We show that the FASTTREES module can be applied to enhance Transformer models, achieving performance gains on three sequence tasks.
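One parallelizable ingredient behind such modules is the cumax gate from ordered-neuron models: computed per token by a feedforward projection rather than a recurrence, it yields a soft hierarchy gate for every position at once. A sketch under that assumption (not FASTTREES' exact module):

```python
import torch

def cumax(x, dim=-1):
    """Cumulative softmax: a monotone soft gate whose boundary position can
    encode a token's depth in a latent hierarchy."""
    return torch.cumsum(torch.softmax(x, dim=dim), dim=dim)

hidden = torch.randn(2, 10, 64)             # [batch, time, dim]
proj = torch.nn.Linear(64, 64)
master_gate = cumax(proj(hidden), dim=-1)   # structural gates for all tokens in parallel
gated = master_gate * hidden                # no step-by-step recurrence needed
```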
arXiv Detail & Related papers (2021-11-28T03:08:06Z)
- Recursive Top-Down Production for Sentence Generation with Latent Trees [77.56794870399288]
We model the production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
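The marginalization is the classic inside recursion: every binary tree over N leaves factors into spans, so summing over split points span by span sums over all trees in O(N^3). A small numpy sketch with an illustrative per-span weighting (the paper's parameterization differs):

```python
import numpy as np

def inside_marginal(span_weight):
    """Total weight of all binary trees over N leaves, where a tree's weight is
    the product of its spans' weights; span_weight is (N+1) x (N+1), scoring
    span [i, j)."""
    N = span_weight.shape[0] - 1
    inside = np.zeros((N + 1, N + 1))
    for i in range(N):
        inside[i, i + 1] = span_weight[i, i + 1]       # single-leaf spans
    for width in range(2, N + 1):
        for i in range(N - width + 1):
            j = i + width
            splits = sum(inside[i, k] * inside[k, j] for k in range(i + 1, j))
            inside[i, j] = span_weight[i, j] * splits  # marginalize the split point
    return inside[0, N]
```

Setting every weight to 1 makes the result the Catalan number C_{N-1}, the count of binary trees over N leaves, which is a quick sanity check.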
arXiv Detail & Related papers (2020-10-09T17:47:16Z)
- Mimic and Conquer: Heterogeneous Tree Structure Distillation for Syntactic NLP [34.74181162627023]
In this paper, we investigate a simple and effective method, Knowledge Distillation, to integrate heterogeneous structure knowledge into a unified sequential LSTM encoder.
Experimental results on four typical syntax-dependent tasks show that our method outperforms tree encoders by effectively integrating rich heterogeneous structural syntax while reducing error propagation, and also outperforms ensemble methods in both efficiency and accuracy.
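The distillation objective itself can be the standard temperature-scaled one, with the sequential LSTM as student and tree-structured encoders as teachers; a generic sketch (not necessarily the paper's exact loss):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Student matches the teacher's softened output distribution; the T*T
    factor keeps gradient magnitudes comparable across temperatures."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)
```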
arXiv Detail & Related papers (2020-09-16T01:30:21Z)
- Tree-structured Attention with Hierarchical Accumulation [103.47584968330325]
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
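As a rough intuition for constant-time accumulation: with a precomputed 0/1 ancestry mask, every tree node's representation can be formed from its leaves in one masked matrix product, with no sequential traversal. This sketch is our simplification, not the paper's operator:

```python
import numpy as np

def accumulate_nodes(leaf_emb, ancestry):
    """leaf_emb: [L, d] leaf embeddings; ancestry: [M, L] 0/1 mask where
    ancestry[m, l] = 1 iff leaf l is dominated by node m (every node
    dominates at least one leaf). Returns [M, d] node representations."""
    weights = ancestry / ancestry.sum(axis=1, keepdims=True)
    return weights @ leaf_emb   # all nodes at once: one parallel matrix product
```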
arXiv Detail & Related papers (2020-02-19T08:17:00Z)