Differentiable Tree Operations Promote Compositional Generalization
- URL: http://arxiv.org/abs/2306.00751v1
- Date: Thu, 1 Jun 2023 14:46:34 GMT
- Title: Differentiable Tree Operations Promote Compositional Generalization
- Authors: Paul Soulos, Edward Hu, Kate McCurdy, Yunmo Chen, Roland Fernandez,
Paul Smolensky, Jianfeng Gao
- Abstract summary: The Differentiable Tree Machine (DTM) architecture integrates a differentiable tree interpreter with an external memory and an agent that learns to sequentially select tree operations.
On out-of-distribution compositional generalization tests, DTM achieves 100% accuracy while existing baselines such as Transformer, Tree Transformer, LSTM, and Tree2Tree LSTM achieve less than 30%.
- Score: 106.59434079287661
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the context of structure-to-structure transformation tasks, learning
sequences of discrete symbolic operations poses significant challenges due to
their non-differentiability. To facilitate the learning of these symbolic
sequences, we introduce a differentiable tree interpreter that compiles
high-level symbolic tree operations into subsymbolic matrix operations on
tensors. We present a novel Differentiable Tree Machine (DTM) architecture that
integrates our interpreter with an external memory and an agent that learns to
sequentially select tree operations to execute the target transformation in an
end-to-end manner. With respect to out-of-distribution compositional
generalization on synthetic semantic parsing and language generation tasks, DTM
achieves 100% accuracy while existing baselines such as Transformer, Tree Transformer,
LSTM, and Tree2Tree LSTM achieve less than 30%. DTM remains highly
interpretable in addition to its perfect performance.
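Why the compilation works: with trees embedded as Tensor Product Representations, the Lisp operations car (left subtree), cdr (right subtree), and cons (attach two subtrees under a new root) each become a fixed linear map on the role dimension, so a weighted blend of operations stays differentiable. Below is a minimal numpy sketch of this idea using one-hot role vectors over heap-indexed positions; it illustrates the principle rather than the paper's implementation, and the names and constants (child_matrix, D, N, F) are ours.

```python
import numpy as np

D = 4                 # maximum tree depth; positions laid out as a binary heap
N = 2**D - 1          # number of tree positions (1-indexed heap: root = 1)
F = 8                 # filler (symbol embedding) dimension; trees are [N, F] arrays

def child_matrix(bit):
    """0/1 matrix W with W[new, old] = 1, relabeling the subtree under the
    root's left (bit=0) or right (bit=1) child so it becomes the whole tree."""
    W = np.zeros((N, N))
    for i in range(1, N + 1):
        top = 1 << (i.bit_length() - 1)   # leading power of two of position i
        old = i + top * (1 + bit)         # insert the child bit after the leading 1
        if old <= N:                      # positions past max depth are dropped
            W[i - 1, old - 1] = 1.0
    return W

W_car, W_cdr = child_matrix(0), child_matrix(1)

def car(T): return W_car @ T              # extract the left subtree
def cdr(T): return W_cdr @ T              # extract the right subtree

def cons(left, right, root_filler):
    """Build a tree with symbol root_filler at the root and the given children."""
    T = W_car.T @ left + W_cdr.T @ right  # the transposes re-embed the subtrees
    T[0] = root_filler                    # place the root symbol at position 1
    return T
```

An agent step can then emit a distribution w over operations and return w[0] * car(T) + w[1] * cdr(T) + w[2] * cons(A, B, f), which is differentiable end to end and collapses to a discrete program as w saturates.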
Related papers
- Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations [75.14793516745374]
We propose to strengthen the structural inductive bias of a Transformer by intermediate pre-training.
Our experiments confirm that this helps with few-shot learning of syntactic tasks such as chunking.
Our analysis shows that the intermediate pre-training leads to attention heads that keep track of which syntactic transformation needs to be applied to which token.
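A toy sketch of that recipe: generate bracketed trees, apply a named transformation, and pre-train a seq2seq model on (transformation, input) -> output pairs. Everything here (the grammar, the two transformations, the serialization) is invented for illustration; the paper's corpus and transformations differ.

```python
import random

def leaf():  return random.choice(["a", "b", "c", "d"])
def tree(d): return leaf() if d == 0 or random.random() < 0.3 else (tree(d - 1), tree(d - 1))

def swap(t): return (t[1], t[0]) if isinstance(t, tuple) else t   # mirror the root
def left(t): return t[0] if isinstance(t, tuple) else t           # keep left subtree
TRANSFORMS = {"SWAP": swap, "LEFT": left}

def show(t): return t if isinstance(t, str) else f"( {show(t[0])} {show(t[1])} )"

def sample_pair(depth=3):
    """One pre-training example: source = transformation name + input tree,
    target = the transformed tree, both as bracketed strings."""
    name = random.choice(list(TRANSFORMS))
    t = tree(depth)
    return f"{name} {show(t)}", show(TRANSFORMS[name](t))
```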
arXiv Detail & Related papers (2024-07-05T14:29:44Z)
- Terminating Differentiable Tree Experts [77.2443883991608]
We propose a neuro-symbolic Differentiable Tree Machine that learns tree operations using a combination of transformers and Tensor Product Representations.
We first replace the series of distinct transformer layers used at every step with a mixture of experts, keeping the parameter count constant regardless of the number of steps.
We additionally propose a new termination algorithm that lets the model decide automatically how many steps to take.
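The paper's exact termination rule is not reproduced here; the sketch below shows the general adaptive-halting pattern (in the spirit of Adaptive Computation Time) that such an algorithm instantiates: a learned halting probability per step, with intermediate states mixed by their halting weights. step_fn and halt_fn are placeholders.

```python
import numpy as np

def run_with_halting(step_fn, halt_fn, state, max_steps=16, eps=1e-2):
    """Run up to max_steps tree-operation steps; halt_fn emits a halting
    probability in (0, 1) after each one. The output is the halting-weighted
    mixture of intermediate states, so the step count is chosen by the model."""
    remainder, output = 1.0, np.zeros_like(state)
    for _ in range(max_steps):
        state = step_fn(state)              # one expert / tree-operation step
        p = halt_fn(state)
        if remainder * (1 - p) < eps:       # nearly all probability mass spent
            output += remainder * state
            break
        output += remainder * p * state
        remainder *= 1 - p
    else:
        output += remainder * state         # assign leftover mass to the last state
    return output
```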
arXiv Detail & Related papers (2024-07-02T08:45:38Z)
- Tree-Planted Transformers: Unidirectional Transformer Language Models with Implicit Syntactic Supervision [4.665860995185884]
We propose a new method dubbed tree-planting.
Instead of explicitly generating syntactic structures, we "plant" trees into attention weights of unidirectional Transformer LMs.
Tree-Planted Transformers inherit the training efficiency of syntactic language models (SLMs) without changing the inference efficiency of their underlying Transformer LMs.
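One way to realize this is an auxiliary loss that pushes a designated attention head toward a target distribution derived from distances on the parse tree, so syntactically close tokens attract attention. The sketch below is our hedged reading of that idea, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def tree_planting_loss(attn, tree_dist, mask):
    """attn:      [T, T] attention of one head (causal; rows sum to 1)
    tree_dist: [T, T] pairwise token distances on the parse tree
    mask:      [T, T] 1 for visible (preceding) positions, 0 elsewhere
    Nearer tree neighbors get higher target attention via softmax(-distance)."""
    neg_inf = torch.full_like(tree_dist, -1e9)
    target = torch.softmax(torch.where(mask.bool(), -tree_dist, neg_inf), dim=-1)
    return F.kl_div(attn.clamp_min(1e-9).log(), target, reduction="batchmean")
```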
arXiv Detail & Related papers (2024-02-20T03:37:24Z)
- Characterizing Intrinsic Compositionality in Transformers with Tree Projections [72.45375959893218]
Neural models like transformers can route information arbitrarily between different parts of their input.
We show that transformers for three different tasks become more treelike over the course of training.
These trees are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
arXiv Detail & Related papers (2022-11-02T17:10:07Z)
- FastTrees: Parallel Latent Tree-Induction for Faster Sequence Encoding [8.004425059996963]
This paper proposes FASTTREES, a new general-purpose neural module for fast sequence encoding.
Our work explores the notion of parallel tree induction, i.e., imbuing our model with hierarchical inductive biases in a parallelizable, non-autoregressive fashion.
We show that the FASTTREES module can be applied to enhance Transformer models, achieving performance gains on three sequence tasks.
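One parallelizable ingredient behind such modules is the cumax gate from ordered-neuron models: computed per token by a feedforward projection rather than a recurrence, it yields a soft hierarchy gate for every position at once. A sketch under that assumption (not FASTTREES' exact module):

```python
import torch

def cumax(x, dim=-1):
    """Cumulative softmax: a monotone soft gate whose boundary position can
    encode a token's depth in a latent hierarchy."""
    return torch.cumsum(torch.softmax(x, dim=dim), dim=dim)

hidden = torch.randn(2, 10, 64)             # [batch, time, dim]
proj = torch.nn.Linear(64, 64)
master_gate = cumax(proj(hidden), dim=-1)   # structural gates for all tokens in parallel
gated = master_gate * hidden                # no step-by-step recurrence needed
```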
arXiv Detail & Related papers (2021-11-28T03:08:06Z)
- Recursive Top-Down Production for Sentence Generation with Latent Trees [77.56794870399288]
We model the production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
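The marginalization is the classic inside recursion: every binary tree over N leaves factors into spans, so summing over split points span by span sums over all trees in O(N^3). A small numpy sketch with an illustrative per-span weighting (the paper's parameterization differs):

```python
import numpy as np

def inside_marginal(span_weight):
    """Total weight of all binary trees over N leaves, where a tree's weight is
    the product of its spans' weights; span_weight is (N+1) x (N+1), scoring
    span [i, j)."""
    N = span_weight.shape[0] - 1
    inside = np.zeros((N + 1, N + 1))
    for i in range(N):
        inside[i, i + 1] = span_weight[i, i + 1]       # single-leaf spans
    for width in range(2, N + 1):
        for i in range(N - width + 1):
            j = i + width
            splits = sum(inside[i, k] * inside[k, j] for k in range(i + 1, j))
            inside[i, j] = span_weight[i, j] * splits  # marginalize the split point
    return inside[0, N]
```

Setting every weight to 1 makes the result the Catalan number C_{N-1}, the count of binary trees over N leaves, which is a quick sanity check.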
arXiv Detail & Related papers (2020-10-09T17:47:16Z)
- Mimic and Conquer: Heterogeneous Tree Structure Distillation for Syntactic NLP [34.74181162627023]
In this paper, we investigate a simple and effective method, Knowledge Distillation, to integrate heterogeneous structure knowledge into a unified sequential LSTM encoder.
Experimental results on four typical syntax-dependent tasks show that our method outperforms tree encoders by effectively integrating rich heterogeneous structural syntax while reducing error propagation, and also outperforms ensemble methods in both efficiency and accuracy.
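The distillation objective itself can be the standard temperature-scaled one, with the sequential LSTM as student and tree-structured encoders as teachers; a generic sketch (not necessarily the paper's exact loss):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Student matches the teacher's softened output distribution; the T*T
    factor keeps gradient magnitudes comparable across temperatures."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)
```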
arXiv Detail & Related papers (2020-09-16T01:30:21Z)
- Tree-structured Attention with Hierarchical Accumulation [103.47584968330325]
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
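As a rough intuition for constant-time accumulation: with a precomputed 0/1 ancestry mask, every tree node's representation can be formed from its leaves in one masked matrix product, with no sequential traversal. This sketch is our simplification, not the paper's operator:

```python
import numpy as np

def accumulate_nodes(leaf_emb, ancestry):
    """leaf_emb: [L, d] leaf embeddings; ancestry: [M, L] 0/1 mask where
    ancestry[m, l] = 1 iff leaf l is dominated by node m (every node
    dominates at least one leaf). Returns [M, d] node representations."""
    weights = ancestry / ancestry.sum(axis=1, keepdims=True)
    return weights @ leaf_emb   # all nodes at once: one parallel matrix product
```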
arXiv Detail & Related papers (2020-02-19T08:17:00Z)