ORCHARD: A Benchmark For Measuring Systematic Generalization of
Multi-Hierarchical Reasoning
- URL: http://arxiv.org/abs/2111.14034v1
- Date: Sun, 28 Nov 2021 03:11:37 GMT
- Title: ORCHARD: A Benchmark For Measuring Systematic Generalization of
Multi-Hierarchical Reasoning
- Authors: Bill Tuck Weng Pung, Alvin Chan
- Abstract summary: We show that Transformer and LSTM models surprisingly fail in systematic generalization.
We also show that with increased references between hierarchies, Transformer performs no better than random.
- Score: 8.004425059996963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to reason with multiple hierarchical structures is an attractive
and desirable property of sequential inductive biases for natural language
processing. Do state-of-the-art Transformer and LSTM architectures
implicitly encode these biases? To answer this, we propose ORCHARD, a
diagnostic dataset for systematically evaluating hierarchical reasoning in
state-of-the-art neural sequence models. While there have been prior evaluation
frameworks such as ListOps or Logical Inference, our work presents a novel and
more natural setting where our models learn to reason with multiple explicit
hierarchical structures instead of only one, i.e., requiring the ability to do
both long-term sequence memorizing, relational reasoning while reasoning with
hierarchical structure. Consequently, backed by a set of rigorous experiments,
we show that (1) Transformer and LSTM models surprisingly fail in systematic
generalization, and (2) with increased references between hierarchies,
Transformer performs no better than random.
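To make the task family concrete, the following is a minimal illustrative sketch of the kind of problem the abstract describes: ListOps-style nested list operations, extended with a second hierarchy whose leaves reference values computed in the first. The token format, the operation set, and the `R0`-style reference scheme here are all hypothetical illustrations, not the actual ORCHARD specification.

```python
def eval_tree(tree, refs=None):
    """Recursively evaluate a nested [OP, arg, ...] expression.

    Leaves are ints, or strings like "R0" that look up a value computed
    from another hierarchy (passed in via `refs`).
    """
    if isinstance(tree, int):
        return tree
    if isinstance(tree, str):              # cross-hierarchy reference, e.g. "R0"
        return refs[tree]
    op, *args = tree
    vals = [eval_tree(a, refs) for a in args]
    if op == "MAX":
        return max(vals)
    if op == "MIN":
        return min(vals)
    if op == "SUM":
        return sum(vals) % 10              # ListOps keeps results single-digit
    raise ValueError(f"unknown op: {op}")

# First hierarchy: a plain ListOps-style expression.
first = ["MAX", 2, ["MIN", 4, 7], 0]       # max(2, min(4, 7), 0) = 4

# Second hierarchy references the first via "R0" -- the kind of
# cross-hierarchy link on which the paper reports Transformers degrade.
refs = {"R0": eval_tree(first)}
second = ["SUM", "R0", ["MAX", 1, 3]]      # (4 + 3) % 10 = 7

print(eval_tree(first), eval_tree(second, refs))   # 4 7
```

A model solving such instances purely from the token sequence must track nesting depth (hierarchical structure), remember distant operands (long-term memorization), and resolve references across trees (relational reasoning), which is the combination the abstract claims current sequence models fail to generalize on.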
Related papers
- Integrating Hierarchical Semantic into Iterative Generation Model for Entailment Tree Explanation [7.5496857647335585]
We propose HiSCG, an architecture that integrates the hierarchical semantics of sentences under the Controller-Generator framework to explain answers.
The proposed method achieves comparable performance on all three settings of the EntailmentBank dataset.
arXiv Detail & Related papers (2024-09-26T11:46:58Z)
- Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically [74.96551626420188]
Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures.
We investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge.
arXiv Detail & Related papers (2024-04-25T07:10:29Z)
- Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings [60.698130703909804]
Transformers generalize to novel compositions of structures and entities after being trained on a complex dataset.
We propose SQ-Transformer that explicitly encourages systematicity in the embeddings and attention layers.
We show that SQ-Transformer achieves stronger compositional generalization than the vanilla Transformer on multiple low-complexity semantic parsing and machine translation datasets.
arXiv Detail & Related papers (2024-02-09T15:53:15Z)
- Improving Grammar-based Sequence-to-Sequence Modeling with Decomposition and Constraints [30.219318352970948]
We study two low-rank variants of Neural QCFG for faster inference.
We introduce two soft constraints over tree hierarchy and source coverage.
We find that our models outperform vanilla Neural QCFG in most settings.
arXiv Detail & Related papers (2023-06-05T08:05:05Z)
- Query Structure Modeling for Inductive Logical Reasoning Over Knowledge Graphs [67.043747188954]
We propose a structure-modeled textual encoding framework for inductive logical reasoning over KGs.
It encodes linearized query structures and entities using pre-trained language models to find answers.
We conduct experiments on two inductive logical reasoning datasets and three transductive datasets.
arXiv Detail & Related papers (2023-05-23T01:25:29Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences [16.59989033959959]
We describe an efficient hierarchical method to compute attention in the Transformer architecture.
Our method is superior to alternative sub-quadratic proposals by over +6 points on average on the Long Range Arena benchmark.
It also sets a new SOTA test perplexity on the One-Billion Word dataset with 5x fewer model parameters than the previous best Transformer-based models.
arXiv Detail & Related papers (2021-07-25T23:07:03Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- Tree-structured Attention with Hierarchical Accumulation [103.47584968330325]
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)
- Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks [28.129220683169052]
In neural network models, inductive biases could in theory arise from any aspect of the model architecture.
We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks.
arXiv Detail & Related papers (2020-01-10T19:02:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.