ORCHARD: A Benchmark For Measuring Systematic Generalization of
Multi-Hierarchical Reasoning
- URL: http://arxiv.org/abs/2111.14034v1
- Date: Sun, 28 Nov 2021 03:11:37 GMT
- Title: ORCHARD: A Benchmark For Measuring Systematic Generalization of
Multi-Hierarchical Reasoning
- Authors: Bill Tuck Weng Pung, Alvin Chan
- Abstract summary: We show that Transformer and LSTM models surprisingly fail in systematic generalization.
We also show that with increased references between hierarchies, Transformer performs no better than random.
- Score: 8.004425059996963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to reason with multiple hierarchical structures is an attractive
and desirable property of sequential inductive biases for natural language
processing. Do state-of-the-art Transformer and LSTM architectures
implicitly encode these biases? To answer this, we propose ORCHARD, a
diagnostic dataset for systematically evaluating hierarchical reasoning in
state-of-the-art neural sequence models. While there have been prior evaluation
frameworks such as ListOps or Logical Inference, our work presents a novel and
more natural setting where our models learn to reason with multiple explicit
hierarchical structures instead of only one, i.e., requiring the ability to do
both long-term sequence memorizing, relational reasoning while reasoning with
hierarchical structure. Consequently, backed by a set of rigorous experiments,
we show that (1) Transformer and LSTM models surprisingly fail in systematic
generalization, and (2) with increased references between hierarchies,
Transformer performs no better than random.
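To make the task family concrete, the following is a minimal illustrative sketch of the kind of problem the abstract describes: ListOps-style nested list operations, extended with a second hierarchy whose leaves reference values computed in the first. The token format, the operation set, and the `R0`-style reference scheme here are all hypothetical illustrations, not the actual ORCHARD specification.

```python
def eval_tree(tree, refs=None):
    """Recursively evaluate a nested [OP, arg, ...] expression.

    Leaves are ints, or strings like "R0" that look up a value computed
    from another hierarchy (passed in via `refs`).
    """
    if isinstance(tree, int):
        return tree
    if isinstance(tree, str):              # cross-hierarchy reference, e.g. "R0"
        return refs[tree]
    op, *args = tree
    vals = [eval_tree(a, refs) for a in args]
    if op == "MAX":
        return max(vals)
    if op == "MIN":
        return min(vals)
    if op == "SUM":
        return sum(vals) % 10              # ListOps keeps results single-digit
    raise ValueError(f"unknown op: {op}")

# First hierarchy: a plain ListOps-style expression.
first = ["MAX", 2, ["MIN", 4, 7], 0]       # max(2, min(4, 7), 0) = 4

# Second hierarchy references the first via "R0" -- the kind of
# cross-hierarchy link on which the paper reports Transformers degrade.
refs = {"R0": eval_tree(first)}
second = ["SUM", "R0", ["MAX", 1, 3]]      # (4 + 3) % 10 = 7

print(eval_tree(first), eval_tree(second, refs))   # 4 7
```

A model solving such instances purely from the token sequence must track nesting depth (hierarchical structure), remember distant operands (long-term memorization), and resolve references across trees (relational reasoning), which is the combination the abstract claims current sequence models fail to generalize on.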
Related papers
- Integrating Hierarchical Semantic into Iterative Generation Model for Entailment Tree Explanation [7.5496857647335585]
We propose HiSCG, an architecture that integrates the hierarchical semantics of sentences under the Controller-Generator framework to explain answers.
The proposed method achieves comparable performance on all three settings of the EntailmentBank dataset.
arXiv Detail & Related papers (2024-09-26T11:46:58Z)
- Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically [74.96551626420188]
Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures.
We investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge.
arXiv Detail & Related papers (2024-04-25T07:10:29Z)
- Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings [60.698130703909804]
Transformers generalize to novel compositions of structures and entities after being trained on a complex dataset.
We propose SQ-Transformer that explicitly encourages systematicity in the embeddings and attention layers.
We show that SQ-Transformer achieves stronger compositional generalization than the vanilla Transformer on multiple low-complexity semantic parsing and machine translation datasets.
arXiv Detail & Related papers (2024-02-09T15:53:15Z)
- Improving Grammar-based Sequence-to-Sequence Modeling with Decomposition and Constraints [30.219318352970948]
We study two low-rank variants of Neural QCFG for faster inference.
We introduce two soft constraints over tree hierarchy and source coverage.
We find that our models outperform vanilla Neural QCFG in most settings.
arXiv Detail & Related papers (2023-06-05T08:05:05Z)
- Query Structure Modeling for Inductive Logical Reasoning Over Knowledge Graphs [67.043747188954]
We propose a structure-modeled textual encoding framework for inductive logical reasoning over KGs.
It encodes linearized query structures and entities using pre-trained language models to find answers.
We conduct experiments on two inductive logical reasoning datasets and three transductive datasets.
arXiv Detail & Related papers (2023-05-23T01:25:29Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences [16.59989033959959]
We describe an efficient hierarchical method to compute attention in the Transformer architecture.
Our method is superior to alternative sub-quadratic proposals by over +6 points on average on the Long Range Arena benchmark.
It also sets a new SOTA test perplexity on the One-Billion Word dataset with 5x fewer model parameters than the previous best Transformer-based models.
arXiv Detail & Related papers (2021-07-25T23:07:03Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- Tree-structured Attention with Hierarchical Accumulation [103.47584968330325]
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)
- Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks [28.129220683169052]
In neural network models, inductive biases could in theory arise from any aspect of the model architecture.
We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks.
arXiv Detail & Related papers (2020-01-10T19:02:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.