Does syntax need to grow on trees? Sources of hierarchical inductive
bias in sequence-to-sequence networks
- URL: http://arxiv.org/abs/2001.03632v1
- Date: Fri, 10 Jan 2020 19:02:52 GMT
- Title: Does syntax need to grow on trees? Sources of hierarchical inductive
bias in sequence-to-sequence networks
- Authors: R. Thomas McCoy, Robert Frank, Tal Linzen
- Abstract summary: In neural network models, inductive biases could in theory arise from any aspect of the model architecture.
We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks.
- Score: 28.129220683169052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learners that are exposed to the same training data might generalize
differently due to differing inductive biases. In neural network models,
inductive biases could in theory arise from any aspect of the model
architecture. We investigate which architectural factors affect the
generalization behavior of neural sequence-to-sequence models trained on two
syntactic tasks, English question formation and English tense reinflection. For
both tasks, the training set is consistent with a generalization based on
hierarchical structure and a generalization based on linear order. All
architectural factors that we investigated qualitatively affected how models
generalized, including factors with no clear connection to hierarchical
structure. For example, LSTMs and GRUs displayed qualitatively different
inductive biases. However, the only factor that consistently contributed a
hierarchical bias across tasks was the use of a tree-structured model rather
than a model with sequential recurrence, suggesting that human-like syntactic
generalization requires architectural syntactic structure.
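The ambiguity the abstract describes for question formation can be illustrated with a minimal sketch. It contrasts a rule based on linear order (front the first auxiliary in the string) with a rule based on hierarchical structure (front the main-clause auxiliary). The sentence representation as a (subject, auxiliary, verb phrase) triple, the auxiliary list, the example sentences, and the function names are all assumptions made for illustration; none of this is the authors' code or data.

```python
# Hypothetical toy setup: a declarative is a (subject, main auxiliary, verb phrase) triple.
AUXILIARIES = {"can", "will", "do", "does"}  # assumed toy vocabulary


def flatten(subject, aux, verb_phrase):
    """Return the declarative sentence as a flat token list."""
    return subject + [aux] + verb_phrase


def move_first(tokens):
    """Linear rule: front whichever auxiliary comes first in the string."""
    i = next(k for k, tok in enumerate(tokens) if tok in AUXILIARIES)
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]


def move_main(subject, aux, verb_phrase):
    """Hierarchical rule: front the auxiliary of the main clause."""
    return [aux] + subject + verb_phrase


# No auxiliary inside the subject: both rules give the same question,
# so examples like this cannot tell the two generalizations apart.
ambiguous = (["the", "dog"], "can", ["bark"])

# A relative clause on the subject puts an auxiliary before the main one,
# so the two rules diverge.
disambiguating = (["the", "dog", "that", "can", "bark"], "will", ["run"])

for example in (ambiguous, disambiguating):
    print("declarative:", " ".join(flatten(*example)))
    print("  linear rule:      ", " ".join(move_first(flatten(*example))))
    print("  hierarchical rule:", " ".join(move_main(*example)))
```

A training set made up only of examples like `ambiguous` is consistent with both rules; only held-out examples like `disambiguating` show which generalization a learner has adopted.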
Related papers
- When does compositional structure yield compositional generalization? A kernel theory [0.0]
We present a theory of compositional generalization in kernel models with fixed representations.
We identify novel failure modes in compositional generalization that arise from biases in the training data.
This work provides a theoretical perspective on how statistical structure in the training data can affect compositional generalization.
arXiv Detail & Related papers (2024-05-26T00:50:11Z)
- Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically [74.96551626420188]
Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures.
We investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge.
arXiv Detail & Related papers (2024-04-25T07:10:29Z)
- On Provable Length and Compositional Generalization [7.883808173871223]
We provide the first provable guarantees on length and compositional generalization for common sequence-to-sequence models.
We show that simple limited-capacity versions of these different architectures achieve both length and compositional generalization.
arXiv Detail & Related papers (2024-02-07T14:16:28Z)
- SLOG: A Structural Generalization Benchmark for Semantic Parsing [68.19511282584304]
The goal of compositional generalization benchmarks is to evaluate how well models generalize to new complex linguistic expressions.
Existing benchmarks often focus on lexical generalization, the interpretation of novel lexical items in syntactic structures familiar from training; structural generalization cases are often underrepresented.
We introduce SLOG, a semantic parsing dataset that extends COGS with 17 structural generalization cases.
arXiv Detail & Related papers (2023-10-23T15:39:09Z)
- How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech [25.02822854434971]
We train LSTMs and Transformers on data similar in quantity and content to children's linguistic input: text from the CHILDES corpus.
We find that both model types generalize in a way more consistent with an incorrect linear rule than the correct hierarchical rule.
These results suggest that human-like generalization from text alone requires stronger biases than the general sequence-processing biases of standard neural network architectures.
arXiv Detail & Related papers (2023-01-26T23:24:17Z)
- Compositional Generalisation with Structured Reordering and Fertility Layers [121.37328648951993]
Seq2seq models have been shown to struggle with compositional generalisation.
We present a flexible end-to-end differentiable neural model that composes two structural operations.
arXiv Detail & Related papers (2022-10-06T19:51:31Z)
- On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores, which we name Compositional Relational Network (CoRelNet).
We find that simple architectural choices can outperform existing models in out-of-distribution generalization.
arXiv Detail & Related papers (2022-06-09T16:24:01Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Compositional Generalization Requires Compositional Parsers [69.77216620997305]
We compare sequence-to-sequence models and models guided by compositional principles on the recent COGS corpus.
We show structural generalization is a key measure of compositional generalization and requires models that are aware of complex structure.
arXiv Detail & Related papers (2022-02-24T07:36:35Z)
- ORCHARD: A Benchmark For Measuring Systematic Generalization of Multi-Hierarchical Reasoning [8.004425059996963]
We show that Transformer and LSTM models surprisingly fail in systematic generalization.
We also show that with increased references between hierarchies, Transformer performs no better than random.
arXiv Detail & Related papers (2021-11-28T03:11:37Z)