Related papers: Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks

URL: http://arxiv.org/abs/2001.03632v1
Date: Fri, 10 Jan 2020 19:02:52 GMT
Title: Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks
Authors: R. Thomas McCoy, Robert Frank, Tal Linzen
Abstract summary: In neural network models, inductive biases could in theory arise from any aspect of the model architecture. We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks.
Score: 28.129220683169052
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learners that are exposed to the same training data might generalize differently due to differing inductive biases. In neural network models, inductive biases could in theory arise from any aspect of the model architecture. We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks, English question formation and English tense reinflection. For both tasks, the training set is consistent with a generalization based on hierarchical structure and a generalization based on linear order. All architectural factors that we investigated qualitatively affected how models generalized, including factors with no clear connection to hierarchical structure. For example, LSTMs and GRUs displayed qualitatively different inductive biases. However, the only factor that consistently contributed a hierarchical bias across tasks was the use of a tree-structured model rather than a model with sequential recurrence, suggesting that human-like syntactic generalization requires architectural syntactic structure.
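To make the ambiguity described in the abstract concrete, here is a minimal sketch of the two candidate rules for English question formation: a linear rule that fronts the first auxiliary, and a hierarchical rule that fronts the main-clause auxiliary. The bracket notation, the helper names (`parse`, `move_first`, `move_main`), the auxiliary list, and the example sentences are all illustrative assumptions, not the paper's data or code.

```python
# Toy illustration only: brackets mark an embedded (relative) clause.
# The auxiliary list and sentences are hypothetical, not from the paper's dataset.
AUXILIARIES = {"can", "does", "do", "will"}

def parse(bracketed):
    """Turn a bracketed sentence into (token, embedding depth) pairs."""
    pairs, depth = [], 0
    for tok in bracketed.split():
        if tok == "[":
            depth += 1
        elif tok == "]":
            depth -= 1
        else:
            pairs.append((tok, depth))
    return pairs

def front(pairs, index):
    """Move the token at `index` to the front and add a question mark."""
    tokens = [tok for tok, _ in pairs]
    aux = tokens.pop(index)
    return " ".join([aux] + tokens) + "?"

def move_first(pairs):
    """Linear rule: front the first auxiliary in the word sequence."""
    index = next(i for i, (tok, _) in enumerate(pairs) if tok in AUXILIARIES)
    return front(pairs, index)

def move_main(pairs):
    """Hierarchical rule: front the auxiliary of the main clause (depth 0)."""
    index = next(i for i, (tok, d) in enumerate(pairs)
                 if tok in AUXILIARIES and d == 0)
    return front(pairs, index)

# Both rules agree here, so a sentence like this cannot tell them apart.
ambiguous = parse("my walrus does move the dogs [ that do wait ]")
# The rules diverge once an embedded clause precedes the main auxiliary.
disambiguating = parse("my walrus [ that does laugh ] can giggle")

for name, sent in [("ambiguous", ambiguous), ("disambiguating", disambiguating)]:
    print(name, "| linear:", move_first(sent), "| hierarchical:", move_main(sent))
```

In this sketch, a training set made only of sentences like the first one is consistent with both rules. Only sentences like the second reveal which generalization a learner has adopted.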

Related papers

Propositional Logic for Probing Generalization in Neural Networks [3.6037930269014633]
We investigate the generalization behavior of three key neural architectures (Transformers, Graph Convolution Networks and LSTMs) in a controlled task rooted in propositional logic. We find that Transformers fail to apply negation compositionally, unless structural biases are introduced. Our findings highlight persistent limitations in the ability of standard architectures to learn systematic representations of logical operators.
arXiv Detail & Related papers (2025-06-10T16:46:05Z)
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures [49.19753720526998]
We derive theoretical scaling laws for neural network performance on synthetic datasets. We validate that convolutional networks, whose structure aligns with that of the generative process through locality and weight sharing, enjoy a faster scaling of performance. This finding clarifies the architectural biases underlying neural scaling laws and highlights how representation learning is shaped by the interaction between model architecture and the statistical properties of data.
arXiv Detail & Related papers (2025-05-11T17:44:14Z)
When does compositional structure yield compositional generalization? A kernel theory [0.0]
We present a theory of compositional generalization in kernel models with fixed representations. We identify novel failure modes in compositional generalization that arise from biases in the training data. This work provides a theoretical perspective on how statistical structure in the training data can affect compositional generalization.
arXiv Detail & Related papers (2024-05-26T00:50:11Z)
Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically [74.96551626420188]
Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures. We investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge.
arXiv Detail & Related papers (2024-04-25T07:10:29Z)
On Provable Length and Compositional Generalization [7.883808173871223]
We provide the first provable guarantees on length and compositional generalization for common sequence-to-sequence models. We show that simple limited capacity versions of these different architectures achieve both length and compositional generalization.
arXiv Detail & Related papers (2024-02-07T14:16:28Z)
SLOG: A Structural Generalization Benchmark for Semantic Parsing [68.19511282584304]
The goal of compositional generalization benchmarks is to evaluate how well models generalize to new complex linguistic expressions. Existing benchmarks often focus on lexical generalization, the interpretation of novel lexical items in syntactic structures familiar from training; structural generalization tasks are often underrepresented. We introduce SLOG, a semantic parsing dataset that extends COGS with 17 structural generalization cases.
arXiv Detail & Related papers (2023-10-23T15:39:09Z)
How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases [28.58785395946639]
We show that pre-training can teach language models to rely on hierarchical syntactic features when performing tasks after fine-tuning. We focus on architectural features (depth, width, and number of parameters), as well as the genre and size of the pre-training corpus.
arXiv Detail & Related papers (2023-05-31T14:38:14Z)
How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech [25.02822854434971]
We train LSTMs and Transformers on data similar in quantity and content to children's linguistic input: text from the CHILDES corpus. We find that both model types generalize in a way more consistent with an incorrect linear rule than the correct hierarchical rule. These results suggest that human-like generalization from text alone requires stronger biases than the general sequence-processing biases of standard neural network architectures.
arXiv Detail & Related papers (2023-01-26T23:24:17Z)
Compositional Generalisation with Structured Reordering and Fertility Layers [121.37328648951993]
Seq2seq models have been shown to struggle with compositional generalisation. We present a flexible end-to-end differentiable neural model that composes two structural operations.
arXiv Detail & Related papers (2022-10-06T19:51:31Z)
On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores which we name Compositional Relational Network (CoRelNet). We find that simple architectural choices can outperform existing models in out-of-distribution generalization.
arXiv Detail & Related papers (2022-06-09T16:24:01Z)
Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test. We train a variational inference model to predict the causal structure from observational/interventional data. Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
Compositional Generalization Requires Compositional Parsers [69.77216620997305]
We compare sequence-to-sequence models and models guided by compositional principles on the recent COGS corpus. We show structural generalization is a key measure of compositional generalization and requires models that are aware of complex structure.
arXiv Detail & Related papers (2022-02-24T07:36:35Z)
ORCHARD: A Benchmark For Measuring Systematic Generalization of Multi-Hierarchical Reasoning [8.004425059996963]
We show that Transformer and LSTM models surprisingly fail in systematic generalization. We also show that with increased references between hierarchies, Transformer performs no better than random.
arXiv Detail & Related papers (2021-11-28T03:11:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.