Structural generalization is hard for sequence-to-sequence models
- URL: http://arxiv.org/abs/2210.13050v1
- Date: Mon, 24 Oct 2022 09:03:03 GMT
- Title: Structural generalization is hard for sequence-to-sequence models
- Authors: Yuekun Yao and Alexander Koller
- Abstract summary: Sequence-to-sequence (seq2seq) models have been successful across many NLP tasks.
Recent work on compositional generalization has shown that seq2seq models achieve very low accuracy in generalizing to linguistic structures that were not seen in training.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence-to-sequence (seq2seq) models have been successful across many NLP
tasks, including ones that require predicting linguistic structure. However,
recent work on compositional generalization has shown that seq2seq models
achieve very low accuracy in generalizing to linguistic structures that were
not seen in training. We present new evidence that this is a general limitation
of seq2seq models that is present not just in semantic parsing, but also in
syntactic parsing and in text-to-text tasks, and that this limitation can often
be overcome by neurosymbolic models that have linguistic knowledge built in. We
further report on some experiments that give initial answers on the reasons for
these limitations.
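To make "structures not seen in training" concrete, here is a minimal illustrative sketch of a structural-generalization split in the style of COGS. The sentences and logical forms below are hypothetical examples, not data from the paper: training only ever attaches prepositional phrases to objects, while the test set attaches them to subjects, a structure absent from training.

```python
# Hypothetical toy data (not from the paper): a COGS-style structural
# generalization split. PP modification appears only on objects in
# training, but on subjects at test time.

train = [
    ("Emma saw the cat on the mat", "see(emma, cat(on=mat))"),
    ("Liam fed the dog in the yard", "feed(liam, dog(in=yard))"),
]

test = [
    ("The cat on the mat saw Emma", "see(cat(on=mat), emma)"),
]

def has_subject_pp(logical_form: str) -> bool:
    """True if the subject (first argument) carries a PP modifier."""
    # The subject is everything between the outer '(' and the first
    # top-level comma; a nested '(' there signals a PP on the subject.
    subject = logical_form.split("(", 1)[1].split(",", 1)[0]
    return "(" in subject

# The split is structural: no training example has a subject PP,
# while every test example requires one.
assert not any(has_subject_pp(lf) for _, lf in train)
assert all(has_subject_pp(lf) for _, lf in test)
```

A seq2seq model trained only on the first pattern must recombine known pieces into an unseen structure at test time, which is exactly the setting where the paper reports very low accuracy.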
Related papers
- SLOG: A Structural Generalization Benchmark for Semantic Parsing [68.19511282584304]
The goal of compositional generalization benchmarks is to evaluate how well models generalize to new complex linguistic expressions.
Existing benchmarks often focus on lexical generalization, the interpretation of novel lexical items in syntactic structures familiar from training; structural generalization cases are often underrepresented.
We introduce SLOG, a semantic parsing dataset that extends COGS with 17 structural generalization cases.
arXiv Detail & Related papers (2023-10-23T15:39:09Z)
- Compositional Generalisation with Structured Reordering and Fertility Layers [121.37328648951993]
Seq2seq models have been shown to struggle with compositional generalisation.
We present a flexible end-to-end differentiable neural model that composes two structural operations.
arXiv Detail & Related papers (2022-10-06T19:51:31Z)
- Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models [23.21767225871304]
Sequence-to-sequence (seq2seq) models often fail to generalize in a hierarchy-sensitive manner when performing syntactic transformations.
We find that pre-trained seq2seq models generalize hierarchically when performing syntactic transformations, whereas models trained from scratch on syntactic transformations do not.
arXiv Detail & Related papers (2022-03-17T15:46:53Z)
- Compositional Generalization Requires Compositional Parsers [69.77216620997305]
We compare sequence-to-sequence models and models guided by compositional principles on the recent COGS corpus.
We show that structural generalization is a key measure of compositional generalization and requires models that are aware of complex structure.
arXiv Detail & Related papers (2022-02-24T07:36:35Z)
- Discontinuous Grammar as a Foreign Language [0.7412445894287709]
We extend the framework of sequence-to-sequence models for constituent parsing.
We design several novel extensions that can fully produce discontinuities.
For the first time, we test a sequence-to-sequence model on the main discontinuous benchmarks.
arXiv Detail & Related papers (2021-10-20T08:58:02Z)
- Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations [27.244943870086175]
Sequence-to-sequence (seq2seq) models have been found to struggle at out-of-distribution compositional generalization.
We study the impact of intermediate representations on compositional generalization in pre-trained seq2seq models.
arXiv Detail & Related papers (2021-04-15T14:15:14Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- High-order Semantic Role Labeling [86.29371274587146]
This paper introduces a high-order graph structure for the neural semantic role labeling model.
It enables the model to explicitly consider not only the isolated predicate-argument pairs but also the interaction between the predicate-argument pairs.
Experimental results on 7 languages of the CoNLL-2009 benchmark show that high-order structural learning techniques are beneficial to strong-performing SRL models.
arXiv Detail & Related papers (2020-10-09T15:33:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.