Unlocking Compositional Generalization in Pre-trained Models Using
Intermediate Representations
- URL: http://arxiv.org/abs/2104.07478v1
- Date: Thu, 15 Apr 2021 14:15:14 GMT
- Title: Unlocking Compositional Generalization in Pre-trained Models Using
Intermediate Representations
- Authors: Jonathan Herzig, Peter Shaw, Ming-Wei Chang, Kelvin Guu, Panupong
Pasupat, Yuan Zhang
- Abstract summary: Sequence-to-sequence (seq2seq) models have been found to struggle at out-of-distribution compositional generalization.
We study the impact of intermediate representations on compositional generalization in pre-trained seq2seq models.
- Score: 27.244943870086175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence-to-sequence (seq2seq) models are prevalent in semantic parsing, but
have been found to struggle at out-of-distribution compositional
generalization. While specialized model architectures and pre-training of
seq2seq models have been proposed to address this issue, the former often comes
at the cost of generality and the latter only shows limited success. In this
paper, we study the impact of intermediate representations on compositional
generalization in pre-trained seq2seq models, without changing the model
architecture at all, and identify key aspects for designing effective
representations. Instead of training to directly map natural language to an
executable form, we map to a reversible or lossy intermediate representation
that has stronger structural correspondence with natural language. The
combination of our proposed intermediate representations and pre-trained models
is surprisingly effective, where the best combinations obtain a new
state-of-the-art on CFQ (+14.8 accuracy points) and on the template-splits of
three text-to-SQL datasets (+15.0 to +19.4 accuracy points). This work
highlights that intermediate representations provide an important and
potentially overlooked degree of freedom for improving the compositional
generalization abilities of pre-trained seq2seq models.
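To make the idea in the abstract concrete, below is a minimal, hypothetical sketch (in Python) of what a reversible intermediate representation for simple SQL queries could look like: the executable form is deterministically reordered so that its clause order more closely follows how a question typically names the entity before the requested attribute, and the mapping can be inverted exactly to recover executable SQL. The clause ordering, the function names (sql_to_intermediate, intermediate_to_sql), and the example query are illustrative assumptions, not the paper's actual representations.

```python
# Hypothetical sketch of a reversible intermediate representation (IR):
# instead of training a seq2seq model to emit executable SQL directly,
# train it to emit a reordered form whose clause order better mirrors the
# natural-language question, then map predictions back to SQL deterministically.
import re


def sql_to_intermediate(sql: str) -> str:
    """Reorder a simple SELECT query as FROM ... WHERE ... SELECT ...,
    loosely mirroring how a question names the entity before the requested
    attribute ("How many singers are older than 30?")."""
    match = re.match(
        r"SELECT (?P<select>.+?) FROM (?P<from>.+?)(?: WHERE (?P<where>.+))?$",
        sql.strip(), flags=re.IGNORECASE,
    )
    if match is None:
        return sql  # fall back to the original query if it does not parse
    parts = [f"FROM {match.group('from')}"]
    if match.group("where"):
        parts.append(f"WHERE {match.group('where')}")
    parts.append(f"SELECT {match.group('select')}")
    return " ".join(parts)


def intermediate_to_sql(ir: str) -> str:
    """Invert sql_to_intermediate, restoring an executable query."""
    match = re.match(
        r"FROM (?P<from>.+?)(?: WHERE (?P<where>.+?))? SELECT (?P<select>.+)$",
        ir.strip(), flags=re.IGNORECASE,
    )
    if match is None:
        return ir
    sql = f"SELECT {match.group('select')} FROM {match.group('from')}"
    if match.group("where"):
        sql += f" WHERE {match.group('where')}"
    return sql


if __name__ == "__main__":
    query = "SELECT count(*) FROM singer WHERE age > 30"
    ir = sql_to_intermediate(query)
    print(ir)  # FROM singer WHERE age > 30 SELECT count(*)
    assert intermediate_to_sql(ir) == query  # round-trip: the mapping is reversible
```

Because the transformation is deterministic and invertible, a model trained to predict the intermediate form loses nothing in executability; a lossy variant (e.g., one that anonymizes literals) would instead require a post-processing step to restore the dropped details.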
Related papers
Compositional Generalisation with Structured Reordering and Fertility Layers [121.37328648951993]
Seq2seq models have been shown to struggle with compositional generalisation.
We present a flexible end-to-end differentiable neural model that composes two structural operations.
arXiv Detail & Related papers (2022-10-06T19:51:31Z)
Enhancing Pre-trained Models with Text Structure Knowledge for Question Generation [2.526624977753083]
We model text structure as answer position and syntactic dependency, and propose answer localness modeling and syntactic mask attention to capture this structural information.
Experiments on SQuAD dataset show that our proposed two modules improve performance over the strong pre-trained model ProphetNet.
arXiv Detail & Related papers (2022-09-09T08:33:47Z)
Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models [23.21767225871304]
Sequence-to-sequence (seq2seq) models often fail to generalize in a hierarchy-sensitive manner when performing syntactic transformations.
We find that pre-trained seq2seq models generalize hierarchically when performing syntactic transformations, whereas models trained from scratch on syntactic transformations do not.
arXiv Detail & Related papers (2022-03-17T15:46:53Z)
Compositional Generalization Requires Compositional Parsers [69.77216620997305]
We compare sequence-to-sequence models and models guided by compositional principles on the recent COGS corpus.
We show structural generalization is a key measure of compositional generalization and requires models that are aware of complex structure.
arXiv Detail & Related papers (2022-02-24T07:36:35Z)
Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces.
arXiv Detail & Related papers (2022-01-08T00:47:50Z)
Grounded Graph Decoding Improves Compositional Generalization in Question Answering [68.72605660152101]
Question answering models struggle to generalize to novel compositions of training patterns, such as longer sequences or more complex test structures.
We propose Grounded Graph Decoding, a method to improve compositional generalization of language representations by grounding structured predictions with an attention mechanism.
Our model significantly outperforms state-of-the-art baselines on the Compositional Freebase Questions (CFQ) dataset, a challenging benchmark for compositional generalization in question answering.
arXiv Detail & Related papers (2021-11-05T17:50:14Z)
Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
Compositional Generalization via Semantic Tagging [81.24269148865555]
We propose a new decoding framework that preserves the expressivity and generality of sequence-to-sequence models.
We show that the proposed approach consistently improves compositional generalization across model architectures, domains, and semantic formalisms.
arXiv Detail & Related papers (2020-10-22T15:55:15Z)
Abstractive Summarization with Combination of Pre-trained Sequence-to-Sequence and Saliency Models [11.420640383826656]
We investigate the effectiveness of combining saliency models that identify the important parts of the source text with pre-trained seq-to-seq models.
Most of the combination models outperformed a simple fine-tuned seq-to-seq model on both the CNN/DM and XSum datasets.
arXiv Detail & Related papers (2020-03-29T14:00:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.