Improving Compositional Generalization with Latent Structure and Data
Augmentation
- URL: http://arxiv.org/abs/2112.07610v1
- Date: Tue, 14 Dec 2021 18:03:28 GMT
- Title: Improving Compositional Generalization with Latent Structure and Data
Augmentation
- Authors: Linlu Qiu, Peter Shaw, Panupong Pasupat, Paweł Krzysztof Nowak, Tal Linzen, Fei Sha, Kristina Toutanova
- Abstract summary: We present a more powerful data recombination method using a model called Compositional Structure Learner (CSL).
CSL is a generative model with a quasi-synchronous context-free grammar backbone.
This procedure effectively transfers most of CSL's compositional bias to T5 for diagnostic tasks.
- Score: 39.24527889685699
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generic unstructured neural networks have been shown to struggle on
out-of-distribution compositional generalization. Compositional data
augmentation via example recombination has transferred some prior knowledge
about compositionality to such black-box neural models for several semantic
parsing tasks, but this often required task-specific engineering or provided
limited gains.
We present a more powerful data recombination method using a model called
Compositional Structure Learner (CSL). CSL is a generative model with a
quasi-synchronous context-free grammar backbone, which we induce from the
training data. We sample recombined examples from CSL and add them to the
fine-tuning data of a pre-trained sequence-to-sequence model (T5). This
procedure effectively transfers most of CSL's compositional bias to T5 for
diagnostic tasks, and results in a model even stronger than a T5-CSL ensemble
on two real-world compositional generalization tasks. This yields new
state-of-the-art results on these challenging semantic parsing tasks, which
require generalization to both natural language variation and novel
compositions of elements.
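
The recipe in the abstract is procedural: fit a generative recombination model on the training pairs, sample synthetic examples from it, and mix them into T5's fine-tuning data. The following is a minimal sketch of that data flow only; `GrammarModel` is a hypothetical stand-in for CSL (the quasi-synchronous grammar induction itself is not shown), and the fine-tuning step uses the Hugging Face `transformers` T5 classes.

```python
# Hedged sketch of the recombine-then-fine-tune pipeline described in the abstract.
# `GrammarModel` is a hypothetical placeholder: the real CSL induces a
# quasi-synchronous CFG from the training data and samples from it.
import random
from transformers import T5ForConditionalGeneration, T5TokenizerFast

class GrammarModel:
    """Placeholder for a generative model fit on (source, target) training pairs."""
    def __init__(self, pairs):
        self.pairs = pairs  # CSL would induce grammar rules from these pairs

    def sample(self, n):
        # Stand-in behavior: naively concatenate fragments of two random examples.
        # CSL instead samples coherent recombinations from its learned grammar.
        out = []
        for _ in range(n):
            (s1, t1), (s2, t2) = random.sample(self.pairs, 2)
            out.append((s1 + " " + s2, t1 + " " + t2))
        return out

train_pairs = [("book a flight to boston", "book(flight, dest=boston)"),
               ("cancel my reservation", "cancel(reservation)")]

csl = GrammarModel(train_pairs)
augmented = train_pairs + csl.sample(100)   # original data plus recombined samples

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# One illustrative gradient step on an augmented example (no Trainer, no batching).
src, tgt = augmented[0]
inputs = tokenizer(src, return_tensors="pt")
labels = tokenizer(tgt, return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss
loss.backward()  # a full training loop with an optimizer would follow in practice
```

In the paper, T5 is fine-tuned on the union of original and sampled data; the single step above only illustrates where the recombined examples enter the pipeline.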
Related papers
- Limits of Transformer Language Models on Learning to Compose Algorithms [77.2443883991608]
We evaluate training LLaMA models and prompting GPT-4 and Gemini on four tasks that require learning a composition of several discrete sub-tasks.
Our results indicate that compositional learning in state-of-the-art Transformer language models is highly sample inefficient.
arXiv Detail & Related papers (2024-02-08T16:23:29Z)
- Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Compositional Generalisation with Structured Reordering and Fertility Layers [121.37328648951993]
Seq2seq models have been shown to struggle with compositional generalisation.
We present a flexible end-to-end differentiable neural model that composes two structural operations.
arXiv Detail & Related papers (2022-10-06T19:51:31Z)
- Compositionality as Lexical Symmetry [42.37422271002712]
In tasks like semantic parsing, instruction following, and question answering, standard deep networks fail to generalize compositionally from small datasets.
We present a domain-general and model-agnostic formulation of compositionality as a constraint on symmetries of data distributions rather than models.
We describe a procedure called LEXSYM that discovers these transformations automatically, then applies them to training data for ordinary neural sequence models; a minimal sketch of applying such a symmetry appears after this list.
arXiv Detail & Related papers (2022-01-30T21:44:46Z)
- Learning to Generalize Compositionally by Transferring Across Semantic Parsing Tasks [37.66114618645146]
We investigate learning representations that facilitate transfer learning from one compositional task to another.
We apply this method to semantic parsing, using three very different datasets.
Our method significantly improves compositional generalization over baselines on the test set of the target task.
arXiv Detail & Related papers (2021-11-09T09:10:21Z)
- Improving Compositional Generalization with Self-Training for Data-to-Text Generation [36.973617793800315]
We study the compositional generalization of current generation models in data-to-text tasks.
By simulating structural shifts in the compositional Weather dataset, we show that T5 models fail to generalize to unseen structures.
We propose an approach based on self-training, using a fine-tuned BLEURT model for pseudo-response selection.
arXiv Detail & Related papers (2021-10-16T04:26:56Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- Sequence-Level Mixed Sample Data Augmentation [119.94667752029143]
This work proposes a simple data augmentation approach to encourage compositional behavior in neural models for sequence-to-sequence problems.
Our approach, SeqMix, creates new synthetic examples by softly combining input/output sequences from the training set.
arXiv Detail & Related papers (2020-11-18T02:18:04Z)
- Learning to Recombine and Resample Data for Compositional Generalization [35.868789086531685]
We describe R&R, a learned data augmentation scheme that enables a large category of compositional generalizations without appeal to latent symbolic structure.
R&R has two components: recombination of original training examples via a prototype-based generative model and resampling of generated examples to encourage extrapolation.
arXiv Detail & Related papers (2020-10-08T00:36:33Z)
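
Several of the papers above (LEXSYM, SeqMix, R&R) share the idea of creating new training examples by transforming or recombining existing ones. As a self-contained illustration of the lexical-symmetry flavor of that idea, the sketch below applies a hand-specified token substitution consistently to an input/output pair; the token pair and its alignment are assumptions made here for illustration, whereas LEXSYM discovers such transformations automatically.

```python
# Minimal sketch (an assumption-laden illustration, not the actual LEXSYM algorithm):
# swap a pair of interchangeable lexical items consistently in both the input and
# the output to create a new, compositionally recombined training example.
def apply_symmetry(pair, swap_in, swap_out):
    """Swap two aligned tokens in the source and in the target of one example."""
    src, tgt = pair
    a, b = swap_in
    x, y = swap_out
    new_src = " ".join(b if w == a else a if w == b else w for w in src.split())
    new_tgt = " ".join(y if w == x else x if w == y else w for w in tgt.split())
    return new_src, new_tgt

example = ("paint the small cube red", "paint ( cube , size=small , color=red )")
# Hand-specified symmetry: "red" <-> "blue" in language, "color=red" <-> "color=blue" in logic.
augmented = apply_symmetry(example, ("red", "blue"), ("color=red", "color=blue"))
print(augmented)
# ('paint the small cube blue', 'paint ( cube , size=small , color=blue )')
```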