Compositionality as Lexical Symmetry
- URL: http://arxiv.org/abs/2201.12926v2
- Date: Wed, 5 Jul 2023 17:59:33 GMT
- Title: Compositionality as Lexical Symmetry
- Authors: Ekin Akyürek and Jacob Andreas
- Abstract summary: In tasks like semantic parsing, instruction following, and question answering, standard deep networks fail to generalize compositionally from small datasets.
We present a domain-general and model-agnostic formulation of compositionality as a constraint on symmetries of data distributions rather than models.
We describe a procedure called LEXSYM that discovers these transformations automatically, then applies them to training data for ordinary neural sequence models.
- Score: 42.37422271002712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In tasks like semantic parsing, instruction following, and question
answering, standard deep networks fail to generalize compositionally from small
datasets. Many existing approaches overcome this limitation with model
architectures that enforce a compositional process of sentence interpretation.
In this paper, we present a domain-general and model-agnostic formulation of
compositionality as a constraint on symmetries of data distributions rather
than models. Informally, we prove that whenever a task can be solved by a
compositional model, there is a corresponding data augmentation scheme -- a
procedure for transforming examples into other well-formed examples -- that
imparts compositional inductive bias on any model trained to solve the same
task. We describe a procedure called LEXSYM that discovers these
transformations automatically, then applies them to training data for ordinary
neural sequence models. Unlike existing compositional data augmentation
procedures, LEXSYM can be deployed agnostically across text, structured data,
and even images. It matches or surpasses state-of-the-art, task-specific models
on COGS semantic parsing, SCAN and ALCHEMY instruction following, and
CLEVR-COGENT visual question answering datasets.
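The augmentation LEXSYM formalizes can be pictured concretely on SCAN-style data: if two lexical items play interchangeable roles, swapping them (together with their aligned output tokens) turns training examples into new well-formed examples. The sketch below is a minimal, hypothetical illustration with a deliberately crude co-occurrence lexicon, not the paper's actual discovery procedure.
```python
from collections import defaultdict

# Toy SCAN-style pairs of (command, action sequence).
TRAIN = [
    ("jump", "JUMP"),
    ("walk", "WALK"),
    ("jump twice", "JUMP JUMP"),
    ("walk left", "LTURN WALK"),
]

def induce_lexicon(pairs):
    """Map each input word to an output token that appears in every
    example containing it (a crude stand-in for LEXSYM's alignment step)."""
    cooc = defaultdict(lambda: defaultdict(int))
    freq = defaultdict(int)
    for src, tgt in pairs:
        for w in set(src.split()):
            freq[w] += 1
            for t in set(tgt.split()):
                cooc[w][t] += 1
    return {w: t for w, counts in cooc.items()
            for t, n in counts.items() if n == freq[w]}

def swap(example, a, b, lexicon):
    """Exchange lexical items a and b, and their aligned output tokens,
    to produce a new well-formed training example."""
    src, tgt = example
    sub_src = {a: b, b: a}
    sub_tgt = {lexicon[a]: lexicon[b], lexicon[b]: lexicon[a]}
    return (" ".join(sub_src.get(w, w) for w in src.split()),
            " ".join(sub_tgt.get(t, t) for t in tgt.split()))

lexicon = induce_lexicon(TRAIN)
print(swap(("jump twice", "JUMP JUMP"), "jump", "walk", lexicon))
# -> ('walk twice', 'WALK WALK'), a pair never seen in training
```
Per the abstract, training an ordinary neural sequence model on the union of the original and swapped examples is what imparts the compositional inductive bias; no change to the model architecture is needed.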
Related papers
- Visual Analytics for Fine-grained Text Classification Models and Datasets [3.6873612681664016]
SemLa is a novel visual analytics system tailored for fine-grained text classification.
This paper details the iterative design study and the resulting innovations featured in SemLa.
arXiv Detail & Related papers (2024-03-21T17:26:28Z)
- DiSK: A Diffusion Model for Structured Knowledge [12.472921856815942]
Diffusion Models of Structured Knowledge (DiSK) is a new architecture and training approach specialized for structured data.
DiSK handles text, categorical, and continuous numerical data using a Gaussian mixture model approach.
arXiv Detail & Related papers (2023-12-08T18:59:14Z)
- On Conditional and Compositional Language Model Differentiable Prompting [75.76546041094436]
Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks.
We propose a new model, Prompt Production System (PRopS), which learns to transform task instructions or input metadata into continuous prompts.
arXiv Detail & Related papers (2023-07-04T02:47:42Z)
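The PRopS summary above describes producing continuous prompts from task instructions. As a loose, hypothetical illustration of that general idea (not PRopS's actual production-system architecture), the PyTorch sketch below maps a pooled instruction encoding to k prompt vectors that are prepended to a frozen PLM's input embeddings; all module names and dimensions are invented.
```python
import torch
import torch.nn as nn

class InstructionPromptProducer(nn.Module):
    """Hypothetical sketch: turn a pooled instruction encoding into k
    continuous prompt vectors for a frozen PLM (not the PRopS model)."""

    def __init__(self, instr_dim: int, plm_dim: int, k: int = 10):
        super().__init__()
        self.k = k
        self.proj = nn.Sequential(nn.Linear(instr_dim, plm_dim * k), nn.Tanh())

    def forward(self, instr_repr: torch.Tensor) -> torch.Tensor:
        # instr_repr: (batch, instr_dim) -> prompts: (batch, k, plm_dim)
        return self.proj(instr_repr).view(instr_repr.shape[0], self.k, -1)

producer = InstructionPromptProducer(instr_dim=768, plm_dim=768)
instr = torch.randn(2, 768)        # stand-in instruction encodings
prompts = producer(instr)          # (2, 10, 768)
tokens = torch.randn(2, 32, 768)   # stand-in frozen-PLM input embeddings
plm_inputs = torch.cat([prompts, tokens], dim=1)  # prepend prompts: (2, 42, 768)
```
Only the producer is trained; the PLM stays frozen, which is the design choice that makes prompting methods of this kind parameter-efficient.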
- Conjunct Resolution in the Face of Verbal Omissions [51.220650412095665]
We propose a conjunct resolution task that operates directly on the text and uses a split-and-rephrase paradigm to recover the missing elements in the coordination structure.
We curate a large dataset containing over 10K examples of naturally occurring verbal omissions with crowd-sourced annotations.
We train various neural baselines for this task, and show that while our best method obtains decent performance, it leaves ample space for improvement.
arXiv Detail & Related papers (2023-05-26T08:44:02Z)
- Recursive Neural Networks with Bottlenecks Diagnose (Non-)Compositionality [65.60002535580298]
Quantifying the compositionality of data is challenging and has been investigated primarily for short utterances.
We show that comparing the data's representations in models with and without a representational bottleneck yields a compositionality metric.
The procedure is applied to evaluating arithmetic expressions on synthetic data and to sentiment classification on natural language data.
arXiv Detail & Related papers (2023-01-31T15:46:39Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Improving Compositional Generalization with Latent Structure and Data Augmentation [39.24527889685699]
We present a more powerful data recombination method using a model called Compositional Structure Learner (CSL).
CSL is a generative model with a quasi-synchronous context-free grammar backbone.
This procedure effectively transfers most of CSL's compositional bias to T5 for diagnostic tasks.
arXiv Detail & Related papers (2021-12-14T18:03:28Z)
- Learning to Generalize Compositionally by Transferring Across Semantic Parsing Tasks [37.66114618645146]
We investigate learning representations that facilitate transfer learning from one compositional task to another.
We apply this method to semantic parsing, using three very different datasets.
Our method significantly improves compositional generalization over baselines on the test set of the target task.
arXiv Detail & Related papers (2021-11-09T09:10:21Z)
- Improving Compositional Generalization with Self-Training for Data-to-Text Generation [36.973617793800315]
We study the compositional generalization of current generation models in data-to-text tasks.
By simulating structural shifts in the compositional Weather dataset, we show that T5 models fail to generalize to unseen structures.
We propose an approach based on self-training, using a fine-tuned BLEURT model for pseudo-response selection.
arXiv Detail & Related papers (2021-10-16T04:26:56Z)
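To make the pseudo-response selection step in the entry above concrete, here is a hypothetical sketch of one self-training round; `generate` and `score` are stand-ins for the generation model's sampler and the fine-tuned BLEURT quality model, and the threshold value is invented.
```python
from typing import Callable, List, Tuple

def pseudo_label_round(
    unlabeled: List[str],
    generate: Callable[[str, int], List[str]],  # stand-in: model sampling k candidates
    score: Callable[[str, str], float],         # stand-in: fine-tuned BLEURT quality model
    k: int = 8,
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """One self-training round: keep the best-scoring candidate output
    per input if it clears the threshold, yielding pseudo-labeled pairs
    to add to the fine-tuning data."""
    pseudo = []
    for x in unlabeled:
        candidates = generate(x, k)
        best = max(candidates, key=lambda y: score(x, y))
        if score(x, best) >= threshold:
            pseudo.append((x, best))
    return pseudo
```
Iterating this loop, with the model re-fine-tuned on the growing pseudo-labeled set between rounds, is the standard self-training recipe the summary describes.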
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.