Grounded Graph Decoding Improves Compositional Generalization in
Question Answering
- URL: http://arxiv.org/abs/2111.03642v1
- Date: Fri, 5 Nov 2021 17:50:14 GMT
- Title: Grounded Graph Decoding Improves Compositional Generalization in
Question Answering
- Authors: Yu Gai, Paras Jain, Wendi Zhang, Joseph E. Gonzalez, Dawn Song, Ion
Stoica
- Abstract summary: Question answering models struggle to generalize to novel compositions of training patterns, such as longer sequences or more complex test structures.
We propose Grounded Graph Decoding, a method to improve compositional generalization of language representations by grounding structured predictions with an attention mechanism.
Our model significantly outperforms state-of-the-art baselines on the Compositional Freebase Questions (CFQ) dataset, a challenging benchmark for compositional generalization in question answering.
- Score: 68.72605660152101
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Question answering models struggle to generalize to novel compositions of
training patterns, such as longer sequences or more complex test structures.
Current end-to-end models learn a flat input embedding which can lose input
syntax context. Prior approaches improve generalization by learning permutation
invariant models, but these methods do not scale to more complex train-test
splits. We propose Grounded Graph Decoding, a method to improve compositional
generalization of language representations by grounding structured predictions
with an attention mechanism. Grounding enables the model to retain syntax
information from the input, thereby significantly improving generalization
over complex inputs. By predicting a structured graph containing conjunctions
of query clauses, we learn a group invariant representation without making
assumptions on the target domain. Our model significantly outperforms
state-of-the-art baselines on the Compositional Freebase Questions (CFQ)
dataset, a challenging benchmark for compositional generalization in question
answering. Moreover, we effectively solve the MCD1 split with 98% accuracy.
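As a rough illustration of the grounding idea described in the abstract (not the authors' released implementation), the sketch below scores a hypothetical vocabulary of candidate query clauses by attending over the encoded question tokens, then predicts the output graph as a set (conjunction) of clauses. All module names, shapes, and the thresholding rule are assumptions made for the example.

```python
# Minimal sketch, assuming a hypothetical clause vocabulary and a pre-computed
# question encoder; this is NOT the authors' code, only an illustration of
# attention-based grounding for structured (graph) prediction.
import torch
import torch.nn as nn

class GroundedClauseScorer(nn.Module):
    """Scores candidate query clauses by attending over input token encodings,
    so each structured prediction stays grounded in the question's syntax."""
    def __init__(self, hidden_dim: int, num_clauses: int):
        super().__init__()
        self.clause_emb = nn.Embedding(num_clauses, hidden_dim)   # candidate clause queries
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)                     # per-clause inclusion score

    def forward(self, encoder_out: torch.Tensor) -> torch.Tensor:
        # encoder_out: (batch, src_len, hidden_dim) token representations of the question
        batch = encoder_out.size(0)
        queries = self.clause_emb.weight.unsqueeze(0).expand(batch, -1, -1)
        grounded, _ = self.attn(queries, encoder_out, encoder_out)  # clauses attend to input tokens
        return self.score(grounded).squeeze(-1)                    # (batch, num_clauses) logits

# Usage: predict which clauses form the output graph. Because the graph is a
# set of clauses (a conjunction), scoring them independently gives an
# order-invariant representation of the query.
encoder_out = torch.randn(2, 12, 256)                 # stand-in for a question encoder's output
scorer = GroundedClauseScorer(hidden_dim=256, num_clauses=50)
logits = scorer(encoder_out)
predicted_clauses = torch.sigmoid(logits) > 0.5       # clauses included in the query graph
print(predicted_clauses.shape)                        # torch.Size([2, 50])
```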
Related papers
- Towards Understanding the Relationship between In-context Learning and Compositional Generalization [7.843029855730508]
We train a causal Transformer in a setting that renders ordinary learning very difficult.
The model can solve the task, however, by utilizing earlier examples to generalize to later ones.
In evaluations on the SCAN, COGS, and GeoQuery datasets, models trained in this manner indeed show improved compositional generalization.
arXiv Detail & Related papers (2024-03-18T14:45:52Z)
- Compositional Generalization without Trees using Multiset Tagging and Latent Permutations [121.37328648951993]
We phrase semantic parsing as a two-step process: we first tag each input token with a multiset of output tokens.
Then we arrange the tokens into an output sequence using a new way of parameterizing and predicting permutations.
Our model outperforms pretrained seq2seq models and prior work on realistic semantic parsing tasks.
arXiv Detail & Related papers (2023-05-26T14:09:35Z)
- Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Compositional Generalization Requires Compositional Parsers [69.77216620997305]
We compare sequence-to-sequence models and models guided by compositional principles on the recent COGS corpus.
We show structural generalization is a key measure of compositional generalization and requires models that are aware of complex structure.
arXiv Detail & Related papers (2022-02-24T07:36:35Z)
- Recursive Decoding: A Situated Cognition Approach to Compositional Generation in Grounded Language Understanding [0.0]
We present Recursive Decoding, a novel procedure for training and using seq2seq models.
Rather than generating an entire output sequence in one pass, models are trained to predict one token at a time.
RD yields dramatic improvement on two previously neglected generalization tasks in gSCAN.
arXiv Detail & Related papers (2022-01-27T19:13:42Z)
- Disentangled Sequence to Sequence Learning for Compositional Generalization [62.954842223732435]
We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
arXiv Detail & Related papers (2021-10-09T22:27:19Z)
- Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations [27.244943870086175]
Sequence-to-sequence (seq2seq) models have been found to struggle at out-of-distribution compositional generalization.
We study the impact of intermediate representations on compositional generalization in pre-trained seq2seq models.
arXiv Detail & Related papers (2021-04-15T14:15:14Z)
- Improving Compositional Generalization in Semantic Parsing [54.4720965813889]
Generalization of models to out-of-distribution (OOD) data has captured tremendous attention recently.
We investigate compositional generalization in semantic parsing, a natural test-bed for compositional generalization.
arXiv Detail & Related papers (2020-10-12T12:34:58Z)
- Latent Compositional Representations Improve Systematic Generalization in Grounded Question Answering [46.87501300706542]
State-of-the-art models in grounded question answering often do not explicitly perform decomposition.
We propose a model that computes a representation and denotation for all question spans in a bottom-up, compositional manner.
Our model induces latent trees, driven by end-to-end (the answer) supervision only.
arXiv Detail & Related papers (2020-07-01T06:22:51Z)