Disentangled Sequence to Sequence Learning for Compositional
Generalization
- URL: http://arxiv.org/abs/2110.04655v1
- Date: Sat, 9 Oct 2021 22:27:19 GMT
- Title: Disentangled Sequence to Sequence Learning for Compositional
Generalization
- Authors: Hao Zheng and Mirella Lapata
- Abstract summary: We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
- Score: 62.954842223732435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is mounting evidence that existing neural network models, in particular
the very popular sequence-to-sequence architecture, struggle with compositional
generalization, i.e., the ability to systematically generalize to unseen
compositions of seen components. In this paper we demonstrate that one of the
reasons hindering compositional generalization relates to the representations
being entangled. We propose an extension to sequence-to-sequence models which
allows us to learn disentangled representations by adaptively re-encoding (at
each time step) the source input. Specifically, we condition the source
representations on the newly decoded target context which makes it easier for
the encoder to exploit specialized information for each prediction rather than
capturing all source information in a single forward pass. Experimental results
on semantic parsing and machine translation empirically show that our proposal
yields more disentangled representations and better generalization.
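To make the mechanism concrete, here is a minimal sketch (in PyTorch) of per-step adaptive re-encoding, assuming a Transformer encoder run over the concatenation of the source and the current target prefix; the class and function names, model sizes, and greedy decoding loop are illustrative assumptions, not the authors' implementation:

import torch
import torch.nn as nn

class AdaptiveReEncoder(nn.Module):
    # Re-encodes the source at every decoding step, conditioned on the
    # target context decoded so far, as described in the abstract.
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def step(self, src_ids, tgt_prefix_ids):
        # One forward pass over [source; target prefix]: the source
        # representations are recomputed for this prediction only.
        x = self.embed(torch.cat([src_ids, tgt_prefix_ids], dim=1))
        h = self.encoder(x)
        return self.out(h[:, -1, :])  # logits for the next target token

def greedy_decode(model, src_ids, bos_id, eos_id, max_len=50):
    prefix = torch.full((src_ids.size(0), 1), bos_id, dtype=torch.long)
    for _ in range(max_len):
        logits = model.step(src_ids, prefix)  # one re-encoding per step
        next_tok = logits.argmax(-1, keepdim=True)
        prefix = torch.cat([prefix, next_tok], dim=1)
        if (next_tok == eos_id).all():
            break
    return prefix

The trade-off is explicit in the loop: conditioning the source on the decoded context specializes the representation for each prediction, at the cost of one encoder pass per generated token.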
Related papers
- Real-World Compositional Generalization with Disentangled
Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically (see the periodic re-encoding sketch after this list).
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Compositional Generalization in Unsupervised Compositional Representation Learning: A Study on Disentanglement and Emergent Language [48.37815764394315]
We study three unsupervised representation learning algorithms on two datasets that allow directly testing compositional generalization.
We find that directly using the bottleneck representation with simple models and few labels may lead to worse generalization than using representations from layers before or after the learned representation itself.
Surprisingly, we find that increasing the pressure to produce a disentangled representation yields representations with worse generalization, while representations from emergent language (EL) models show strong compositional generalization.
arXiv Detail & Related papers (2022-10-02T10:35:53Z)
- Improving Compositional Generalization in Classification Tasks via Structure Annotations [33.90268697120572]
Humans have a great ability to generalize compositionally, but state-of-the-art neural models struggle to do so.
First, we study ways to convert a natural language sequence-to-sequence dataset to a classification dataset that also requires compositional generalization.
Second, we show that providing structural hints (specifically, parse trees and entity links supplied as attention masks to a Transformer model) helps compositional generalization (see the attention-mask sketch after this list).
arXiv Detail & Related papers (2021-06-19T06:07:27Z)
- Compositional Generalization via Semantic Tagging [81.24269148865555]
We propose a new decoding framework that preserves the expressivity and generality of sequence-to-sequence models.
We show that the proposed approach consistently improves compositional generalization across model architectures, domains, and semantic formalisms.
arXiv Detail & Related papers (2020-10-22T15:55:15Z)
- Hierarchical Poset Decoding for Compositional Generalization in Language [52.13611501363484]
We formalize human language understanding as a structured prediction task where the output is a partially ordered set (poset).
Current encoder-decoder architectures do not take the poset structure of semantics into account properly.
We propose a novel hierarchical poset decoding paradigm for compositional generalization in language.
arXiv Detail & Related papers (2020-10-15T14:34:26Z)
- Improving Compositional Generalization in Semantic Parsing [54.4720965813889]
Generalization of models to out-of-distribution (OOD) data has captured tremendous attention recently.
We investigate compositional generalization in semantic parsing, which offers a natural test-bed for this ability.
arXiv Detail & Related papers (2020-10-12T12:34:58Z)
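As referenced above, the first related paper replaces per-step re-encoding with periodic re-encoding of keys only. A hedged sketch of that decoding loop follows, where the model object, its encode_keys/encode_values/decode_step methods, and the re_encode_every parameter are all hypothetical stand-ins, not the paper's code:

import torch

def decode_with_periodic_keys(model, src_ids, bos_id, eos_id,
                              max_len=50, re_encode_every=4):
    # Values are encoded once from the source; keys are refreshed only
    # every `re_encode_every` steps, conditioned on the target prefix.
    values = model.encode_values(src_ids)
    keys = model.encode_keys(src_ids, tgt_prefix=None)
    prefix = torch.full((src_ids.size(0), 1), bos_id, dtype=torch.long)
    for t in range(max_len):
        if t > 0 and t % re_encode_every == 0:
            keys = model.encode_keys(src_ids, tgt_prefix=prefix)
        logits = model.decode_step(keys, values, prefix)
        next_tok = logits.argmax(-1, keepdim=True)
        prefix = torch.cat([prefix, next_tok], dim=1)
        if (next_tok == eos_id).all():
            break
    return prefix

Compared with re-encoding everything at every step, this amortizes the extra encoder passes over re_encode_every tokens while keeping the keys loosely conditioned on the decoded context.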
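Similarly, for the structure-annotation entry above, here is a toy sketch of how parse-tree spans could be turned into a boolean attention mask; the half-open span representation and the masking convention (True marks a position not allowed to attend, as in PyTorch's attn_mask) are assumptions for illustration:

import torch

def spans_to_attention_mask(num_tokens, spans):
    # spans: (start, end) half-open intervals taken from a parse tree.
    # Tokens may attend to each other only inside a shared span.
    allowed = torch.eye(num_tokens, dtype=torch.bool)
    for start, end in spans:
        allowed[start:end, start:end] = True
    return ~allowed  # True = disallowed, matching attn_mask semantics

# Example: 5 tokens with constituents covering tokens 0-1 and 2-4.
mask = spans_to_attention_mask(5, [(0, 2), (2, 5)])

Such a mask can then be passed as attn_mask to a Transformer self-attention layer so that attention respects constituent boundaries.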
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.