Structured Reordering for Modeling Latent Alignments in Sequence
Transduction
- URL: http://arxiv.org/abs/2106.03257v2
- Date: Tue, 8 Jun 2021 12:57:19 GMT
- Title: Structured Reordering for Modeling Latent Alignments in Sequence
Transduction
- Authors: Bailin Wang, Mirella Lapata and Ivan Titov
- Abstract summary: We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
- Score: 86.94309120789396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite success in many domains, neural models struggle in settings where
train and test examples are drawn from different distributions. In particular,
in contrast to humans, conventional sequence-to-sequence (seq2seq) models fail
to generalize systematically, i.e., interpret sentences representing novel
combinations of concepts (e.g., text segments) seen in training. Traditional
grammar formalisms excel in such settings by implicitly encoding alignments
between input and output segments, but are hard to scale and maintain. Instead
of engineering a grammar, we directly model segment-to-segment alignments as
discrete structured latent variables within a neural seq2seq model. To
efficiently explore the large space of alignments, we introduce a reorder-first
align-later framework whose central component is a neural reordering module
producing {\it separable} permutations. We present an efficient dynamic
programming algorithm performing exact marginal inference of separable
permutations, and, thus, enabling end-to-end differentiable training of our
model. The resulting seq2seq model exhibits better systematic generalization
than standard models on synthetic problems and NLP tasks (i.e., semantic
parsing and machine translation).
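Separable permutations are exactly those that can be built by recursively splitting a span into two halves and either keeping the halves in their original order ("straight") or swapping them ("inverted"). This recursive structure is what admits a polynomial-time inside algorithm over the exponentially large set of reorderings. The sketch below only illustrates that bracketing-style dynamic program, not the paper's exact parameterization: the score tensors are invented for the example, and a faithful implementation additionally needs a normal-form factorization so that each permutation (rather than each derivation tree) is counted exactly once.

```python
import numpy as np

def _logsumexp(xs):
    """Numerically stable log-sum-exp of a list of log-scores."""
    m = max(xs)
    if m == -np.inf:
        return -np.inf
    return m + np.log(sum(np.exp(x - m) for x in xs))

def log_partition(straight, inverted):
    """Inside algorithm over straight/inverted binary bracketings (sketch).

    straight[i, k, j] / inverted[i, k, j] are hypothetical log-scores for
    merging spans [i, k) and [k, j) in original or swapped order
    (shape: (n+1, n+1, n+1)). Returns the log normalizer.
    """
    n = straight.shape[0] - 1
    inside = np.full((n + 1, n + 1), -np.inf)
    for i in range(n):
        inside[i, i + 1] = 0.0          # a single token is a trivial span
    for width in range(2, n + 1):       # build larger spans bottom-up
        for i in range(n - width + 1):
            j = i + width
            terms = []
            for k in range(i + 1, j):   # every split point of [i, j)
                sub = inside[i, k] + inside[k, j]
                terms.append(sub + straight[i, k, j])   # keep order
                terms.append(sub + inverted[i, k, j])   # swap halves
            inside[i, j] = _logsumexp(terms)
    return inside[0, n]
```

This dynamic program runs in O(n^3) time and O(n^2) space for n input tokens. Differentiating the log normalizer with respect to the merge scores yields the exact marginals of each reordering decision, which is what allows a latent permutation module of this kind to be trained end-to-end with a downstream seq2seq model.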
Related papers
- Discrete Graph Auto-Encoder [52.50288418639075]
We introduce a new framework named Discrete Graph Auto-Encoder (DGAE)
We first use a permutation-equivariant auto-encoder to convert graphs into sets of discrete latent node representations.
In the second step, we sort the sets of discrete latent representations and learn their distribution with a specifically designed auto-regressive model.
arXiv Detail & Related papers (2023-06-13T12:40:39Z)
- Compositional Generalization without Trees using Multiset Tagging and Latent Permutations [121.37328648951993]
We phrase semantic parsing as a two-step process: we first tag each input token with a multiset of output tokens.
Then we arrange the tokens into an output sequence using a new way of parameterizing and predicting permutations.
Our model outperforms pretrained seq2seq models and prior work on realistic semantic parsing tasks.
arXiv Detail & Related papers (2023-05-26T14:09:35Z)
- A Framework for Bidirectional Decoding: Case Study in Morphological Inflection [4.602447284133507]
We propose a framework for decoding sequences from the "outside-in".
At each step, the model chooses to generate a token on the left, on the right, or join the left and right sequences.
Our model sets the state of the art (SOTA) on the 2022 and 2023 shared tasks, beating the next best systems by over 4.7 and 2.7 points in average accuracy, respectively.
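Read literally, this outside-in procedure amounts to growing a left prefix and a right suffix until a join action terminates decoding. The loop below is a minimal greedy sketch of that idea; `model.step` is a hypothetical interface, not the system's actual API.

```python
def decode_outside_in(model, src, max_len=128):
    """Greedy outside-in decoding sketch: grow a left prefix and a right
    suffix until the model chooses to join them. `model.step` is assumed
    to return an action in {"LEFT", "RIGHT", "JOIN"} and a token."""
    left, right = [], []
    for _ in range(max_len):
        action, token = model.step(src, left, right)
        if action == "LEFT":       # extend the prefix rightwards
            left.append(token)
        elif action == "RIGHT":    # extend the suffix leftwards
            right.insert(0, token)
        else:                      # "JOIN": concatenate and stop
            break
    return left + right
```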
arXiv Detail & Related papers (2023-05-21T22:08:31Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative parser based on a bracketing transduction grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- A Sparsity-promoting Dictionary Model for Variational Autoencoders [16.61511959679188]
Structuring the latent space in deep generative models is important to yield more expressive models and interpretable representations.
We propose a simple yet effective methodology to structure the latent space via a sparsity-promoting dictionary model.
arXiv Detail & Related papers (2022-03-29T17:13:11Z)
- Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)
- Sequence-to-Sequence Learning with Latent Neural Grammars [12.624691611049341]
Sequence-to-sequence learning with neural networks has become the de facto standard for sequence prediction tasks.
While flexible and performant, these models often require large datasets for training and can fail spectacularly on benchmarks designed to test for compositional generalization.
This work explores an alternative, hierarchical approach to sequence-to-sequence learning with quasi-synchronous grammars.
arXiv Detail & Related papers (2021-09-02T17:58:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.