Hierarchical Phrase-based Sequence-to-Sequence Learning
- URL: http://arxiv.org/abs/2211.07906v2
- Date: Wed, 16 Nov 2022 02:39:49 GMT
- Title: Hierarchical Phrase-based Sequence-to-Sequence Learning
- Authors: Bailin Wang, Ivan Titov, Jacob Andreas and Yoon Kim
- Abstract summary: We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative parser based on a bracketing transduction grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
- Score: 94.10257313923478
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We describe a neural transducer that maintains the flexibility of standard
sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases
as a source of inductive bias during training and as explicit constraints
during inference. Our approach trains two models: a discriminative parser based
on a bracketing transduction grammar whose derivation tree hierarchically
aligns source and target phrases, and a neural seq2seq model that learns to
translate the aligned phrases one-by-one. We use the same seq2seq model to
translate at all phrase scales, which results in two inference modes: one mode
in which the parser is discarded and only the seq2seq component is used at the
sequence-level, and another in which the parser is combined with the seq2seq
model. Decoding in the latter mode is done with the cube-pruned CKY algorithm,
which is more involved but can make use of new translation rules during
inference. We formalize our model as a source-conditioned synchronous grammar
and develop an efficient variational inference algorithm for training. When
applied on top of both randomly initialized and pretrained seq2seq models, we
find that both inference modes perform well compared to baselines on small-scale
machine translation benchmarks.
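To make the phrase-level inference mode concrete, here is a minimal, purely illustrative Python sketch: it greedily brackets the source, hands short spans to a stub seq2seq component, and concatenates the translated phrases bottom-up. The names (decode_with_parser, translate_phrase, score_split), the greedy split search, and the dictionary "seq2seq" are hypothetical stand-ins; the paper itself uses a learned discriminative parser over a bracketing transduction grammar and cube-pruned CKY decoding rather than this toy recursion.
```python
from typing import Callable, List

Phrase = List[str]


def decode_with_parser(
    source: Phrase,
    translate_phrase: Callable[[Phrase], Phrase],
    score_split: Callable[[Phrase, int], float],
    min_len: int = 2,
) -> Phrase:
    """Recursively bracket the source and translate phrases bottom-up.

    Spans of length <= min_len go straight to the seq2seq stand-in; longer
    spans are split at the highest-scoring point (a stand-in for the
    discriminative parser), translated recursively, and concatenated.
    Only the straight orientation is modeled here, for simplicity.
    """
    if len(source) <= min_len:
        return translate_phrase(source)
    # Pick the split point the (stub) parser scores highest.
    best_k = max(range(1, len(source)), key=lambda k: score_split(source, k))
    left = decode_with_parser(source[:best_k], translate_phrase, score_split, min_len)
    right = decode_with_parser(source[best_k:], translate_phrase, score_split, min_len)
    return left + right


if __name__ == "__main__":
    # Stub "seq2seq": a word-by-word dictionary translation, for illustration only.
    lexicon = {"the": "le", "cat": "chat", "sleeps": "dort", "here": "ici"}
    translate = lambda phrase: [lexicon.get(w, w) for w in phrase]
    # Stub "parser": prefer splits near the middle of the span.
    score = lambda phrase, k: -abs(k - len(phrase) / 2)
    print(decode_with_parser(["the", "cat", "sleeps", "here"], translate, score))
    # -> ['le', 'chat', 'dort', 'ici']
```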
Related papers
- Improving Grammar-based Sequence-to-Sequence Modeling with Decomposition and Constraints [30.219318352970948]
We study two low-rank variants of Neural QCFG for faster inference.
We introduce two soft constraints over tree hierarchy and source coverage.
We find that our models outperform vanilla Neural QCFG in most settings.
arXiv Detail & Related papers (2023-06-05T08:05:05Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- Compositional Generalization without Trees using Multiset Tagging and Latent Permutations [121.37328648951993]
We phrase semantic parsing as a two-step process: we first tag each input token with a multiset of output tokens.
Then we arrange the tokens into an output sequence using a new way of parameterizing and predicting permutations.
Our model outperforms pretrained seq2seq models and prior work on realistic semantic parsing tasks. (A toy sketch of this tag-then-order idea appears after this list.)
arXiv Detail & Related papers (2023-05-26T14:09:35Z)
- Translate First Reorder Later: Leveraging Monotonicity in Semantic Parsing [4.396860522241306]
TPol is a two-step approach that translates input sentences monotonically and then reorders them to obtain the correct output.
We test our approach on two popular semantic parsing datasets.
arXiv Detail & Related papers (2022-10-10T17:50:42Z)
- Conditional set generation using Seq2seq models [52.516563721766445]
Conditional set generation learns a mapping from an input sequence of tokens to a set.
Sequence-to-sequence (Seq2seq) models are a popular choice to model set generation.
We propose a novel algorithm for effectively sampling informative orders over the space of label orders.
arXiv Detail & Related papers (2022-05-25T04:17:50Z)
- Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models [23.21767225871304]
Sequence-to-sequence (seq2seq) models often fail to generalize in a hierarchy-sensitive manner when performing syntactic transformations.
We find that pre-trained seq2seq models generalize hierarchically when performing syntactic transformations, whereas models trained from scratch on syntactic transformations do not.
arXiv Detail & Related papers (2022-03-17T15:46:53Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations [49.55361944105796]
We present a novel approach to any-to-one (A2O) voice conversion (VC) in a sequence-to-sequence framework.
A2O VC aims to convert any speaker, including those unseen during training, to a fixed target speaker.
arXiv Detail & Related papers (2020-10-23T08:34:52Z)
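As flagged in the multiset-tagging entry above, here is a toy, purely illustrative sketch of the tag-then-order idea, not the paper's code: a hard-coded tagger emits a multiset of output tokens per input token, and an exhaustive search over permutations stands in for the learned permutation model. The names TOY_TAGGER, toy_score, and parse are hypothetical.
```python
from itertools import permutations
from typing import Dict, List, Tuple

# Step 1: a hand-written stand-in for the multiset tagger -- each input token
# emits a (possibly empty) multiset of output tokens.
TOY_TAGGER: Dict[str, List[str]] = {
    "book": ["book", "("],
    "a": [],
    "flight": ["flight", ")"],
}


def toy_score(order: Tuple[str, ...]) -> float:
    """Stand-in for the permutation model: reward well-nested brackets."""
    depth, score = 0, 0.0
    for tok in order:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            score += 1.0 if depth >= 0 else -1.0
    return score


def parse(tokens: List[str]) -> List[str]:
    # Step 1: collect the multiset of output tokens emitted by the input.
    multiset = [t for tok in tokens for t in TOY_TAGGER.get(tok, [])]
    # Step 2: order the tokens; exhaustive search over permutations stands in
    # for the paper's learned permutation component (fine at toy scale).
    return list(max(permutations(multiset), key=toy_score))


if __name__ == "__main__":
    print(parse(["book", "a", "flight"]))  # -> ['book', '(', 'flight', ')']
```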