A Framework for Bidirectional Decoding: Case Study in Morphological Inflection
- URL: http://arxiv.org/abs/2305.12580v2
- Date: Mon, 30 Oct 2023 05:51:34 GMT
- Title: A Framework for Bidirectional Decoding: Case Study in Morphological Inflection
- Authors: Marc E. Canby and Julia Hockenmaier
- Abstract summary: We propose a framework for decoding sequences from the "outside-in".
At each step, the model chooses to generate a token on the left, on the right, or join the left and right sequences.
Our model sets state-of-the-art (SOTA) on the 2022 and 2023 shared tasks, beating the next best systems by over 4.7 and 2.7 points in average accuracy respectively.
- Score: 4.602447284133507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based encoder-decoder models that generate outputs in a
left-to-right fashion have become standard for sequence-to-sequence tasks. In
this paper, we propose a framework for decoding that produces sequences from
the "outside-in": at each step, the model chooses to generate a token on the
left, on the right, or join the left and right sequences. We argue that this is
more principled than prior bidirectional decoders. Our proposal supports a
variety of model architectures and includes several training methods, such as a
dynamic programming algorithm that marginalizes out the latent ordering
variable. Our model sets state-of-the-art (SOTA) on the 2022 and 2023 shared
tasks, beating the next best systems by over 4.7 and 2.7 points in average
accuracy respectively. The model performs particularly well on long sequences,
can implicitly learn the split point of words composed of stem and affix, and
performs better relative to the baseline on datasets that have fewer unique
lemmas (but more examples per lemma).
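To make the decoding procedure concrete, below is a minimal sketch of a greedy outside-in decoding loop. The `score_step` interface and the "L"/"R"/"JOIN" action encoding are hypothetical stand-ins, not the authors' implementation; the paper's beam search and dynamic-programming marginalization over latent orderings are not reproduced.

```python
# Hypothetical sketch of greedy "outside-in" decoding: at each step the model
# scores extending the left prefix, extending the right suffix, or joining
# the two halves to finish. The `score_step` interface is an assumption.
from typing import Callable, List, Tuple

Action = Tuple[str, str, float]  # (side: "L" | "R" | "JOIN", token, score)

def outside_in_decode(
    score_step: Callable[[List[str], List[str]], List[Action]],
    max_steps: int = 64,
) -> List[str]:
    left: List[str] = []    # prefix tokens, generated left-to-right
    right: List[str] = []   # suffix tokens, generated right-to-left
                            # (stored outermost-first, i.e. in generation order)
    for _ in range(max_steps):
        side, token, _ = max(score_step(left, right), key=lambda a: a[2])
        if side == "JOIN":
            break                 # join left and right halves and stop
        if side == "L":
            left.append(token)    # grow the prefix inward from the left edge
        else:
            right.append(token)   # grow the suffix inward from the right edge
    return left + right[::-1]     # suffix is reversed back into surface order
```

Beam search, or the dynamic-programming marginalization over orderings mentioned in the abstract, would replace the single argmax here; the left/right/join control flow stays the same.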
Related papers
- Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models.
We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses.
We observe a remarkable speedup of 1.73× to 1.96×, significantly surpassing standard speculative decoding.
arXiv Detail & Related papers (2024-07-23T06:21:24Z)
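For readers unfamiliar with the technique this entry builds on, here is a minimal sketch of the plain draft-and-verify loop behind speculative decoding (greedy-acceptance variant). The DAG-based management of drafted hypotheses proposed in the paper is not reproduced, and the `draft_next` / `target_next` interfaces are assumptions.

```python
# Simplified greedy speculative decoding: a small draft model proposes k tokens,
# the large target model verifies them, and decoding falls back to the target
# model's token at the first disagreement. Interfaces are hypothetical.
from typing import Callable, List

def speculative_decode(
    draft_next: Callable[[List[int]], int],   # greedy next token, small draft model
    target_next: Callable[[List[int]], int],  # greedy next token, large target model
    prompt: List[int],
    eos_id: int,
    k: int = 4,
    max_new: int = 64,
) -> List[int]:
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new and seq[-1] != eos_id:
        # 1) draft k tokens cheaply with the small model
        ctx, draft = list(seq), []
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) verify; in a real system all these target predictions come from
        #    a single batched forward pass over the prompt plus draft
        for t in draft:
            expected = target_next(seq)
            if expected != t:
                seq.append(expected)   # first mismatch: keep the target's token
                break
            seq.append(t)              # drafted token accepted
            if t == eos_id:
                break
        else:
            seq.append(target_next(seq))  # all drafts accepted: one bonus token
    return seq
```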
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative parser based on a bracketing grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- Improving Tree-Structured Decoder Training for Code Generation via Mutual Learning [27.080718377956693]
Code generation aims to automatically generate a piece of code given an input natural language utterance.
We first thoroughly analyze the context modeling differences between neural code generation models with different decoding strategies.
We propose to introduce a mutual learning framework to jointly train these models.
arXiv Detail & Related papers (2021-05-31T08:44:13Z)
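As a rough illustration of what a mutual learning objective looks like, here is a generic deep-mutual-learning loss between two decoders' next-token distributions. This is an assumption-laden sketch, not the paper's exact formulation, and the function name is made up.

```python
# Generic deep-mutual-learning objective: each model is trained on the gold
# targets plus a KL term pulling it toward the other model's (detached)
# predictive distribution. A hedged illustration, not the paper's loss.
import torch
import torch.nn.functional as F

def mutual_learning_losses(
    logits_a: torch.Tensor,  # (batch, vocab) next-token logits of model A
    logits_b: torch.Tensor,  # (batch, vocab) next-token logits of model B
    targets: torch.Tensor,   # (batch,) gold token ids
    alpha: float = 1.0,      # weight of the agreement (KL) term
):
    ce_a = F.cross_entropy(logits_a, targets)
    ce_b = F.cross_entropy(logits_b, targets)
    log_p_a = F.log_softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    # KL(p_b || p_a) updates model A toward B (B is detached), and vice versa.
    kl_a = F.kl_div(log_p_a, log_p_b.detach(), reduction="batchmean", log_target=True)
    kl_b = F.kl_div(log_p_b, log_p_a.detach(), reduction="batchmean", log_target=True)
    return ce_a + alpha * kl_a, ce_b + alpha * kl_b
```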
- Fast Interleaved Bidirectional Sequence Generation [90.58793284654692]
We introduce a decoder that generates target words from the left-to-right and right-to-left directions simultaneously.
We show that we can easily convert a standard architecture for unidirectional decoding into a bidirectional decoder.
Our interleaved bidirectional decoder (IBDecoder) retains the model simplicity and training efficiency of the standard Transformer.
arXiv Detail & Related papers (2020-10-27T17:38:51Z)
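The interleaving trick behind this style of bidirectional decoder can be illustrated with a few lines of target preprocessing: the reference sequence is rearranged so that tokens from the two directions alternate, which lets a standard unidirectional decoder emit both ends of the output at once. The helper names below are illustrative, not the paper's code.

```python
# Illustrative target interleaving for bidirectional generation:
# [y1, y2, ..., yn] -> [y1, yn, y2, y(n-1), ...]. Helper names are made up.
from typing import List

def interleave(target: List[str]) -> List[str]:
    out: List[str] = []
    i, j = 0, len(target) - 1
    while i <= j:
        out.append(target[i])        # next token from the left-to-right stream
        if i != j:
            out.append(target[j])    # next token from the right-to-left stream
        i, j = i + 1, j - 1
    return out

def deinterleave(mixed: List[str]) -> List[str]:
    """Recover the original left-to-right order after decoding."""
    left = mixed[0::2]               # tokens emitted in the L2R direction
    right = mixed[1::2]              # tokens emitted in the R2L direction
    return left + right[::-1]

assert deinterleave(interleave(list("decode"))) == list("decode")
```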
- Cascaded Text Generation with Markov Transformers [122.76100449018061]
Two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.
This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output.
This approach requires only a small modification from standard autoregressive training, while showing competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
arXiv Detail & Related papers (2020-06-01T17:52:15Z)
- Consistent Multiple Sequence Decoding [36.46573114422263]
We introduce a consistent multiple sequence decoding architecture.
This architecture allows for consistent and simultaneous decoding of an arbitrary number of sequences.
We show the efficacy of our consistent multiple sequence decoder on the task of dense relational image captioning.
arXiv Detail & Related papers (2020-04-02T00:43:54Z)