On Efficient Training, Controllability and Compositional Generalization
of Insertion-based Language Generators
- URL: http://arxiv.org/abs/2102.11008v2
- Date: Mon, 1 Mar 2021 06:12:48 GMT
- Title: On Efficient Training, Controllability and Compositional Generalization
of Insertion-based Language Generators
- Authors: Sidi Lu and Nanyun Peng
- Abstract summary: InsNet is an insertion-based sequence model that can be trained as efficiently as transformer decoders.
We evaluate InsNet on story generation and CleVR-CoGENT captioning.
- Score: 18.98725770517241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Auto-regressive language models with the left-to-right generation order have
been a predominant paradigm for language generation. Recently, out-of-order
text generation beyond the traditional left-to-right paradigm has attracted
extensive attention, with a notable variation of insertion-based generation,
where a model is used to gradually extend the context into a complete sentence
purely with insertion operations. However, since insertion operations disturb
the position information of each token, it is often believed that each step of
the insertion-based likelihood estimation requires a bi-directional
\textit{re-encoding} of the whole generated sequence. This computational
overhead prohibits the model from scaling up to generate long, diverse texts
such as stories, news articles, and reports. To address this issue, we propose
InsNet, an insertion-based sequence model that can be trained as efficiently as
traditional transformer decoders while maintaining the same performance as that
with a bi-directional context encoder. We evaluate InsNet on story generation
and CleVR-CoGENT captioning, showing the advantages of InsNet in several
dimensions, including computational costs, generation quality, the ability to
perfectly incorporate lexical controls, and better compositional
generalization.
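To make the core idea concrete, below is a minimal sketch of insertion-based decoding with hard lexical constraints. The `model.score_insertions` call and the `"<none>"` stop token are hypothetical stand-ins rather than the actual InsNet interface; a naive implementation of that call would bi-directionally re-encode the whole sequence at every step, which is exactly the overhead the paper sets out to remove.

```python
# Minimal sketch of insertion-based generation (illustrative only; the
# scoring call and stop token below are assumptions, not the InsNet API).
def generate_by_insertion(model, lexical_constraints, max_steps=32, stop="<none>"):
    # Start from the hard lexical constraints: insertions can only add tokens
    # around them, so the constraints are incorporated perfectly by design.
    tokens = list(lexical_constraints)
    for _ in range(max_steps):
        # Hypothetical call: for each of the len(tokens)+1 slots, the model
        # proposes its best candidate word and a score.
        candidates = model.score_insertions(tokens)   # [(slot, word, score), ...]
        slot, word, _ = max(candidates, key=lambda c: c[2])
        if word == stop:                              # nothing left to insert
            break
        tokens.insert(slot, word)
    return tokens

# e.g. ["dragon", "castle"] may grow step by step into
# ["the", "dragon", "circled", "the", "old", "castle", "."]
```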
Related papers
- Self-Infilling Code Generation [60.12883980846781]
We introduce self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding.
We utilize this capability to introduce novel interruption and looping mechanisms in conventional decoding.
Our proposed decoding process is effective in enhancing both regularity and quality across several code generation benchmarks.
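As a rough illustration of how infilling can be interleaved with left-to-right decoding, the sketch below uses a hypothetical `<suffix_first>` trigger and hypothetical `generate`/`generate_suffix`/`infill` methods; it is a conceptual reading of the framework, not the paper's implementation.

```python
# Conceptual sketch: decoding may interrupt itself, commit a suffix first,
# then loop back and infill the middle so the output is guaranteed to end
# with that suffix. The method names and trigger token are assumptions.
def self_infilling_decode(lm, prompt, trigger="<suffix_first>"):
    prefix = lm.generate(prompt, stop=[trigger])          # ordinary decoding
    if prefix.endswith(trigger):                          # interruption point
        prefix = prefix[: -len(trigger)]
        suffix = lm.generate_suffix(prefix)               # e.g. "    return result\n"
        middle = lm.infill(prefix=prefix, suffix=suffix)  # loop back and fill
        return prefix + middle + suffix
    return prefix
```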
arXiv Detail & Related papers (2023-11-29T16:02:06Z)
- Efficient Guided Generation for Large Language Models [0.21485350418225244]
We show how the problem of neural text generation can be constructively reformulated in terms of transitions between the states of a finite-state machine.
This framework leads to an efficient approach to guiding text generation with regular expressions and context-free grammars.
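A minimal sketch of regex-guided decoding follows, assuming a hypothetical `lm.next_token_candidates` that returns continuations sorted by model score; the approach described in the paper compiles the pattern into a token-level finite-state machine up front instead of rechecking the regex at every step, which is where its efficiency comes from. The third-party `regex` package is used here for partial (prefix) matching.

```python
import regex  # third-party package; supports partial-match checks

def guided_generate(lm, prompt, pattern, max_tokens=64):
    out = ""
    for _ in range(max_tokens):
        # Hypothetical interface: candidate continuations, best first.
        for token, _score in lm.next_token_candidates(prompt + out):
            # Keep the token only if the output can still be completed
            # into a string that fully matches the pattern.
            if regex.fullmatch(pattern, out + token, partial=True):
                out += token
                break
        if regex.fullmatch(pattern, out):   # pattern fully satisfied
            return out
    return out

# e.g. pattern = r'"[A-Za-z ]+", age [0-9]+' forces the generated text
# into that exact shape, token by token.
```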
arXiv Detail & Related papers (2023-07-19T01:14:49Z)
- Scalable Learning of Latent Language Structure With Logical Offline
Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
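The self-learning loop can be pictured as below. This is a generic sketch under assumptions: the `fit`/`parse_with_score` methods are hypothetical, and simple confidence thresholding stands in for LOCCO's logical cycle-consistency scoring of annotations.

```python
# Generic self-training sketch (assumptions noted above, not the LOCCO objective).
def self_train(parser, labeled, unlabeled, rounds=3, threshold=0.9):
    data = list(labeled)                          # (text, logical_form) pairs
    for _ in range(rounds):
        parser.fit(data)                          # train on current annotations
        for text in unlabeled:
            logical_form, score = parser.parse_with_score(text)
            if score >= threshold:                # keep only confident pseudo-labels
                data.append((text, logical_form))
    # `data` now pairs text with logical forms and can also be repurposed to
    # train a logical-form-to-text generation model.
    return parser, data
```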
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- Real-World Compositional Generalization with Disentangled
Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
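A rough sketch of the periodic re-encoding schedule is given below, under assumptions: the `encoder`, `key_encoder`, and `decoder` modules and greedy decoding are placeholders, not the released Dangle code. The only point is that the expensive, prefix-conditioned key re-encoding happens every `period` steps instead of at every step.

```python
import torch

# Sketch only: module interfaces below are assumptions, not the Dangle code.
def decode_with_periodic_reencoding(encoder, key_encoder, decoder, src,
                                    max_len=50, period=4):
    values = encoder(src)                    # source values: encoded once
    keys = key_encoder(src, prefix=None)     # adaptive keys: initial encoding
    prefix = [decoder.bos_id]
    for step in range(1, max_len + 1):
        if step % period == 0:
            # Prefix-conditioned re-encoding happens only here.
            keys = key_encoder(src, prefix=torch.tensor(prefix))
        logits = decoder(prefix, keys=keys, values=values)
        next_id = int(logits[-1].argmax())
        prefix.append(next_id)
        if next_id == decoder.eos_id:
            break
    return prefix
```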
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Transformer with Tree-order Encoding for Neural Program Generation [8.173517923612426]
We introduce a tree-based positional encoding and a shared natural-language subword vocabulary for Transformers.
Our findings suggest that employing a tree-based positional encoding in combination with a shared natural-language subword vocabulary improves generation performance over sequential positional encodings.
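One way to picture a tree-order positional encoding is sketched below: each node is addressed by its root-to-node path of child indices, padded to a fixed depth, and that path, rather than a flat sequence index, is what gets embedded. This is an illustrative scheme, not necessarily the paper's exact formulation.

```python
# Illustrative tree-order positions: address each node by its padded
# root-to-node path of child indices. Trees are (label, children) tuples.
def tree_positions(node, path=(), max_depth=8):
    label, children = node
    assert len(path) <= max_depth, "tree deeper than the encoding allows"
    padded = list(path) + [0] * (max_depth - len(path))
    yield label, padded
    for i, child in enumerate(children, start=1):
        yield from tree_positions(child, path + (i,), max_depth)

# Example for `f(x, 1)` as ("call", [("f", []), ("x", []), ("1", [])]):
# "call" -> [0]*8, "f" -> [1,0,...], "x" -> [2,0,...], "1" -> [3,0,...].
# Each padded path can be embedded and added to the token embedding in
# place of a sequential positional encoding.
```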
arXiv Detail & Related papers (2022-05-30T12:27:48Z)
- Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
arXiv Detail & Related papers (2022-05-19T01:27:53Z)
- DiscoDVT: Generating Long Text with Discourse-Aware Discrete Variational
Transformer [40.10695204278747]
We propose DiscoDVT, a discourse-aware discrete variational Transformer to tackle the incoherence issue.
We conduct extensive experiments on two open story generation datasets and demonstrate that the latent codes learn meaningful correspondence to the discourse structures that guide the model to generate long texts with better long-range coherence.
arXiv Detail & Related papers (2021-10-12T13:41:06Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based
Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
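The progressive, parallel insertion loop can be sketched as follows, assuming a hypothetical `model.predict_slots` that proposes at most one token (or a none marker) for every gap between existing tokens; all accepted proposals are inserted simultaneously and the process stops when every slot abstains.

```python
# Sketch of POINTER-style progressive insertion (interfaces are assumptions).
def progressive_insert(model, keywords, max_stages=10, none="<none>"):
    tokens = list(keywords)                       # hard lexical constraints
    for _ in range(max_stages):
        proposals = model.predict_slots(tokens)   # len(tokens)+1 slot predictions
        new_tokens, inserted = [], False
        for i, tok in enumerate(tokens):
            if proposals[i] != none:              # slot just before tokens[i]
                new_tokens.append(proposals[i])
                inserted = True
            new_tokens.append(tok)
        if proposals[len(tokens)] != none:        # slot after the last token
            new_tokens.append(proposals[len(tokens)])
            inserted = True
        tokens = new_tokens
        if not inserted:                          # coarse-to-fine refinement done
            break
    return tokens

# e.g. starting from the keywords ["sofa", "nap"], each stage adds words in
# parallel until a full sentence containing both keywords emerges.
```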
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent
Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
However, VAEs with a strong auto-regressive decoder tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
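As one concrete, hedged reading of a discrete latent bottleneck, the sketch below quantizes the encoder output into one of K code vectors with Gumbel-softmax so the decoder cannot simply bypass the latent; the specific quantization and KL term are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

class DiscreteBottleneck(torch.nn.Module):
    """Illustrative discrete bottleneck (assumed design, not the paper's)."""

    def __init__(self, hidden_dim, num_codes=64):
        super().__init__()
        self.to_logits = torch.nn.Linear(hidden_dim, num_codes)
        self.codebook = torch.nn.Embedding(num_codes, hidden_dim)

    def forward(self, h, tau=1.0):
        logits = self.to_logits(h)                           # [batch, num_codes]
        one_hot = F.gumbel_softmax(logits, tau=tau, hard=True)
        z = one_hot @ self.codebook.weight                   # pick one code vector
        # KL(q || uniform over K codes) keeps code usage spread out.
        log_k = torch.log(torch.tensor(float(logits.size(-1))))
        kl = (F.softmax(logits, -1) * (F.log_softmax(logits, -1) + log_k)).sum(-1)
        return z, kl.mean()                                  # feed z to the decoder
```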
arXiv Detail & Related papers (2020-04-22T14:41:37Z)