On Efficient Training, Controllability and Compositional Generalization
of Insertion-based Language Generators
- URL: http://arxiv.org/abs/2102.11008v2
- Date: Mon, 1 Mar 2021 06:12:48 GMT
- Title: On Efficient Training, Controllability and Compositional Generalization
of Insertion-based Language Generators
- Authors: Sidi Lu and Nanyun Peng
- Abstract summary: InsNet is an insertion-based sequence model that can be trained as efficiently as transformer decoders.
We evaluate InsNet on story generation and CleVR-CoGENT captioning.
- Score: 18.98725770517241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Auto-regressive language models with the left-to-right generation order have
been a predominant paradigm for language generation. Recently, out-of-order
text generation beyond the traditional left-to-right paradigm has attracted
extensive attention, with a notable variation of insertion-based generation,
where a model is used to gradually extend the context into a complete sentence
purely with insertion operations. However, since insertion operations disturb
the position information of each token, it is often believed that each step of
the insertion-based likelihood estimation requires a bi-directional
\textit{re-encoding} of the whole generated sequence. This computational
overhead prohibits the model from scaling up to generate long, diverse texts
such as stories, news articles, and reports. To address this issue, we propose
InsNet, an insertion-based sequence model that can be trained as efficiently as
traditional transformer decoders while maintaining the same performance as that
with a bi-directional context encoder. We evaluate InsNet on story generation
and CleVR-CoGENT captioning, showing the advantages of InsNet in several
dimensions, including computational costs, generation quality, the ability to
perfectly incorporate lexical controls, and better compositional
generalization.
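To make the core idea concrete, below is a minimal sketch of insertion-based decoding with hard lexical constraints. The `model.score_insertions` call and the `"<none>"` stop token are hypothetical stand-ins rather than the actual InsNet interface; a naive implementation of that call would bi-directionally re-encode the whole sequence at every step, which is exactly the overhead the paper sets out to remove.

```python
# Minimal sketch of insertion-based generation (illustrative only; the
# scoring call and stop token below are assumptions, not the InsNet API).
def generate_by_insertion(model, lexical_constraints, max_steps=32, stop="<none>"):
    # Start from the hard lexical constraints: insertions can only add tokens
    # around them, so the constraints are incorporated perfectly by design.
    tokens = list(lexical_constraints)
    for _ in range(max_steps):
        # Hypothetical call: for each of the len(tokens)+1 slots, the model
        # proposes its best candidate word and a score.
        candidates = model.score_insertions(tokens)   # [(slot, word, score), ...]
        slot, word, _ = max(candidates, key=lambda c: c[2])
        if word == stop:                              # nothing left to insert
            break
        tokens.insert(slot, word)
    return tokens

# e.g. ["dragon", "castle"] may grow step by step into
# ["the", "dragon", "circled", "the", "old", "castle", "."]
```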
Related papers
- Self-Infilling Code Generation [60.12883980846781]
We introduce self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding.
We utilize this capability to introduce novel interruption and looping mechanisms in conventional decoding.
Our proposed decoding process is effective in enhancing both regularity and quality across several code generation benchmarks.
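As a rough illustration of how infilling can be interleaved with left-to-right decoding, the sketch below uses a hypothetical `<suffix_first>` trigger and hypothetical `generate`/`generate_suffix`/`infill` methods; it is a conceptual reading of the framework, not the paper's implementation.

```python
# Conceptual sketch: decoding may interrupt itself, commit a suffix first,
# then loop back and infill the middle so the output is guaranteed to end
# with that suffix. The method names and trigger token are assumptions.
def self_infilling_decode(lm, prompt, trigger="<suffix_first>"):
    prefix = lm.generate(prompt, stop=[trigger])          # ordinary decoding
    if prefix.endswith(trigger):                          # interruption point
        prefix = prefix[: -len(trigger)]
        suffix = lm.generate_suffix(prefix)               # e.g. "    return result\n"
        middle = lm.infill(prefix=prefix, suffix=suffix)  # loop back and fill
        return prefix + middle + suffix
    return prefix
```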
arXiv Detail & Related papers (2023-11-29T16:02:06Z)
- Efficient Guided Generation for Large Language Models [0.21485350418225244]
We show how the problem of neural text generation can be constructively reformulated in terms of transitions between the states of a finite-state machine.
This framework leads to an efficient approach to guiding text generation with regular expressions and context-free grammars.
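A minimal sketch of regex-guided decoding follows, assuming a hypothetical `lm.next_token_candidates` that returns continuations sorted by model score; the approach described in the paper compiles the pattern into a token-level finite-state machine up front instead of rechecking the regex at every step, which is where its efficiency comes from. The third-party `regex` package is used here for partial (prefix) matching.

```python
import regex  # third-party package; supports partial-match checks

def guided_generate(lm, prompt, pattern, max_tokens=64):
    out = ""
    for _ in range(max_tokens):
        # Hypothetical interface: candidate continuations, best first.
        for token, _score in lm.next_token_candidates(prompt + out):
            # Keep the token only if the output can still be completed
            # into a string that fully matches the pattern.
            if regex.fullmatch(pattern, out + token, partial=True):
                out += token
                break
        if regex.fullmatch(pattern, out):   # pattern fully satisfied
            return out
    return out

# e.g. pattern = r'"[A-Za-z ]+", age [0-9]+' forces the generated text
# into that exact shape, token by token.
```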
arXiv Detail & Related papers (2023-07-19T01:14:49Z)
- Scalable Learning of Latent Language Structure With Logical Offline
Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
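The self-learning loop can be pictured as below. This is a generic sketch under assumptions: the `fit`/`parse_with_score` methods are hypothetical, and simple confidence thresholding stands in for LOCCO's logical cycle-consistency scoring of annotations.

```python
# Generic self-training sketch (assumptions noted above, not the LOCCO objective).
def self_train(parser, labeled, unlabeled, rounds=3, threshold=0.9):
    data = list(labeled)                          # (text, logical_form) pairs
    for _ in range(rounds):
        parser.fit(data)                          # train on current annotations
        for text in unlabeled:
            logical_form, score = parser.parse_with_score(text)
            if score >= threshold:                # keep only confident pseudo-labels
                data.append((text, logical_form))
    # `data` now pairs text with logical forms and can also be repurposed to
    # train a logical-form-to-text generation model.
    return parser, data
```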
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- Real-World Compositional Generalization with Disentangled
Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically.
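A rough sketch of the periodic re-encoding schedule is given below, under assumptions: the `encoder`, `key_encoder`, and `decoder` modules and greedy decoding are placeholders, not the released Dangle code. The only point is that the expensive, prefix-conditioned key re-encoding happens every `period` steps instead of at every step.

```python
import torch

# Sketch only: module interfaces below are assumptions, not the Dangle code.
def decode_with_periodic_reencoding(encoder, key_encoder, decoder, src,
                                    max_len=50, period=4):
    values = encoder(src)                    # source values: encoded once
    keys = key_encoder(src, prefix=None)     # adaptive keys: initial encoding
    prefix = [decoder.bos_id]
    for step in range(1, max_len + 1):
        if step % period == 0:
            # Prefix-conditioned re-encoding happens only here.
            keys = key_encoder(src, prefix=torch.tensor(prefix))
        logits = decoder(prefix, keys=keys, values=values)
        next_id = int(logits[-1].argmax())
        prefix.append(next_id)
        if next_id == decoder.eos_id:
            break
    return prefix
```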
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Transformer with Tree-order Encoding for Neural Program Generation [8.173517923612426]
We introduce a tree-based positional encoding and a shared natural-language subword vocabulary for Transformers.
Our findings suggest that employing a tree-based positional encoding in combination with a shared natural-language subword vocabulary improves generation performance over sequential positional encodings.
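One way to picture a tree-order positional encoding is sketched below: each node is addressed by its root-to-node path of child indices, padded to a fixed depth, and that path, rather than a flat sequence index, is what gets embedded. This is an illustrative scheme, not necessarily the paper's exact formulation.

```python
# Illustrative tree-order positions: address each node by its padded
# root-to-node path of child indices. Trees are (label, children) tuples.
def tree_positions(node, path=(), max_depth=8):
    label, children = node
    assert len(path) <= max_depth, "tree deeper than the encoding allows"
    padded = list(path) + [0] * (max_depth - len(path))
    yield label, padded
    for i, child in enumerate(children, start=1):
        yield from tree_positions(child, path + (i,), max_depth)

# Example for `f(x, 1)` as ("call", [("f", []), ("x", []), ("1", [])]):
# "call" -> [0]*8, "f" -> [1,0,...], "x" -> [2,0,...], "1" -> [3,0,...].
# Each padded path can be embedded and added to the token embedding in
# place of a sequential positional encoding.
```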
arXiv Detail & Related papers (2022-05-30T12:27:48Z)
- Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
arXiv Detail & Related papers (2022-05-19T01:27:53Z)
- DiscoDVT: Generating Long Text with Discourse-Aware Discrete Variational
Transformer [40.10695204278747]
We propose DiscoDVT, a discourse-aware discrete variational Transformer to tackle the incoherence issue.
We conduct extensive experiments on two open story generation datasets and demonstrate that the latent codes learn meaningful correspondence to the discourse structures that guide the model to generate long texts with better long-range coherence.
arXiv Detail & Related papers (2021-10-12T13:41:06Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based
Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
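The progressive, parallel insertion loop can be sketched as follows, assuming a hypothetical `model.predict_slots` that proposes at most one token (or a none marker) for every gap between existing tokens; all accepted proposals are inserted simultaneously and the process stops when every slot abstains.

```python
# Sketch of POINTER-style progressive insertion (interfaces are assumptions).
def progressive_insert(model, keywords, max_stages=10, none="<none>"):
    tokens = list(keywords)                       # hard lexical constraints
    for _ in range(max_stages):
        proposals = model.predict_slots(tokens)   # len(tokens)+1 slot predictions
        new_tokens, inserted = [], False
        for i, tok in enumerate(tokens):
            if proposals[i] != none:              # slot just before tokens[i]
                new_tokens.append(proposals[i])
                inserted = True
            new_tokens.append(tok)
        if proposals[len(tokens)] != none:        # slot after the last token
            new_tokens.append(proposals[len(tokens)])
            inserted = True
        tokens = new_tokens
        if not inserted:                          # coarse-to-fine refinement done
            break
    return tokens

# e.g. starting from the keywords ["sofa", "nap"], each stage adds words in
# parallel until a full sentence containing both keywords emerges.
```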
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent
Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
However, VAEs with a strong auto-regressive decoder tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
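As one concrete, hedged reading of a discrete latent bottleneck, the sketch below quantizes the encoder output into one of K code vectors with Gumbel-softmax so the decoder cannot simply bypass the latent; the specific quantization and KL term are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

class DiscreteBottleneck(torch.nn.Module):
    """Illustrative discrete bottleneck (assumed design, not the paper's)."""

    def __init__(self, hidden_dim, num_codes=64):
        super().__init__()
        self.to_logits = torch.nn.Linear(hidden_dim, num_codes)
        self.codebook = torch.nn.Embedding(num_codes, hidden_dim)

    def forward(self, h, tau=1.0):
        logits = self.to_logits(h)                           # [batch, num_codes]
        one_hot = F.gumbel_softmax(logits, tau=tau, hard=True)
        z = one_hot @ self.codebook.weight                   # pick one code vector
        # KL(q || uniform over K codes) keeps code usage spread out.
        log_k = torch.log(torch.tensor(float(logits.size(-1))))
        kl = (F.softmax(logits, -1) * (F.log_softmax(logits, -1) + log_k)).sum(-1)
        return z, kl.mean()                                  # feed z to the decoder
```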
arXiv Detail & Related papers (2020-04-22T14:41:37Z)