Parallel Refinements for Lexically Constrained Text Generation with BART
- URL: http://arxiv.org/abs/2109.12487v1
- Date: Sun, 26 Sep 2021 03:56:45 GMT
- Title: Parallel Refinements for Lexically Constrained Text Generation with BART
- Authors: Xingwei He
- Abstract summary: We propose Constrained BART (CBART) for lexically constrained text generation.
CBART transfers part of the generation burden from the decoder to the encoder by decomposing this task into two sub-tasks, thereby improving the sentence quality.
Experimental results on One-Billion-Word and Yelp show that CBART can generate plausible text with high quality and diversity while significantly accelerating inference.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lexically constrained text generation aims to control the generated text by
incorporating some pre-specified keywords into the output. Previous work
injects lexical constraints into the output by controlling the decoding process
or by iteratively refining the candidate output, which tends to generate generic
or ungrammatical sentences and has high computational complexity. To address
these challenges, we propose Constrained BART (CBART) for lexically constrained
text generation. CBART leverages the pre-trained model BART and transfers part
of the generation burden from the decoder to the encoder by decomposing this
task into two sub-tasks, thereby improving the sentence quality. Concretely, we
extend BART by adding a token-level classifier over the encoder, aiming at
instructing the decoder where to replace and insert. Guided by the encoder, the
decoder refines multiple tokens of the input in one step by inserting tokens
before specific positions and re-predicting tokens with low confidence. To
further reduce the inference latency, the decoder predicts all tokens in
parallel. Experimental results on One-Billion-Word and Yelp show that CBART can
generate plausible text with high quality and diversity while significantly
accelerating inference.
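The abstract describes the architecture only at a high level, so the sketch below is a rough, hedged illustration of the two-sub-task decomposition: a token-level classifier on top of the BART encoder labels each position (copy, replace, or insert), and the decoder then re-predicts all tokens in a single parallel pass. The names (TokenLevelClassifier, refine_once) and the copy/replace/insert label set are assumptions for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of a CBART-style refinement pass (not the authors' code).
import torch
import torch.nn as nn
from transformers import BartForConditionalGeneration, BartTokenizer

class TokenLevelClassifier(nn.Module):
    """Assumed encoder head that labels every input position with an edit action."""
    def __init__(self, hidden_size: int, num_actions: int = 3):  # 0=copy, 1=replace, 2=insert
        super().__init__()
        self.proj = nn.Linear(hidden_size, num_actions)

    def forward(self, encoder_states: torch.Tensor) -> torch.Tensor:
        return self.proj(encoder_states)              # (batch, seq_len, num_actions)

@torch.no_grad()
def refine_once(model, classifier, tokenizer, draft: str):
    """One refinement pass: the encoder decides where to edit,
    the decoder re-predicts all tokens in a single parallel (non-autoregressive) step."""
    batch = tokenizer(draft, return_tensors="pt")
    enc_states = model.get_encoder()(**batch).last_hidden_state
    edit_actions = classifier(enc_states).argmax(-1)  # where to copy / replace / insert
    # A faithful implementation would expand the draft with <mask> tokens where the
    # action is "insert" and mask low-confidence tokens; here we simply re-decode once.
    logits = model(input_ids=batch["input_ids"],
                   decoder_input_ids=batch["input_ids"]).logits
    refined = tokenizer.batch_decode(logits.argmax(-1), skip_special_tokens=True)
    return refined, edit_actions

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
classifier = TokenLevelClassifier(model.config.d_model)   # untrained, illustration only
print(refine_once(model, classifier, tokenizer, "machine translation quality")[0])
```

In the full method, such a pass would presumably be repeated until the classifier signals no further edits.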
Related papers
- Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval [26.00149743478937]
Masked auto-encoder pre-training has emerged as a prevalent technique for initializing and enhancing dense retrieval systems.
We propose a modification to the traditional MAE by replacing the decoder of a masked auto-encoder with a completely simplified Bag-of-Word prediction task.
Our proposed method achieves state-of-the-art retrieval performance on several large-scale retrieval benchmarks without requiring any additional parameters.
arXiv Detail & Related papers (2024-01-20T15:02:33Z)
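As a minimal, hedged sketch of the idea summarized above, replacing a full auto-regressive decoder with a bag-of-words prediction objective can be as small as a single linear head trained with a multi-label loss over the vocabulary. The names (BagOfWordsHead, bow_loss) and the use of binary cross-entropy are assumptions for illustration, not the paper's exact formulation.

```python
# Hypothetical bag-of-words pre-training head (not the paper's implementation).
import torch
import torch.nn as nn

class BagOfWordsHead(nn.Module):
    """Replaces a heavy auto-regressive decoder with a single projection that
    predicts which vocabulary tokens appear in the passage (multi-label)."""
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size)

    def forward(self, cls_state: torch.Tensor) -> torch.Tensor:
        return self.proj(cls_state)               # (batch, vocab_size) logits

def bow_loss(logits: torch.Tensor, input_ids: torch.Tensor, vocab_size: int) -> torch.Tensor:
    """Binary cross-entropy against the bag-of-words of the original passage."""
    targets = torch.zeros(logits.size(0), vocab_size, device=logits.device)
    targets.scatter_(1, input_ids, 1.0)           # mark every token id present in the passage
    return nn.functional.binary_cross_entropy_with_logits(logits, targets)
```

The target vector simply marks every token id that occurs in the passage, which is what makes this pre-training head so much cheaper than a decoder.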
- Look-back Decoding for Open-Ended Text Generation [62.53302138266465]
We propose Look-back, an improved decoding algorithm that tracks the distribution distance between current and historical decoding steps.
Look-back can automatically predict potential repetitive phrases and topic drift, and remove tokens that may cause these failure modes.
We perform decoding experiments on document continuation and story generation, and demonstrate that Look-back is able to generate more fluent and coherent text.
arXiv Detail & Related papers (2023-05-22T20:42:37Z)
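The summary above gives only the gist of the mechanism, so the following is a hedged approximation: at each decoding step, the current next-token distribution is compared (here via KL divergence) with distributions from earlier steps, and when the distance is suspiciously small the most likely repeated token is suppressed. The threshold value and the exact distance measure are assumptions, not the paper's formulation.

```python
# Hypothetical look-back style repetition check (not the paper's implementation).
import torch
import torch.nn.functional as F

def lookback_filter(step_probs: torch.Tensor, history: list,
                    threshold: float = 0.1) -> torch.Tensor:
    """Compare the current next-token distribution with past ones; if it is too
    close to an earlier step (likely repetition), suppress the repeated token."""
    for past in history:
        # KL(past || current), a simple stand-in for the paper's distribution distance
        kl = F.kl_div(step_probs.clamp_min(1e-9).log(),
                      past.clamp_min(1e-9), reduction="sum")
        if kl.item() < threshold:
            step_probs = step_probs.clone()
            step_probs[past.argmax()] = 0.0       # drop the token most likely to repeat
            step_probs = step_probs / step_probs.sum()
    history.append(step_probs.detach())
    return step_probs

# Usage inside a decoding loop (probs is the softmax over the vocabulary at this step):
# probs = lookback_filter(probs, history)
# next_token = probs.argmax()
```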
- Transformer with Tree-order Encoding for Neural Program Generation [8.173517923612426]
We introduce a tree-based positional encoding and a shared natural-language subword vocabulary for Transformers.
Our findings suggest that employing a tree-based positional encoding in combination with a shared natural-language subword vocabulary improves generation performance over sequential positional encodings.
arXiv Detail & Related papers (2022-05-30T12:27:48Z)
- Towards More Efficient Insertion Transformer with Fractional Positional Encoding [44.45401243989363]
Auto-regressive neural sequence models have been shown to be effective across text generation tasks.
However, their left-to-right decoding order prevents generation from being parallelized.
The Insertion Transformer is an attractive alternative that allows outputting multiple tokens in a single generation step.
arXiv Detail & Related papers (2021-12-12T18:38:27Z)
- Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network [99.03895740754402]
We propose a two-stream decoupled encoder-decoder design, in which a decoupled cross-modal encoder and decoder are involved.
As an alternative, we propose a primary scheduled sampling strategy that mitigates such discrepancy by pretraining the encoder-decoder in a two-pass manner.
arXiv Detail & Related papers (2021-01-27T17:36:57Z)
- Fast Interleaved Bidirectional Sequence Generation [90.58793284654692]
We introduce a decoder that generates target words in the left-to-right and right-to-left directions simultaneously.
We show that we can easily convert a standard architecture for unidirectional decoding into a bidirectional decoder.
Our interleaved bidirectional decoder (IBDecoder) retains the model simplicity and training efficiency of the standard Transformer.
arXiv Detail & Related papers (2020-10-27T17:38:51Z)
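One way to picture "decoding in both directions at once" is as a reordering of the target so that tokens from the two ends alternate, which lets a standard decoder emit tokens for both directions in a single left-to-right pass. The helper below shows only that interleaving transform and its inverse; it is a hedged sketch of the data layout, not the IBDecoder model itself.

```python
# Hypothetical interleaving transform for bidirectional decoding (illustration only).
def interleave(tokens: list) -> list:
    """Reorder a target so positions come alternately from the left and right ends:
    [t1, t2, ..., tn] -> [t1, tn, t2, tn-1, ...]."""
    out, left, right = [], 0, len(tokens) - 1
    while left <= right:
        out.append(tokens[left])
        if left != right:
            out.append(tokens[right])
        left, right = left + 1, right - 1
    return out

def deinterleave(tokens: list) -> list:
    """Invert the interleaving to recover the natural left-to-right order."""
    left, right = tokens[0::2], tokens[1::2]
    return left + right[::-1]

assert deinterleave(interleave(list("abcdef"))) == list("abcdef")
```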
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
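To make the progressive, coarse-to-fine insertion process above concrete, the toy loop below starts from the hard lexical constraints and repeatedly asks a stand-in model to propose a token between each adjacent pair, stopping when nothing is proposed. The propose_insertion callable is a hypothetical stub for the learned insertion model, not POINTER's actual interface.

```python
# Toy illustration of progressive insertion-based generation (not the POINTER code).
from typing import Callable, Optional

def progressive_insertion(keywords: list,
                          propose_insertion: Callable[[str, str], Optional[str]],
                          max_rounds: int = 5) -> list:
    """Coarse-to-fine generation: each round inserts new tokens between existing
    ones in parallel, starting from the hard lexical constraints (keywords)."""
    tokens = list(keywords)
    for _ in range(max_rounds):
        proposals = [propose_insertion(a, b) for a, b in zip(tokens, tokens[1:])]
        if not any(proposals):                    # nothing left to insert: done
            break
        refined = []
        for tok, new in zip(tokens, proposals + [None]):
            refined.append(tok)
            if new is not None:
                refined.append(new)
        tokens = refined
    return tokens

# Toy stand-in for the learned model: insert "the" once between "feed" and "cat".
print(progressive_insertion(["feed", "cat"],
                            lambda a, b: "the" if (a, b) == ("feed", "cat") else None))
```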
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take the Transformer as the testbed and introduce a layer of gates in-between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing L0 penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
arXiv Detail & Related papers (2020-04-24T16:57:52Z)
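The last entry's sparsity-inducing L0 penalty is commonly implemented with hard-concrete gates; the sketch below shows one such gate applied to encoder outputs together with its expected-L0 regularizer. The hyperparameters (beta, gamma, zeta) and the per-position parameterization are assumptions rather than the paper's exact setup.

```python
# Hypothetical hard-concrete gate for sparsifying encoder outputs (illustration only).
import torch
import torch.nn as nn

class HardConcreteGate(nn.Module):
    """Differentiable relaxation of an L0 penalty: each encoder position gets a
    stochastic gate in [0, 1]; the expected number of open gates is penalized."""
    def __init__(self, seq_len: int, beta: float = 0.5,
                 gamma: float = -0.1, zeta: float = 1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(seq_len))
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self, encoder_states: torch.Tensor) -> torch.Tensor:
        u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
        s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        z = (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)
        return encoder_states * z.unsqueeze(-1)   # zero out pruned positions

    def expected_l0(self) -> torch.Tensor:
        """Expected number of non-zero gates, added to the training loss as the regularizer."""
        return torch.sigmoid(
            self.log_alpha - self.beta * torch.log(
                torch.tensor(-self.gamma / self.zeta))).sum()
```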
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences.