Show Me How To Revise: Improving Lexically Constrained Sentence
Generation with XLNet
- URL: http://arxiv.org/abs/2109.05797v1
- Date: Mon, 13 Sep 2021 09:21:07 GMT
- Title: Show Me How To Revise: Improving Lexically Constrained Sentence
Generation with XLNet
- Authors: Xingwei He, Victor O.K. Li
- Abstract summary: We propose a two-step approach, "Predict and Revise", for constrained sentence generation.
During the predict step, we leveraged the classifier to compute the learned prior for the candidate sentence.
During the revise step, we resorted to MCMC sampling to revise the candidate sentence by conducting a sampled action at a sampled position drawn from the learned prior.
Experimental results have demonstrated that our proposed model performs much better than the previous work in terms of sentence fluency and diversity.
- Score: 27.567493727582736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lexically constrained sentence generation allows the incorporation of prior
knowledge such as lexical constraints into the output. This technique has been
applied to machine translation and dialog response generation. Previous work
usually used Markov Chain Monte Carlo (MCMC) sampling to generate lexically
constrained sentences, but it randomly determined the position to be edited
and the action to be taken, resulting in many invalid refinements. To overcome
this challenge, we used a classifier to instruct the MCMC-based models where
and how to refine the candidate sentences. First, we developed two methods to
create synthetic data on which the pre-trained model is fine-tuned to obtain a
reliable classifier. Next, we proposed a two-step approach, "Predict and
Revise", for constrained sentence generation. During the predict step, we
leveraged the classifier to compute the learned prior for the candidate
sentence. During the revise step, we resorted to MCMC sampling to revise the
candidate sentence by conducting a sampled action at a sampled position drawn
from the learned prior. We compared our proposed models with many strong
baselines on two tasks, generating sentences with lexical constraints and text
infilling. Experimental results have demonstrated that our proposed model
performs much better than the previous work in terms of sentence fluency and
diversity. Our code and pre-trained models are available at
https://github.com/NLPCode/MCMCXLNet.
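The "Predict and Revise" loop lends itself to a short sketch. The snippet below is a minimal, self-contained illustration of the control flow only: a hand-written fluency_score and learned_prior stand in for XLNet and the fine-tuned classifier (both are hypothetical stand-ins, not the released implementation), a position and action are drawn from the learned prior, and the proposed edit is accepted or rejected with a Metropolis-Hastings test.

```python
import math
import random

# Toy stand-ins: in the paper these roles are played by XLNet (fluency scoring,
# token proposals) and a fine-tuned classifier (the learned prior); here both
# are replaced by hand-written functions so the control flow stays visible.
VOCAB = ["the", "cat", "sat", "on", "a", "mat", "quietly"]
ACTIONS = ["replace", "insert", "delete"]

def fluency_score(tokens):
    # Hypothetical sentence score; a real system would use an LM log-likelihood.
    return -abs(len(tokens) - 6) + 0.1 * sum(t in VOCAB for t in tokens)

def learned_prior(tokens, constraints):
    # Hypothetical classifier output: a distribution over (position, action)
    # pairs. Constraint tokens are protected from deletion and replacement.
    weights = {}
    for pos, tok in enumerate(tokens):
        for act in ACTIONS:
            protected = tok in constraints and act != "insert"
            weights[(pos, act)] = 0.0 if protected else 1.0
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

def apply_action(tokens, pos, act):
    new = list(tokens)
    if act == "replace":
        new[pos] = random.choice(VOCAB)
    elif act == "insert":
        new.insert(pos, random.choice(VOCAB))
    elif act == "delete" and len(new) > 1:
        del new[pos]
    return new

def predict_and_revise(constraints, steps=200):
    tokens = list(constraints)  # start from the constraint words only
    for _ in range(steps):
        # Predict: sample where and how to edit from the learned prior.
        prior = learned_prior(tokens, constraints)
        (pos, act), = random.choices(list(prior), weights=list(prior.values()))
        # Revise: propose the edit and accept it with a Metropolis-Hastings test.
        proposal = apply_action(tokens, pos, act)
        accept = math.exp(min(0.0, fluency_score(proposal) - fluency_score(tokens)))
        if random.random() < accept:
            tokens = proposal
    return " ".join(tokens)

print(predict_and_revise(["cat", "mat"]))
```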
Related papers
- Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens [31.568675300434816]
Language models are often trained to maximize the likelihood of the next token given past tokens in the training dataset.
During inference time, they are utilized differently, generating text sequentially and auto-regressively by using previously generated tokens as input to predict the next one.
This paper proposes two simple approaches based on the model's own generations to address this discrepancy between training and inference time.
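The discrepancy this summary describes can be made concrete with a toy next-token model. The mixed-input variant at the end is a generic scheduled-sampling-style illustration of training on the model's own generations, not the paper's specific approaches.

```python
import random

# Toy "model": predicts the next character from the previous one.
# Stand-in for a neural LM; the point is the train/inference mismatch, not accuracy.
def toy_predict(prev):
    table = {"a": "b", "b": "c", "c": "a"}
    # Imperfect model: sometimes returns the wrong continuation.
    return table[prev] if random.random() < 0.8 else random.choice("abc")

reference = list("abcabcabc")

# Teacher forcing (training view): every input token comes from the reference.
teacher_preds = [toy_predict(tok) for tok in reference[:-1]]

# Free running (inference view): each input is the model's own previous output,
# so early mistakes are fed back in and can compound.
generated = [reference[0]]
for _ in range(len(reference) - 1):
    generated.append(toy_predict(generated[-1]))

# A generic mitigation in the same spirit: with probability p_own, feed the model
# its own previous prediction during training instead of the ground-truth token
# (scheduled-sampling-style; hypothetical, not the paper's exact method).
p_own = 0.5
mixed = [reference[0]]
for t in range(1, len(reference)):
    prev = mixed[-1] if random.random() < p_own else reference[t - 1]
    mixed.append(toy_predict(prev))

print("teacher-forced preds:", "".join(teacher_preds))
print("free-running sample :", "".join(generated))
print("mixed-input sample  :", "".join(mixed))
```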
arXiv Detail & Related papers (2024-10-18T17:48:27Z) - Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
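A rough sketch of the iteratively-refined parallel decoding idea mentioned above: a random masked_fill function stands in for a T5-style masked model, and the shrinking re-masking schedule is a generic mask-predict heuristic rather than the paper's exact recipe.

```python
import random

MASK = "<mask>"
VOCAB = ["the", "model", "refines", "all", "positions", "in", "parallel"]

def masked_fill(tokens):
    # Hypothetical stand-in for a masked LM: returns a (token, confidence)
    # guess for every masked position, all in one parallel step.
    return {i: (random.choice(VOCAB), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def iterative_parallel_decode(length=7, rounds=4):
    tokens = [MASK] * length
    confidence = [0.0] * length
    for r in range(rounds):
        # Fill every masked slot in parallel.
        for i, (tok, conf) in masked_fill(tokens).items():
            tokens[i], confidence[i] = tok, conf
        # Re-mask the least confident positions so later rounds can revise them;
        # the number re-masked shrinks each round.
        k = length * (rounds - 1 - r) // rounds
        for i in sorted(range(length), key=lambda j: confidence[j])[:k]:
            tokens[i] = MASK
    return tokens

print(" ".join(iterative_parallel_decode()))
```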
arXiv Detail & Related papers (2024-07-22T18:00:00Z) - Self-Consistent Decoding for More Factual Open Responses [28.184313177333642]
"Sample & Select" improves factuality by a 30% relative margin against decoders of DoLA, P-CRR, and S-CRR.
We collect human verifications of the generated summaries, confirming the factual superiority of our method.
arXiv Detail & Related papers (2024-03-01T17:31:09Z) - Efficient and Flexible Topic Modeling using Pretrained Embeddings and
Bag of Sentences [1.8592384822257952]
We propose a novel topic modeling and inference algorithm.
We leverage pre-trained sentence embeddings by combining generative process models and clustering.
The evaluation shows that our method yields state-of-the-art results with relatively modest computational demands.
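A minimal sketch of the general recipe behind this summary: embed sentences, cluster the embeddings, and read each cluster as a topic. The bag-of-words embed function is a stand-in for a pre-trained sentence encoder, and plain k-means simplifies the paper's generative process model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy embedding as a stand-in for a pre-trained sentence encoder:
# maps each sentence to a bag-of-words count vector.
def embed(sentences, vocab):
    return np.array([[s.lower().split().count(w) for w in vocab] for s in sentences])

sentences = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "stocks fell sharply on monday",
    "the market rallied after the earnings report",
]
vocab = sorted({w for s in sentences for w in s.lower().split()})
X = embed(sentences, vocab)

# Cluster the embeddings; each cluster is read as one topic.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for topic in range(2):
    print(f"topic {topic}:", [s for s, l in zip(sentences, labels) if l == topic])
```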
arXiv Detail & Related papers (2023-02-06T20:13:11Z) - Quark: Controllable Text Generation with Reinforced Unlearning [68.07749519374089]
Large-scale language models often learn behaviors that are misaligned with user expectations.
We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property.
For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods.
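A toy sketch of the reward-quantization idea behind Quark as summarized here: rewards are binned, training text is tagged with a reward control token, and generation conditions on the best bin. The token format and scores are made up, and the resampling and KL-regularization of the full algorithm are omitted.

```python
import numpy as np

def quantize(rewards, num_bins=5):
    # Map each scalar reward to a discrete bin id, 0 = worst, num_bins-1 = best.
    edges = np.quantile(rewards, np.linspace(0, 1, num_bins + 1)[1:-1])
    return np.digitize(rewards, edges)

texts = ["you are awful", "that was fine", "great job", "terrible idea", "lovely work"]
rewards = np.array([-0.9, 0.1, 0.8, -0.7, 0.9])  # hypothetical (non-)toxicity scores

bins = quantize(rewards, num_bins=3)
# Training data is re-labelled with a reward control token; at generation time the
# model would be conditioned on the highest-reward token to steer its outputs.
tagged = [f"<rk_{b}> {t}" for t, b in zip(texts, bins)]
print("\n".join(tagged))
print("generation prefix:", f"<rk_{bins.max()}>")
```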
arXiv Detail & Related papers (2022-05-26T21:11:51Z) - COCO-LM: Correcting and Contrasting Text Sequences for Language Model
Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
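A toy sketch of the corrective data construction described above: a hand-written auxiliary_fill stands in for the auxiliary language model that masks and predicts tokens, and the main model's target is to recover the original sequence. The sequence contrastive objective is not shown.

```python
import random

def auxiliary_fill(token):
    # Stand-in for the auxiliary LM's mask-and-predict step: its guesses are
    # plausible but sometimes wrong, which creates the errors to be corrected.
    alternatives = {"cat": "dog", "mat": "rug", "sat": "slept"}
    return alternatives.get(token, token)

original = "the cat sat on the mat".split()
mask_positions = [i for i in range(len(original)) if random.random() < 0.3]

corrupted = list(original)
for i in mask_positions:
    corrupted[i] = auxiliary_fill(original[i])

# Corrective LM pair: the main model sees the corrupted text and is trained to
# predict the original token at every position, copying where nothing changed.
training_pair = {"input": corrupted, "target": original}
print(training_pair)
```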
arXiv Detail & Related papers (2021-02-16T22:24:29Z) - $k$-Neighbor Based Curriculum Sampling for Sequence Prediction [22.631763991832862]
Multi-step-ahead prediction in language models is challenging due to the discrepancy between training-time and test-time processes.
We propose Nearest-Neighbor Replacement Sampling, a curriculum learning-based method that gradually changes an initially deterministic teacher policy.
We report our findings on two language modelling benchmarks and find that the proposed method further improves performance when used in conjunction with scheduled sampling.
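One way to read this summary is sketched below, assuming the method swaps ground-truth tokens fed to the decoder for embedding-space neighbors with a probability that grows over the curriculum; the neighbor table and schedule here are invented purely for illustration.

```python
import random

# Hand-written neighbor table standing in for embedding-space nearest neighbors.
NEIGHBORS = {
    "cat": ["kitten", "dog"],
    "sat": ["rested", "perched"],
    "mat": ["rug", "carpet"],
}

def replacement_prob(step, total_steps, cap=0.5):
    # Curriculum: start fully deterministic (teacher tokens only), then ramp up.
    return min(cap, step / total_steps)

def decoder_inputs(reference, step, total_steps):
    p = replacement_prob(step, total_steps)
    return [random.choice(NEIGHBORS[t]) if t in NEIGHBORS and random.random() < p else t
            for t in reference]

reference = "the cat sat on the mat".split()
for step in (0, 500, 1000):
    print(step, decoder_inputs(reference, step, total_steps=1000))
```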
arXiv Detail & Related papers (2021-01-22T20:07:29Z) - Unsupervised Extractive Summarization by Pre-training Hierarchical
Transformers [107.12125265675483]
Unsupervised extractive document summarization aims to select important sentences from a document without using labeled summaries during training.
Existing methods are mostly graph-based with sentences as nodes and edge weights measured by sentence similarities.
We find that transformer attentions can be used to rank sentences for unsupervised extractive summarization.
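A toy version of attention-based sentence ranking in the spirit of this summary: score each sentence by how much attention it receives from the others and extract the top-k. The sentence-level attention matrix is made up, whereas the paper derives it from a pre-trained hierarchical transformer.

```python
import numpy as np

sentences = [
    "The company reported record revenue this quarter.",
    "Its shares rose five percent in early trading.",
    "The CEO also mentioned the office renovation.",
    "Analysts attributed the growth to strong cloud sales.",
]

# attention[i, j] = how much sentence i attends to sentence j (rows sum to 1).
attention = np.array([
    [0.10, 0.40, 0.05, 0.45],
    [0.50, 0.10, 0.05, 0.35],
    [0.45, 0.25, 0.05, 0.25],
    [0.55, 0.30, 0.05, 0.10],
])

# Rank sentences by the total attention directed at them from other sentences.
scores = attention.sum(axis=0) - attention.diagonal()
top_k = np.argsort(scores)[::-1][:2]
for i in sorted(top_k):
    print(sentences[i])
```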
arXiv Detail & Related papers (2020-10-16T08:44:09Z) - Pre-training Is (Almost) All You Need: An Application to Commonsense
Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
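A minimal sketch of full-text plausibility ranking, assuming each candidate answer is spliced into a complete sentence and the pre-trained model's score decides the ranking; the unigram scorer below is a made-up proxy for a real language-model likelihood.

```python
# Hypothetical per-word log-probabilities standing in for a pre-trained LM.
WORD_LOGPROB = {"the": -1.0, "man": -2.0, "opened": -2.5, "umbrella": -3.0,
                "because": -1.5, "it": -1.2, "started": -2.2, "raining": -2.8,
                "he": -1.3, "was": -1.4, "hungry": -4.5}

def lm_score(sentence):
    # Length-normalized token log-probability as a plausibility proxy.
    toks = sentence.lower().rstrip(".").split()
    return sum(WORD_LOGPROB.get(t, -5.0) for t in toks) / len(toks)

premise = "The man opened the umbrella"
candidates = ["because it started raining.", "because he was hungry."]

# Full-text format: each candidate is scored as part of a complete sentence.
full_texts = [f"{premise} {c}" for c in candidates]
ranked = sorted(full_texts, key=lm_score, reverse=True)
print("most plausible:", ranked[0])
```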
arXiv Detail & Related papers (2020-04-29T10:54:40Z) - ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators [108.3381301768299]
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens.
We propose a more sample-efficient pre-training task called replaced token detection.
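A toy construction of the replaced-token-detection training signal described above: a small generator fills masked positions (sometimes incorrectly), and the discriminator's target is a per-token label saying whether each token was replaced. The small_generator here is a hand-written stand-in for ELECTRA's generator network.

```python
import random

def small_generator(token):
    # Stand-in for the generator's guess at a masked slot; sometimes wrong.
    guesses = {"cat": ["cat", "dog"], "mat": ["mat", "rug"], "sat": ["sat", "ran"]}
    return random.choice(guesses.get(token, [token]))

original = "the cat sat on the mat".split()
mask_rate = 0.3

corrupted, labels = [], []
for tok in original:
    if random.random() < mask_rate:
        filled = small_generator(tok)       # generator fills the masked slot
        corrupted.append(filled)
        labels.append(int(filled != tok))   # 1 = replaced, 0 = original
    else:
        corrupted.append(tok)
        labels.append(0)

# The discriminator is trained on every position, not just the masked ones,
# which is what makes the objective sample-efficient.
print(list(zip(corrupted, labels)))
```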
arXiv Detail & Related papers (2020-03-23T21:17:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.