Generating Sequences by Learning to Self-Correct
- URL: http://arxiv.org/abs/2211.00053v1
- Date: Mon, 31 Oct 2022 18:09:51 GMT
- Title: Generating Sequences by Learning to Self-Correct
- Authors: Sean Welleck, Ximing Lu, Peter West, Faeze Brahman, Tianxiao Shen,
Daniel Khashabi, Yejin Choi
- Abstract summary: Self-Correction decouples an imperfect base generator from a separate corrector that learns to iteratively correct imperfect generations.
We show that Self-Correction improves upon the base generator in three diverse generation tasks.
- Score: 64.0249217590888
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequence generation applications require satisfying semantic constraints,
such as ensuring that programs are correct, using certain keywords, or avoiding
undesirable content. Language models, whether fine-tuned or prompted with
few-shot demonstrations, frequently violate these constraints, and lack a
mechanism to iteratively revise their outputs. Moreover, some powerful language
models are of extreme scale or inaccessible, making it inefficient, if not
infeasible, to update their parameters for task-specific adaptation. We present
Self-Correction, an approach that decouples an imperfect base generator (an
off-the-shelf language model or supervised sequence-to-sequence model) from a
separate corrector that learns to iteratively correct imperfect generations. To
train the corrector, we propose an online training procedure that can use
either scalar or natural language feedback on intermediate imperfect
generations. We show that Self-Correction improves upon the base generator in
three diverse generation tasks - mathematical program synthesis,
lexically-constrained generation, and toxicity control - even when the
corrector is much smaller than the base generator.
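As a rough illustration of the decoupled setup the abstract describes, here is a minimal sketch of an inference-time self-correction loop in Python. The function names (`base_generate`, `corrector_revise`, `get_feedback`), the stopping rule, and the round limit are assumptions for illustration, not the paper's implementation:

```python
# Minimal sketch of decoupled generate-then-correct inference.
# `base_generate`, `corrector_revise`, and `get_feedback` are hypothetical
# placeholders, not the paper's actual models or training procedure.

def self_correct(prompt, base_generate, corrector_revise, get_feedback,
                 max_rounds=4):
    """Draft with the base generator, then iteratively revise it."""
    hypothesis = base_generate(prompt)  # imperfect first draft
    for _ in range(max_rounds):
        # Feedback on the intermediate generation may be scalar (e.g., a
        # constraint-satisfaction score) or natural language.
        score, critique = get_feedback(prompt, hypothesis)
        if score >= 1.0:  # all constraints satisfied; stop revising
            break
        # The corrector conditions on the prompt, the current imperfect
        # output, and (optionally) the feedback to produce a revision.
        hypothesis = corrector_revise(prompt, hypothesis, critique)
    return hypothesis
```

Since the corrector only has to map an imperfect output to an improved one, it can, per the abstract, be much smaller than the base generator.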
Related papers
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match the downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Calibrating Sequence likelihood Improves Conditional Language Generation [39.35161650538767]
Conditional language models are predominantly trained with maximum likelihood estimation (MLE).
While MLE trained models assign high probability to plausible sequences given the context, the model probabilities often do not accurately rank-order generated sequences by quality.
We introduce sequence likelihood calibration (SLiC), where the likelihood of model-generated sequences is calibrated to better align with reference sequences in the model's latent space.
arXiv Detail & Related papers (2022-09-30T19:16:16Z)
- DIRECTOR: Generator-Classifiers For Supervised Language Modeling [27.86870968048833]
Current language models achieve low perplexity but their resulting generations still suffer from toxic responses, repetitiveness and contradictions.
We introduce a new architecture, Director, consisting of a unified generator-classifier with both a language-modeling head and a classification head for each output token (a sketch of this dual-head design appears after this list).
arXiv Detail & Related papers (2022-06-15T17:44:08Z)
- Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models [58.990214815032495]
Large-scale pre-trained language models have achieved great success on natural language generation tasks.
Bayesian controllable language models (BCLMs) have been shown to be efficient in controllable language generation.
We propose a "Gemini Discriminator" for controllable language generation that alleviates the training-inference mismatch problem at a small computational cost.
arXiv Detail & Related papers (2022-06-11T12:52:32Z)
- Quark: Controllable Text Generation with Reinforced Unlearning [68.07749519374089]
Large-scale language models often learn behaviors that are misaligned with user expectations.
We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property (a sketch of the reward-quantization step appears after this list).
For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods.
arXiv Detail & Related papers (2022-05-26T21:11:51Z)
- Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation [21.886973310718457]
We propose a novel Sequence-to-Action (S2A) module for Grammatical Error Correction.
The S2A module jointly takes the source and target sentences as input and automatically generates a token-level action sequence (a sketch of such an action sequence appears after this list).
Our model consistently outperforms the seq2seq baselines while significantly alleviating the over-correction problem.
arXiv Detail & Related papers (2022-05-22T17:47:06Z)
- Discovering Non-monotonic Autoregressive Orderings with Variational Inference [67.27561153666211]
We develop an unsupervised parallelizable learner that discovers high-quality generation orders purely from training data.
We implement the encoder as a Transformer with non-causal attention that outputs permutations in one forward pass.
Empirical results in language modeling tasks demonstrate that our method is context-aware and discovers orderings that are competitive with or even better than fixed orders.
arXiv Detail & Related papers (2021-10-27T16:08:09Z)
- Contrastive Learning with Adversarial Perturbations for Conditional Text Generation [49.055659008469284]
We propose a principled method to generate positive and negative samples for contrastive learning of seq2seq models.
Specifically, we generate negative examples by adding small perturbations to the input sequence that minimize its conditional likelihood (a sketch appears after this list).
We empirically show that our proposed method significantly improves the generalization of seq2seq models on three text generation tasks.
arXiv Detail & Related papers (2020-12-14T06:20:27Z)
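Below are a few hedged sketches expanding on entries flagged above. First, the Director entry: a unified generator-classifier that gives every candidate next token both a language-modeling score and a classification score. The backbone, head shapes, and additive combination rule here are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a unified generator-classifier in the spirit of Director:
# a shared backbone with a language-modeling head and a classification
# head for each output token. Sizes and the mixing rule are assumptions.

class GeneratorClassifier(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.backbone = nn.GRU(hidden, hidden, batch_first=True)
        self.lm_head = nn.Linear(hidden, vocab_size)   # next-token logits
        self.cls_head = nn.Linear(hidden, vocab_size)  # desirability logits

    def forward(self, tokens):  # tokens: (batch, time) token ids
        h, _ = self.backbone(self.embed(tokens))
        lm_logprobs = F.log_softmax(self.lm_head(h), dim=-1)
        # Log-probability that each candidate next token is desirable
        # (e.g., non-toxic, non-repetitive, non-contradictory).
        cls_logprobs = F.logsigmoid(self.cls_head(h))
        # Decode from the sum: tokens must be both fluent and desirable.
        return lm_logprobs + cls_logprobs
```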
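Next, the Quark entry: the quantization step bins scalar rewards into quantiles and maps each quantile to a control token that conditions training, so that decoding with the best-quantile token steers generation toward high reward. The bin count and token format are assumptions:

```python
import numpy as np

# Sketch of Quark-style reward quantization: bin rewards into quantiles
# and tag each sampled generation with a quantile control token. The
# number of bins and the token strings are illustrative assumptions.

def quantize_rewards(samples, rewards, n_bins=5):
    # Interior quantile edges, e.g. the 20th/40th/60th/80th percentiles.
    edges = np.quantile(rewards, np.linspace(0, 1, n_bins + 1)[1:-1])
    tagged = []
    for text, reward in zip(samples, rewards):
        bin_id = int(np.searchsorted(edges, reward))  # 0 = lowest reward
        # Train the LM on the token-conditioned text; at test time,
        # prepend the highest-reward token to steer generation.
        tagged.append(f"<RWD_{bin_id}> {text}")
    return tagged
```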
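For the Sequence-to-Action entry, a sketch of what a token-level action sequence looks like, derived here post hoc from a source/target alignment; the paper's S2A module learns to predict such actions jointly from both sentences, and the KEEP/DELETE/INSERT inventory is an assumption:

```python
import difflib

# Sketch of a token-level action sequence for grammatical error
# correction, derived by aligning source and target tokens. The action
# inventory (KEEP/DELETE/INSERT) is an assumption for illustration.

def action_sequence(source_tokens, target_tokens):
    actions = []
    matcher = difflib.SequenceMatcher(a=source_tokens, b=target_tokens)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            actions += [("KEEP", t) for t in source_tokens[i1:i2]]
        if op in ("delete", "replace"):
            actions += [("DELETE", t) for t in source_tokens[i1:i2]]
        if op in ("insert", "replace"):
            actions += [("INSERT", t) for t in target_tokens[j1:j2]]
    return actions

# Example: "He go to school" -> "He goes to school" yields
# KEEP He, DELETE go, INSERT goes, KEEP to, KEEP school.
```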
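Finally, for the contrastive-learning entry, a sketch of building a hard negative by a single gradient step on the input embeddings in the direction that lowers the conditional likelihood of the reference. This assumes a HuggingFace-style seq2seq interface (`inputs_embeds`, `labels`, `.loss`); the step size and normalization are likewise assumptions:

```python
import torch

# Sketch of adversarial negative construction for contrastive seq2seq
# training: perturb the input embeddings so the reference output becomes
# *less* likely, yielding a "close but wrong" negative. Assumes a
# HuggingFace-style model interface; epsilon is an illustrative choice.

def adversarial_negative(model, src_embeds, tgt_ids, epsilon=1e-2):
    src_embeds = src_embeds.clone().detach().requires_grad_(True)
    # Conditional log-likelihood of the reference (model loss is the NLL).
    loglik = -model(inputs_embeds=src_embeds, labels=tgt_ids).loss
    loglik.backward()
    with torch.no_grad():
        grad = src_embeds.grad
        # Normalized step *against* the likelihood gradient, i.e., the
        # small perturbation that most decreases p(target | input).
        delta = epsilon * grad / (grad.norm() + 1e-12)
        return (src_embeds - delta).detach()
```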
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.