Quark: Controllable Text Generation with Reinforced Unlearning
- URL: http://arxiv.org/abs/2205.13636v1
- Date: Thu, 26 May 2022 21:11:51 GMT
- Title: Quark: Controllable Text Generation with Reinforced Unlearning
- Authors: Ximing Lu, Sean Welleck, Liwei Jiang, Jack Hessel, Lianhui Qin, Peter West, Prithviraj Ammanabrolu, Yejin Choi
- Abstract summary: Large-scale language models often learn behaviors that are misaligned with user expectations.
We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property.
For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale language models often learn behaviors that are misaligned with
user expectations. Generated text may contain offensive or toxic language,
contain significant repetition, or be of a different sentiment than desired by
the user. We consider the task of unlearning these misalignments by fine-tuning
the language model on signals of what not to do. We introduce Quantized Reward
Konditioning (Quark), an algorithm for optimizing a reward function that
quantifies an (un)wanted property, while not straying too far from the original
model. Quark alternates between (i) collecting samples with the current
language model, (ii) sorting them into quantiles based on reward, with each
quantile identified by a reward token prepended to the language model's input,
and (iii) using a standard language modeling loss on samples from each quantile
conditioned on its reward token, while remaining nearby the original language
model via a KL-divergence penalty. By conditioning on a high-reward token at
generation time, the model generates text that exhibits less of the unwanted
property. For unlearning toxicity, negative sentiment, and repetition, our
experiments show that Quark outperforms both strong baselines and
state-of-the-art reinforcement learning methods like PPO (Schulman et al.
2017), while relying only on standard language modeling primitives.
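The loop in the abstract (sample, sort into reward quantiles tagged by a reward token, then train with a language modeling loss plus a KL penalty toward the original model) can be illustrated with a small, framework-free Python sketch. This is not the authors' implementation. The reward-token names (`<rk_i>`), the equal-size rank-based binning, the helper names, and the KL direction are all assumptions made for illustration, and in practice the next-token distributions would come from the current and original language models rather than from dictionaries.

```python
import math
from typing import Dict, List, Sequence, Tuple


def reward_token(quantile: int) -> str:
    """Hypothetical reward-token naming: index 0 is the lowest-reward quantile."""
    return f"<rk_{quantile}>"


def quantize_by_reward(
    samples: Sequence[Tuple[str, float]], num_quantiles: int
) -> List[Tuple[str, str]]:
    """Rank (text, reward) pairs by reward and split them into equal-size quantiles.

    Returns (reward_token, text) pairs; the token is what would be prepended
    to the model input when computing the language modeling loss.
    """
    ranked = sorted(samples, key=lambda s: s[1])
    n = len(ranked)
    return [
        (reward_token(rank * num_quantiles // n), text)
        for rank, (text, _reward) in enumerate(ranked)
    ]


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) over a shared vocabulary; assumes q[t] > 0 wherever p[t] > 0."""
    return sum(pt * math.log(pt / q[t]) for t, pt in p.items() if pt > 0.0)


def token_loss(
    policy: Dict[str, float],
    reference: Dict[str, float],
    target: str,
    beta: float,
) -> float:
    """Negative log-likelihood of the target token under the reward-conditioned
    policy, plus beta times a KL penalty toward the frozen original model.
    The KL direction (reference || policy) is an assumption of this sketch."""
    nll = -math.log(policy[target])
    return nll + beta * kl_divergence(reference, policy)


if __name__ == "__main__":
    samples = [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.7)]
    print(quantize_by_reward(samples, num_quantiles=2))
    # At generation time, condition on the highest-reward token:
    print(reward_token(2 - 1))
```

With `num_quantiles=2`, the two lowest-reward samples are tagged `<rk_0>` and the two highest `<rk_1>`; generating with the top token prepended is what steers the model away from the unwanted property. Binning by rank rather than by fixed reward thresholds keeps every quantile populated, which is one reasonable reading of "sorting them into quantiles based on reward".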
Related papers
- MiLe Loss: a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models
We propose a MiLe Loss function for mitigating the bias of learning difficulties with tokens.
We train generative language models at different scales of 468M, 1.2B, and 6.7B parameters.
Experiments reveal that models incorporating the proposed MiLe Loss can gain consistent performance improvement on downstream benchmarks.
Date: 2023-10-30T13:33:21Z
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments on language generation benchmarks show that GanLM, with its strong language understanding capability, outperforms various strong pre-trained language models.
Date: 2022-12-20T12:51:11Z
- A Natural Bias for Language Generation Models
We show that we can endow standard neural language generation models with a separate module that reflects unigram frequency statistics as prior knowledge.
We use neural machine translation as a test bed for this simple technique and observe that it (i) improves learning efficiency, (ii) achieves better overall performance, and, perhaps most importantly, (iii) appears to disentangle strong frequency effects.
Date: 2022-12-19T18:14:36Z
- DIRECTOR: Generator-Classifiers For Supervised Language Modeling
Current language models achieve low perplexity but their resulting generations still suffer from toxic responses, repetitiveness and contradictions.
We introduce a new architecture, Director, that consists of a unified generator-classifier with both a language modeling and a classification head for each output token.
Date: 2022-06-15T17:44:08Z
- Typical Decoding for Natural Language Generation
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
Date: 2022-02-01T18:58:45Z
- Understanding by Understanding Not: Modeling Negation in Language Models
Negation is a core construction in natural language.
We propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences.
We reduce the mean top-1 error rate to 4% on the negated LAMA dataset.
Date: 2021-05-07T21:58:35Z
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero training examples, where the models improve as more data is collected.
Date: 2020-10-20T17:31:07Z
- Are Some Words Worth More than Others?
We propose two new intrinsic evaluation measures within the framework of a simple word prediction task.
We evaluate several commonly-used large English language models using our proposed metrics.
Date: 2020-10-12T23:12:11Z
- Limits of Detecting Text Generated by Large-Scale Language Models
Some consider large-scale language models that can generate long, coherent pieces of text dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
Date: 2020-02-09T19:53:23Z