Robust Preference Learning for Storytelling via Contrastive
Reinforcement Learning
- URL: http://arxiv.org/abs/2210.07792v1
- Date: Fri, 14 Oct 2022 13:21:33 GMT
- Title: Robust Preference Learning for Storytelling via Contrastive
Reinforcement Learning
- Authors: Louis Castricato, Alexander Havrilla, Shahbuland Matiana, Michael
Pieler, Anbang Ye, Ian Yang, Spencer Frazier and Mark Riedl
- Abstract summary: Controlled automated story generation seeks to generate natural language stories satisfying constraints from natural language critiques or preferences.
We train a contrastive bi-encoder model to align stories with human critiques, building a general purpose preference model.
We further fine-tune the contrastive reward model using a prompt-learning technique to increase story generation robustness.
- Score: 53.92465205531759
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Controlled automated story generation seeks to generate natural language
stories satisfying constraints from natural language critiques or preferences.
Existing methods to control for story preference utilize prompt engineering
which is labor intensive and often inconsistent. They may also use
logit-manipulation methods which require annotated datasets to exist for the
desired attributes. To address these issues, we first train a contrastive
bi-encoder model to align stories with corresponding human critiques, named
CARP, building a general purpose preference model. This is subsequently used as
a reward function to fine-tune a generative language model via reinforcement
learning. However, simply fine-tuning a generative language model with a
contrastive reward model does not always reliably result in a story generation
system capable of generating stories that meet user preferences. To increase
story generation robustness we further fine-tune the contrastive reward model
using a prompt-learning technique. A human participant study is then conducted
comparing generations from our full system, ablations, and two baselines. We
show that the full fine-tuning pipeline results in a story generator preferred
over an LLM 20x as large, as well as over logit-based methods. This motivates the use
of contrastive learning for general purpose human preference modeling.
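The abstract names the key components (a contrastive bi-encoder preference model and an RL-tuned generator) but no code accompanies this listing. Below is a minimal sketch of such a bi-encoder trained with a symmetric, CLIP-style InfoNCE loss over (story, critique) pairs, using toy encoders; names such as PreferenceModel and preference_score are illustrative and are not the authors' CARP implementation.

```python
# Minimal sketch of a CARP-style contrastive bi-encoder, assuming a symmetric
# InfoNCE objective over (story, critique) pairs. The encoders here are toy
# bag-of-embeddings modules standing in for the transformer encoders used in
# the paper; all names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextEncoder(nn.Module):
    """Toy stand-in for a pretrained text encoder: mean-pooled token embeddings."""

    def __init__(self, vocab_size: int = 10_000, dim: int = 256):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(self.embed(token_ids)), dim=-1)


class PreferenceModel(nn.Module):
    """Bi-encoder: separate towers for stories and critiques, shared latent space."""

    def __init__(self):
        super().__init__()
        self.story_encoder = TextEncoder()
        self.critique_encoder = TextEncoder()
        self.logit_scale = nn.Parameter(torch.tensor(2.0))

    def forward(self, story_ids: torch.Tensor, critique_ids: torch.Tensor):
        s = self.story_encoder(story_ids)          # (B, d)
        c = self.critique_encoder(critique_ids)    # (B, d)
        return self.logit_scale.exp() * s @ c.t()  # (B, B) similarity matrix

    def contrastive_loss(self, story_ids, critique_ids) -> torch.Tensor:
        logits = self(story_ids, critique_ids)
        targets = torch.arange(logits.size(0), device=logits.device)
        # Matched (story, critique) pairs lie on the diagonal; off-diagonal
        # entries act as in-batch negatives, as in CLIP-style training.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

    @torch.no_grad()
    def preference_score(self, story_ids, critique_ids) -> torch.Tensor:
        # Scalar alignment score, usable as an RL reward for a generator.
        s = self.story_encoder(story_ids)
        c = self.critique_encoder(critique_ids)
        return (s * c).sum(dim=-1)


if __name__ == "__main__":
    model = PreferenceModel()
    stories = torch.randint(0, 10_000, (8, 64))    # dummy token ids
    critiques = torch.randint(0, 10_000, (8, 16))
    loss = model.contrastive_loss(stories, critiques)
    loss.backward()
    print("contrastive loss:", float(loss))
```

In the pipeline described above, a scalar such as preference_score would play the role of the reward used to fine-tune the generative language model with reinforcement learning; the prompt-learning step that further tunes the reward model is not reproduced here.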
Related papers
- Fine-tuning Language Models for Factuality [96.5203774943198]
Large pre-trained language models (LLMs) have seen widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z) - Generate to Understand for Representation [3.5325087487696463]
GUR is a pretraining framework that combines language modeling and contrastive learning objectives in a single training step.
GUR achieves impressive results without any labeled training data, outperforming all other pretrained baselines as a retriever on the recall benchmark in a zero-shot setting.
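The GUR summary only names the combined objective; as a hedged illustration, the sketch below sums a causal language-modeling loss and an in-batch contrastive loss in a single training step. The toy embedding model, the mean pooling, the temperature, and the 0.5 weighting are assumptions for illustration, not the GUR recipe verbatim.

```python
# Hedged sketch: one training step that combines a causal LM loss with an
# in-batch contrastive loss over paired text segments. The tiny model and all
# hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 1000, 64
embed = nn.Embedding(VOCAB, DIM)
lm_head = nn.Linear(DIM, VOCAB)

def joint_loss(tokens_a: torch.Tensor, tokens_b: torch.Tensor, alpha: float = 0.5):
    """tokens_a / tokens_b: (B, T) token ids for two segments of each document."""
    h = embed(tokens_a)                                      # (B, T, d)
    lm_logits = lm_head(h[:, :-1])                           # next-token prediction
    lm_loss = F.cross_entropy(lm_logits.reshape(-1, VOCAB),
                              tokens_a[:, 1:].reshape(-1))
    # Mean-pool both segments and contrast them against in-batch negatives.
    za = F.normalize(h.mean(dim=1), dim=-1)
    zb = F.normalize(embed(tokens_b).mean(dim=1), dim=-1)
    sims = za @ zb.t() / 0.07
    targets = torch.arange(sims.size(0))
    contrastive = F.cross_entropy(sims, targets)
    return lm_loss + alpha * contrastive

loss = joint_loss(torch.randint(0, VOCAB, (4, 32)), torch.randint(0, VOCAB, (4, 32)))
loss.backward()
```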
arXiv Detail & Related papers (2023-06-14T06:00:18Z) - Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
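The Chain of Hindsight summary describes verbalizing feedback of either polarity into plain training sequences; a hedged sketch of that data construction follows. The template strings and field names are illustrative, not taken from the paper.

```python
# Hedged sketch: turn (prompt, output, feedback) triples of either polarity into
# plain fine-tuning sequences, then condition on the desired feedback at inference.
# The template strings are illustrative, not the Chain of Hindsight paper's.
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str
    output: str
    feedback: str          # e.g. "a helpful answer" or "an unhelpful answer"

def to_training_sequence(ex: Example) -> str:
    # All feedback, positive or negative, becomes ordinary conditioning text.
    return f"{ex.prompt}\n{ex.feedback}: {ex.output}"

def to_inference_prompt(prompt: str, desired_feedback: str = "a helpful answer") -> str:
    # At generation time, ask for the positive behaviour explicitly.
    return f"{prompt}\n{desired_feedback}:"

if __name__ == "__main__":
    data = [
        Example("Write a two-sentence story about a storm.",
                "Rain fell. The town slept through it.",
                "a helpful answer"),
        Example("Write a two-sentence story about a storm.",
                "Storms are weather events involving precipitation.",
                "an unhelpful answer"),
    ]
    for ex in data:
        print(to_training_sequence(ex))
    print(to_inference_prompt("Write a two-sentence story about a storm."))
```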
arXiv Detail & Related papers (2023-02-06T10:28:16Z) - Training Language Models with Natural Language Feedback [51.36137482891037]
We learn from language feedback on model outputs using a three-step learning algorithm.
In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements.
Using only 100 samples of human-written feedback, our learning algorithm fine-tunes a GPT-3 model to roughly human-level summarization ability.
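The three-step algorithm is only named above; the sketch below shows one plausible reading of such a recipe, with placeholder callables: generate refinement candidates conditioned on the feedback, select the candidate that best matches the feedback, and collect (prompt, refinement) pairs for fine-tuning. The step boundaries and the selection heuristic are assumptions, not the paper's exact procedure.

```python
# One plausible reading of a "learn from language feedback" recipe, sketched
# with placeholder functions; the selection heuristic is an assumption.
from typing import Callable, List, Tuple

def refine_dataset(
    samples: List[Tuple[str, str, str]],           # (prompt, output, feedback)
    generate: Callable[[str], List[str]],          # LM: refinement candidates
    score: Callable[[str, str], float],            # how well text reflects feedback
) -> List[Tuple[str, str]]:
    finetune_pairs = []
    for prompt, output, feedback in samples:
        query = f"{prompt}\nDraft: {output}\nFeedback: {feedback}\nImproved draft:"
        candidates = generate(query)                              # step 1
        best = max(candidates, key=lambda c: score(c, feedback))  # step 2
        finetune_pairs.append((prompt, best))                     # step 3 data
    return finetune_pairs

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without a real LM.
    fake_generate = lambda q: ["A tighter summary.", "A rambling summary."]
    fake_score = lambda text, fb: -abs(len(text) - len(fb))   # placeholder heuristic
    pairs = refine_dataset(
        [("Summarize the article.", "Long first draft...", "Make it shorter.")],
        fake_generate, fake_score)
    print(pairs)
```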
arXiv Detail & Related papers (2022-04-29T15:06:58Z) - Goal-Directed Story Generation: Augmenting Generative Language Models
with Reinforcement Learning [7.514717103747824]
We present two automated techniques grounded in deep reinforcement learning and reward shaping to control the plot of computer-generated stories.
The first utilizes proximal policy optimization to fine-tune an existing transformer-based language model to generate text continuations that are also goal-seeking.
The second extracts a knowledge graph from the unfolding story, which is used by a policy network with graph attention to select a candidate continuation generated by a language model.
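The summary mentions controlling plot toward a goal without giving the shaping function; below is a minimal sketch of one reward-shaping idea, scoring continuations by overlap with a goal description and weighting later steps more heavily. The distance proxy is an illustrative assumption, not the paper's formulation; such a reward would then drive the PPO fine-tuning mentioned above.

```python
# Minimal sketch of a shaped reward for goal-directed continuation: reward rises
# as generated text gets "closer" to a target goal event. The overlap-based
# distance proxy below is an illustrative assumption.
from typing import Set

def content_words(text: str) -> Set[str]:
    stop = {"the", "a", "an", "and", "to", "of", "in", "was", "is"}
    return {w.strip(".,!?").lower() for w in text.split()} - stop

def shaped_reward(continuation: str, goal: str, step: int, max_steps: int) -> float:
    overlap = len(content_words(continuation) & content_words(goal))
    progress = overlap / max(len(content_words(goal)), 1)
    # Later steps are expected to be closer to the goal, so weight progress up.
    return progress * (step + 1) / max_steps

if __name__ == "__main__":
    goal = "The knight finally slays the dragon."
    print(shaped_reward("The knight sharpened his sword.", goal, step=1, max_steps=5))
    print(shaped_reward("The knight slays the dragon at dawn.", goal, step=4, max_steps=5))
```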
arXiv Detail & Related papers (2021-12-16T03:34:14Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
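Dynamic Blocking is only named above; as a rough, hedged illustration of a copy-blocking decoding constraint, the sketch below forbids the source's next token whenever the last generated token was copied from the source, nudging decoding toward paraphrase rather than verbatim reuse. The exact blocking policy here is an assumption, not the paper's algorithm verbatim.

```python
# Hedged sketch of a copy-blocking constraint during decoding: if the last
# generated token was copied from the source, forbid the source's following
# token at the next step. The blocking policy is an illustrative assumption.
from typing import List, Set

def blocked_next_tokens(source: List[int], last_generated: int) -> Set[int]:
    """Tokens to forbid at the next step, given the most recent generated token."""
    blocked = set()
    for i, tok in enumerate(source[:-1]):
        if tok == last_generated:
            blocked.add(source[i + 1])      # successor of the copied source token
    return blocked

if __name__ == "__main__":
    source = [12, 7, 99, 7, 31]             # toy source token ids
    print(sorted(blocked_next_tokens(source, last_generated=7)))   # [31, 99]
```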
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Improving Language Generation with Sentence Coherence Objective [4.997730662279843]
Existing models are often prone to outputting paragraphs of text that gradually diverge from the given prompt.
The goal of our project is to improve the coherence and consistency across sentences in a language-generation model.
arXiv Detail & Related papers (2020-09-07T06:10:03Z) - Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling [81.33107307509718]
We propose a topic adaptive storyteller to model the ability of inter-topic generalization.
We also propose a prototype encoding structure to model the ability of intra-topic derivation.
Experimental results show that the topic adaptation and prototype encoding structures mutually benefit the few-shot model.
arXiv Detail & Related papers (2020-08-11T03:55:11Z)