Training Language Models with Natural Language Feedback
- URL: http://arxiv.org/abs/2204.14146v2
- Date: Mon, 2 May 2022 17:31:39 GMT
- Title: Training Language Models with Natural Language Feedback
- Authors: Jérémy Scheurer, Jon Ander Campos, Jun Shern Chan, Angelica Chen,
Kyunghyun Cho, Ethan Perez
- Abstract summary: We learn from language feedback on model outputs using a three-step learning algorithm.
In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements.
Using only 100 samples of human-written feedback, our learning algorithm finetunes a GPT-3 model to roughly human-level summarization.
- Score: 51.36137482891037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained language models often do not perform tasks in ways that are in
line with our preferences, e.g., generating offensive text or factually
incorrect summaries. Recent work approaches the above issue by learning from a
simple form of human evaluation: comparisons between pairs of model-generated
task outputs. Comparison feedback conveys limited information about human
preferences per human evaluation. Here, we propose to learn from natural
language feedback, which conveys more information per human evaluation. We
learn from language feedback on model outputs using a three-step learning
algorithm. First, we condition the language model on the initial output and
feedback to generate many refinements. Second, we choose the refinement with
the highest similarity to the feedback. Third, we finetune a language model to
maximize the likelihood of the chosen refinement given the input. In synthetic
experiments, we first evaluate whether language models accurately incorporate
feedback to produce refinements, finding that only large language models (175B
parameters) do so. Using only 100 samples of human-written feedback, our
learning algorithm finetunes a GPT-3 model to roughly human-level
summarization.
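To make the three-step learning algorithm concrete, here is a minimal Python sketch under stated assumptions: the `generate` callable, the prompt template, and the use of `difflib.SequenceMatcher` as the refinement-to-feedback similarity score are illustrative stand-ins, not the paper's implementation (the paper samples refinements from and scores them with large language models, and finetunes GPT-3 on the result).

```python
# Sketch of the three-step algorithm, assuming a generic `generate` helper
# stands in for sampling from a large language model (the paper uses GPT-3).
# The similarity step uses difflib.SequenceMatcher purely so the example runs
# without external dependencies; the paper scores refinements with an LM.
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def refine_and_select(
    generate: Callable[[str], str],
    task_input: str,
    initial_output: str,
    feedback: str,
    num_refinements: int = 5,
) -> str:
    """Steps 1-2: sample refinements conditioned on the initial output and
    the feedback, then keep the refinement most similar to the feedback."""
    prompt = (
        f"Input: {task_input}\n"
        f"Output: {initial_output}\n"
        f"Feedback: {feedback}\n"
        f"Refined output:"
    )
    refinements = [generate(prompt) for _ in range(num_refinements)]
    return max(
        refinements,
        key=lambda r: SequenceMatcher(None, r, feedback).ratio(),
    )


def build_finetuning_data(
    generate: Callable[[str], str],
    examples: List[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Step 3 (data preparation): pair each input with its chosen refinement,
    ready for supervised finetuning that maximizes p(refinement | input)."""
    return [
        {
            "prompt": ex["input"],
            "completion": refine_and_select(
                generate, ex["input"], ex["initial_output"], ex["feedback"]
            ),
        }
        for ex in examples
    ]


if __name__ == "__main__":
    # Toy stand-in for an LM sampler, only to show the call pattern.
    def fake_lm(prompt: str) -> str:
        return "A refined summary that states the main finding."

    print(build_finetuning_data(
        fake_lm,
        [{"input": "Article text ...",
          "initial_output": "A vague summary.",
          "feedback": "The summary should state the article's main finding."}],
    ))
```

The finetuning itself is not shown; the prompt/completion pairs produced here are the format a standard supervised finetuning step would consume.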
Related papers
- Learning from Naturally Occurring Feedback [25.266461597402056]
We propose a scalable method for extracting feedback that users naturally include when interacting with chat models.
We manually annotated conversation data to confirm the presence of naturally occurring feedback.
We apply our method to over 1M conversations to obtain hundreds of thousands of feedback samples.
arXiv Detail & Related papers (2024-07-15T17:41:34Z)
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- Training Language Models with Language Feedback at Scale [50.70091340506957]
We introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback.
ILF consists of three steps, applied iteratively: conditioning the language model on the input, an initial LM output, and the feedback to generate refinements; selecting the refinement that best incorporates the feedback; and finetuning the language model on the chosen refinements given the inputs.
We show theoretically that ILF can be viewed as Bayesian Inference, similar to Reinforcement Learning from human feedback.
arXiv Detail & Related papers (2023-03-28T17:04:15Z)
- Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model (a minimal sketch of this conversion appears after the related-papers list).
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
arXiv Detail & Related papers (2023-02-06T10:28:16Z)
- Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning [53.92465205531759]
Controlled automated story generation seeks to generate natural language stories satisfying constraints from natural language critiques or preferences.
We train a contrastive bi-encoder model to align stories with human critiques, building a general purpose preference model.
We further fine-tune the contrastive reward model using a prompt-learning technique to increase story generation robustness.
arXiv Detail & Related papers (2022-10-14T13:21:33Z)
- Training language models to follow instructions with human feedback [29.590666996229206]
We show an avenue for aligning language models with user intent by fine-tuning with human feedback.
InstructGPT models show improvements in truthfulness and reductions in toxic output generation.
arXiv Detail & Related papers (2022-03-04T07:04:42Z)
- Estimating Subjective Crowd-Evaluations as an Additional Objective to Improve Natural Language Generation [0.0]
We use a crowd-authored dialogue corpus to fine-tune six different language generation models.
Two of these models incorporate multi-task learning and use subjective ratings of lines as part of an explicit learning goal.
A human evaluation of the generated dialogue lines reveals that utterances generated by the multi-tasking models were rated as the most typical, the best at moving the conversation forward, and the least offensive.
arXiv Detail & Related papers (2021-04-12T06:33:16Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
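For the Chain of Hindsight entry above, the following is a minimal sketch of the conversion it describes: feedback and paired outputs are rewritten as a single training string so the model learns to produce the preferred output when conditioned on positive feedback. The feedback phrases and record field names here are illustrative assumptions, not the paper's verbatim templates.

```python
# Sketch of turning comparison data plus feedback phrases into fine-tuning
# text, in the spirit of Chain of Hindsight. Phrases and field names are
# assumed for illustration only.
from typing import Dict, List


def build_hindsight_example(
    prompt: str,
    good_output: str,
    bad_output: str,
    good_phrase: str = "A good response:",  # assumed wording
    bad_phrase: str = "A bad response:",    # assumed wording
) -> str:
    """Combine one output pair with feedback phrases into a training string."""
    return f"{prompt}\n{bad_phrase} {bad_output}\n{good_phrase} {good_output}"


def build_hindsight_dataset(examples: List[Dict[str, str]]) -> List[str]:
    """Convert {'prompt', 'good', 'bad'} records into fine-tuning sequences."""
    return [
        build_hindsight_example(ex["prompt"], ex["good"], ex["bad"])
        for ex in examples
    ]
```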