System-Level Natural Language Feedback
- URL: http://arxiv.org/abs/2306.13588v3
- Date: Sat, 3 Feb 2024 00:24:11 GMT
- Title: System-Level Natural Language Feedback
- Authors: Weizhe Yuan, Kyunghyun Cho, Jason Weston
- Abstract summary: We show how to use feedback to formalize system-level design decisions in a human-in-the-loop process.
We conduct two case studies of this approach for improving search query and dialog response generation.
We show the combination of system-level and instance-level feedback brings further gains.
- Score: 83.24259100437965
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Natural language (NL) feedback offers rich insights into user experience.
While existing studies focus on an instance-level approach, where feedback is
used to refine specific examples, we introduce a framework for system-level use
of NL feedback. We show how to use feedback to formalize system-level design
decisions in a human-in-the-loop process in order to produce better models. In
particular, this is done through: (i) metric design for tasks; and (ii)
language model prompt design for refining model responses. We conduct two case
studies of this approach for improving search query and dialog response
generation, demonstrating the effectiveness of system-level feedback. We show
the combination of system-level and instance-level feedback brings further
gains, and that human-written instance-level feedback results in more grounded
refinements than GPT-3.5-written ones, underlining the importance of human
feedback for building systems. We release our code and data at
https://github.com/yyy-Apple/Sys-NL-Feedback.
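To make the two system-level uses concrete, here is a minimal, purely illustrative sketch of deriving recurring criteria from clustered feedback and reusing them both as a task metric and inside a refinement prompt. The feedback snippets, theme keywords, and function names are hypothetical; the authors' actual pipeline is in the repository above.

```python
# Illustrative sketch: turning instance-level NL feedback into system-level
# artifacts (a metric and a refinement prompt). A real pipeline would cluster
# feedback embeddings with human-in-the-loop review; keyword matching here is
# a trivial stand-in.
from collections import Counter

# Hypothetical feedback snippets collected from users.
feedback = [
    "the query dropped the location I mentioned",
    "response ignored my location",
    "answer was too verbose",
    "way too long and rambling",
]

# Hypothetical themes a human might name after inspecting feedback clusters.
themes = {
    "preserves location": ["location"],
    "concise": ["verbose", "long"],
}

def derive_criteria(feedback_items):
    counts = Counter()
    for text in feedback_items:
        for theme, keywords in themes.items():
            if any(k in text for k in keywords):
                counts[theme] += 1
    # Keep themes frequent enough to warrant a system-level criterion.
    return [t for t, c in counts.items() if c >= 2]

# Use (i): criteria become a task metric (fraction of criteria satisfied,
# given per-criterion judgments for one response).
def metric(judgments: dict) -> float:
    criteria = derive_criteria(feedback)
    return sum(judgments[c] for c in criteria) / len(criteria)

# Use (ii): the same criteria drive a refinement prompt for a language model.
def refinement_prompt(query: str, response: str) -> str:
    bullets = "\n".join(f"- {c}" for c in derive_criteria(feedback))
    return (f"Rewrite the response so that it satisfies:\n{bullets}\n\n"
            f"Query: {query}\nResponse: {response}\nImproved response:")

print(refinement_prompt("coffee shops near me", "Here is a long essay on coffee..."))
```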
Related papers
- Learning from Naturally Occurring Feedback [25.266461597402056]
We propose a scalable method for extracting feedback that users naturally include when interacting with chat models.
We manually annotated conversation data to confirm the presence of naturally occurring feedback.
We apply our method to over 1M conversations to obtain hundreds of thousands of feedback samples.
arXiv Detail & Related papers (2024-07-15T17:41:34Z)
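As a rough illustration of what extracting such naturally occurring feedback could look like, the sketch below flags user turns that react to the previous assistant turn with a feedback cue. The cue patterns and labels are hypothetical stand-ins, not the taxonomy or method from "Learning from Naturally Occurring Feedback".

```python
# Illustrative sketch of mining naturally occurring feedback from chat logs.
import re

# Hypothetical cue patterns; a real taxonomy would be broader and validated
# against manually annotated conversations.
FEEDBACK_CUES = {
    "negative": [r"\bthat'?s (wrong|not right)\b", r"\bno[,.]? i meant\b"],
    "positive": [r"\bperfect\b", r"\bthanks?, (that|this) (works|helped)\b"],
}

def extract_feedback(conversation):
    """Yield (turn_index, label, text) for user turns that react to the
    assistant's previous turn with an identifiable feedback cue."""
    for i, (role, text) in enumerate(conversation):
        if role != "user" or i == 0:
            continue
        for label, patterns in FEEDBACK_CUES.items():
            if any(re.search(p, text.lower()) for p in patterns):
                yield i, label, text
                break

chat = [
    ("user", "Convert 10 miles to km."),
    ("assistant", "10 miles is about 16.09 km."),
    ("user", "Perfect, thanks!"),
]
print(list(extract_feedback(chat)))
```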
- RLVF: Learning from Verbal Feedback without Overgeneralization [94.19501420241188]
We study the problem of incorporating verbal feedback without overgeneralizing it to contexts where the feedback does not apply.
We develop a new method Contextualized Critiques with Constrained Preference Optimization (C3PO)
Our approach effectively applies verbal feedback to relevant scenarios while preserving existing behaviors for other contexts.
arXiv Detail & Related papers (2024-02-16T18:50:24Z)
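The C3PO summary above names the objective but not its shape. The sketch below illustrates one plausible reading, assuming a preference (DPO-style) loss on prompts where the feedback is relevant plus a penalty anchoring out-of-scope behavior to a reference model; the paper's actual losses, data construction, and hyperparameters differ.

```python
# Shape of a C3PO-style objective (illustrative only): apply a preference loss
# where the feedback is relevant, and anchor the model to its original
# behavior everywhere else.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Standard DPO: prefer feedback-compliant completions on in-scope prompts.
    margins = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -F.logsigmoid(margins).mean()

def anchor_loss(logp_model, logp_ref):
    # Keep out-of-scope behavior close to the reference model (a simple
    # log-prob matching penalty as a stand-in for the paper's constraint).
    return (logp_model - logp_ref).pow(2).mean()

# Toy per-sequence log-probabilities standing in for model outputs.
in_scope = dpo_loss(torch.tensor([-5.0]), torch.tensor([-7.0]),
                    torch.tensor([-6.0]), torch.tensor([-6.5]))
out_scope = anchor_loss(torch.tensor([-4.2]), torch.tensor([-4.0]))

lam = 0.5  # hypothetical trade-off between applying feedback and preserving behavior
print(float(in_scope + lam * out_scope))
```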
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation [68.9440575276396]
This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation.
First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization.
Second, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models.
Third, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for human intervention.
arXiv Detail & Related papers (2023-05-01T17:36:06Z)
- Simulating Bandit Learning from User Feedback for Extractive Question Answering [51.97943858898579]
We study learning from user feedback for extractive question answering by simulating feedback using supervised data.
We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
arXiv Detail & Related papers (2022-03-18T17:47:58Z)
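A toy rendering of the simulation idea in "Simulating Bandit Learning from User Feedback": supervised labels stand in for the user, returning binary reward on a sampled answer, with a REINFORCE-style update. The candidate spans, learning rate, and 3-way softmax policy are hypothetical simplifications of a full QA model.

```python
# Toy simulation of bandit learning from user feedback (illustrative, not the
# paper's setup): a policy samples one of several candidate answer spans, a
# "user" simulated from supervised data returns binary reward, and we apply
# a REINFORCE-style update.
import torch

candidates = ["in 1969", "in 1972", "Neil Armstrong"]
gold = "in 1969"  # supervised label used only to simulate the user

logits = torch.zeros(len(candidates), requires_grad=True)
opt = torch.optim.SGD([logits], lr=0.5)

for step in range(200):
    probs = torch.softmax(logits, dim=0)
    action = torch.multinomial(probs, 1).item()          # sample an answer
    reward = 1.0 if candidates[action] == gold else 0.0  # simulated feedback
    loss = -reward * torch.log(probs[action])            # REINFORCE on binary reward
    opt.zero_grad(); loss.backward(); opt.step()

print(candidates[torch.argmax(logits).item()])  # converges toward the gold span
```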
- Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback.
Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
arXiv Detail & Related papers (2020-11-01T19:50:34Z)
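A minimal sketch of a feedback-weighted loss of this kind, assuming deployment logs record the old system's probability for each served answer together with binary user feedback; the exact weighting and normalization in the paper differ.

```python
# Sketch of feedback-weighted learning with importance sampling (illustrative):
# each deployed prediction y was sampled with probability p_old(y|x) under the
# initial system and received binary user feedback r, so the supervised loss
# on (x, y) is reweighted by r / p_old.
import math

def feedback_weighted_loss(examples, new_logprob):
    """examples: list of (x, y, r, p_old) tuples from deployment logs.
    new_logprob: function giving log p_new(y|x) under the current model."""
    total, norm = 0.0, 0.0
    for x, y, r, p_old in examples:
        w = r / p_old                      # importance weight from binary feedback
        total += w * (-new_logprob(x, y))  # weighted negative log-likelihood
        norm += w
    return total / max(norm, 1e-8)

# Hypothetical deployment logs and a stand-in model.
logs = [("q1", "a1", 1, 0.8), ("q2", "a2", 0, 0.6), ("q3", "a3", 1, 0.2)]
toy_logprob = lambda x, y: math.log(0.5)
print(feedback_weighted_loss(logs, toy_logprob))
```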