Dialogue Response Ranking Training with Large-Scale Human Feedback Data
- URL: http://arxiv.org/abs/2009.06978v1
- Date: Tue, 15 Sep 2020 10:50:05 GMT
- Title: Dialogue Response Ranking Training with Large-Scale Human Feedback Data
- Authors: Xiang Gao, Yizhe Zhang, Michel Galley, Chris Brockett, Bill Dolan
- Abstract summary: We leverage social media feedback data to build a large-scale training dataset for feedback prediction.
We trained DialogRPT, a set of GPT-2-based models, on 133M pairs of human feedback data.
Our ranker outperforms the conventional dialog perplexity baseline by a large margin on predicting Reddit feedback.
- Score: 52.12342165926226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing open-domain dialog models are generally trained to minimize the
perplexity of target human responses. However, some human replies are more
engaging than others, spawning more followup interactions. Current
conversational models are increasingly capable of producing turns that are
context-relevant, but in order to produce compelling agents, these models need
to be able to predict and optimize for turns that are genuinely engaging. We
leverage social media feedback data (number of replies and upvotes) to build a
large-scale training dataset for feedback prediction. To alleviate possible
distortion between the feedback and engagingness, we convert the ranking
problem to a comparison of response pairs, which involves fewer confounding
factors. We trained DialogRPT, a set of GPT-2-based models, on 133M pairs of
human feedback data, and the resulting ranker outperformed several baselines.
In particular, our ranker outperforms the conventional dialog perplexity
baseline by a large margin on predicting Reddit feedback. We finally combine
the feedback prediction models and a human-like scoring model to rank the
machine-generated dialog responses. Crowd-sourced human evaluation shows that
our ranking method correlates better with real human preferences than baseline
models.
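The core idea in the abstract, turning raw feedback counts into response pairs with fewer confounding factors and training a ranker on them, can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the actual DialogRPT rankers are GPT-2-based models, and the helper names and the `min_gap` threshold here are assumptions made for the sketch.

```python
import math
from itertools import combinations

def build_pairs(responses, min_gap=1):
    """Form (positive, negative) training pairs from responses to the
    same context, ordered by feedback count (e.g. number of replies or
    upvotes). Comparing responses to the *same* context is what removes
    many confounders; pairs whose feedback gap is below `min_gap` are
    skipped, since near-ties carry little ranking signal."""
    pairs = []
    for (text_a, fb_a), (text_b, fb_b) in combinations(responses, 2):
        if abs(fb_a - fb_b) < min_gap:
            continue
        pos, neg = (text_a, text_b) if fb_a > fb_b else (text_b, text_a)
        pairs.append((pos, neg))
    return pairs

def pairwise_loss(score_pos, score_neg):
    """Logistic (Bradley-Terry-style) pairwise loss:
    -log sigmoid(s_pos - s_neg). Minimizing it pushes the ranker to
    score the more-engaging response above the less-engaging one."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_pos - score_neg))))
```

In a real setup the scalar scores would come from a learned model over (context, response) pairs; here the two functions only show how the ranking problem reduces to binary comparisons.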
Related papers
- Learning from Naturally Occurring Feedback [25.266461597402056]
We propose a scalable method for extracting feedback that users naturally include when interacting with chat models.
We manually annotated conversation data to confirm the presence of naturally occurring feedback.
We apply our method to over 1M conversations to obtain hundreds of thousands of feedback samples.
arXiv Detail & Related papers (2024-07-15T17:41:34Z)
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- Leveraging Implicit Feedback from Deployment Data in Dialogue [83.02878726357523]
We study improving social conversational agents by learning from natural dialogue between users and a deployed model.
We leverage signals such as user response length, sentiment, and the reactions of future human utterances in the collected dialogue episodes.
arXiv Detail & Related papers (2023-07-26T11:34:53Z)
- Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation [68.9440575276396]
This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation.
First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization.
Second, we discuss how feedback can be described by its format and objective, and cover the two proposed approaches to using it (for either training or decoding): using the feedback directly or training feedback models.
Third, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for human intervention.
arXiv Detail & Related papers (2023-05-01T17:36:06Z)
- Improving Open-Domain Dialogue Evaluation with a Causal Inference Model [8.625569782672663]
Explicit satisfaction ratings can be elicited from users, but users often do not provide ratings when asked.
Post-hoc ratings by experts are an alternative, but these can be both expensive and complex to collect.
Here, we explore the creation of automated methods for predicting both expert and user ratings of open-domain dialogues.
arXiv Detail & Related papers (2023-01-31T02:31:42Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- Multi-Referenced Training for Dialogue Response Generation [36.24321477524634]
We show that the gap between the real-world probability distribution and the single-referenced data's probability distribution prevents the model from learning one-to-many relations efficiently.
We generate diverse pseudo references from a powerful pretrained model to build multi-referenced data that provides a better approximation of the real-world distribution.
arXiv Detail & Related papers (2020-09-15T14:17:53Z)
- Learning an Unreferenced Metric for Online Dialogue Evaluation [53.38078951628143]
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances.
We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
arXiv Detail & Related papers (2020-05-01T20:01:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.