Leveraging Implicit Feedback from Deployment Data in Dialogue
- URL: http://arxiv.org/abs/2307.14117v2
- Date: Thu, 1 Feb 2024 04:30:38 GMT
- Title: Leveraging Implicit Feedback from Deployment Data in Dialogue
- Authors: Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason
Weston
- Abstract summary: We study improving social conversational agents by learning from natural dialogue between users and a deployed model.
We leverage signals like user response length, sentiment and reaction of the future human utterances in the collected dialogue episodes.
- Score: 83.02878726357523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study improving social conversational agents by learning from natural
dialogue between users and a deployed model, without extra annotations. To
implicitly measure the quality of a machine-generated utterance, we leverage
signals like user response length, sentiment and reaction of the future human
utterances in the collected dialogue episodes. Our experiments use the publicly
released deployment data from BlenderBot (Xu et al., 2023). Human evaluation
indicates improvements in our new models over baseline responses; however, we
find that some proxy signals can lead to more generations with undesirable
properties as well. For example, optimizing for conversation length can lead to
more controversial or unfriendly generations compared to the baseline, whereas
optimizing for positive sentiment or reaction can decrease these behaviors.
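As a concrete illustration of the implicit signals described above, the sketch below scores a bot utterance by the length and sentiment of the user's next reply. This is a minimal sketch, not the authors' implementation: the sentiment pipeline, the length normalization, and the weights are illustrative assumptions.

```python
# Minimal sketch (not the paper's released code) of turning the implicit
# signals named in the abstract -- the length and sentiment of the user's
# next reply -- into a single proxy score for the preceding bot utterance.
# The sentiment pipeline, length normalization, and weights are assumptions.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # any off-the-shelf classifier

def implicit_feedback_score(next_user_reply: str,
                            length_weight: float = 0.5,
                            sentiment_weight: float = 0.5) -> float:
    """Score the previous bot utterance from the user's reaction to it."""
    # Longer replies are treated as weak evidence of engagement (capped at 1).
    length_signal = min(len(next_user_reply.split()) / 20.0, 1.0)

    # Positive sentiment in the user's reaction is treated as approval.
    result = sentiment(next_user_reply)[0]
    positive = result["score"] if result["label"] == "POSITIVE" else 1.0 - result["score"]

    return length_weight * length_signal + sentiment_weight * positive

# Example: score a collected (bot_utterance, next_user_reply) pair so that
# high-scoring pairs can be kept as positive training examples.
pair = ("Want to talk about hiking?",
        "Yes! I hiked the Alps last summer and it was amazing.")
print(implicit_feedback_score(pair[1]))
```

Scores of this kind can then be used to filter deployment episodes into positive training examples or to re-rank candidate responses.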
Related papers
- Decoding the Silent Majority: Inducing Belief Augmented Social Graph
with Large Language Model for Response Forecasting [74.68371461260946]
SocialSense is a framework that induces a belief-centered graph on top of an existing social network, along with graph-based propagation to capture social dynamics.
Our method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings.
arXiv Detail & Related papers (2023-10-20T06:17:02Z)
- Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural
Language Generation [68.9440575276396]
This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation.
First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization.
Second, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models.
Third, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for human intervention.
arXiv Detail & Related papers (2023-05-01T17:36:06Z)
- When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad
Responses into Good Labels [34.6235464256814]
Juicer is a framework to make use of both binary and free-form textual human feedback.
We find that augmenting training with model-corrected replies improves the final dialogue model.
arXiv Detail & Related papers (2022-10-28T04:57:21Z)
- What is wrong with you?: Leveraging User Sentiment for Automatic Dialog
Evaluation [73.03318027164605]
We propose to use information that can be automatically extracted from the next user utterance as a proxy to measure the quality of the previous system response.
Our model generalizes across both spoken and written open-domain dialog corpora collected from real and paid users.
arXiv Detail & Related papers (2022-03-25T22:09:52Z)
- Dialogue Response Ranking Training with Large-Scale Human Feedback Data [52.12342165926226]
We leverage social media feedback data to build a large-scale training dataset for feedback prediction.
We train DialogRPT, a set of GPT-2-based models, on 133M pairs of human feedback data.
Our ranker outperforms the conventional dialog perplexity baseline by a large margin on predicting Reddit feedback (a hedged usage sketch is given after this list).
arXiv Detail & Related papers (2020-09-15T10:50:05Z)
- Speaker Sensitive Response Evaluation Model [17.381658875470638]
We propose an automatic evaluation model based on the similarity of the generated response with the conversational context.
We learn the model parameters from an unlabeled conversation corpus.
We show that our model can be applied to movie dialogues without any additional training.
arXiv Detail & Related papers (2020-06-12T08:59:10Z)
- Neural Generation of Dialogue Response Timings [13.611050992168506]
We propose neural models that simulate the distributions of spoken response offsets.
The models are designed to be integrated into the pipeline of an incremental spoken dialogue system.
We show that human listeners consider certain response timings to be more natural based on the dialogue context.
arXiv Detail & Related papers (2020-05-18T23:00:57Z)
- Learning an Unreferenced Metric for Online Dialogue Evaluation [53.38078951628143]
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances.
We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
arXiv Detail & Related papers (2020-05-01T20:01:39Z)
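For the DialogRPT entry above, here is a hedged sketch of ranking candidate replies with a released feedback-prediction model. The Hugging Face Hub identifier (microsoft/DialogRPT-updown) and the <|endoftext|> separator between context and response are assumptions based on the public DialogRPT release, not details taken from this listing.

```python
# Hedged sketch of ranking candidate replies by predicted human feedback,
# assuming a DialogRPT checkpoint is available on the Hugging Face Hub under
# an identifier such as "microsoft/DialogRPT-updown" (assumed, not stated here).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "microsoft/DialogRPT-updown"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

def feedback_score(context: str, response: str) -> float:
    """Predict how much positive feedback `response` would receive after `context`."""
    # Context and candidate response are joined with the assumed separator token.
    ids = tokenizer.encode(context + "<|endoftext|>" + response, return_tensors="pt")
    with torch.no_grad():
        logits = model(ids).logits
    return torch.sigmoid(logits).item()

context = "I love hiking in the mountains."
candidates = ["Me too! Which trail is your favorite?", "OK."]
ranked = sorted(candidates, key=lambda r: feedback_score(context, r), reverse=True)
print(ranked)
```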