Learning Improvised Chatbots from Adversarial Modifications of Natural
Language Feedback
- URL: http://arxiv.org/abs/2010.07261v2
- Date: Thu, 15 Oct 2020 02:19:13 GMT
- Title: Learning Improvised Chatbots from Adversarial Modifications of Natural
Language Feedback
- Authors: Makesh Narsimhan Sreedhar, Kun Ni, Siva Reddy
- Abstract summary: We propose a generative adversarial model that converts noisy feedback into a plausible natural response in a conversation.
The generator's goal is to convert the feedback into a response that answers the user's previous utterance and to fool the discriminator.
- Score: 19.026954124876582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ubiquitous nature of chatbots and their interaction with users generate
an enormous amount of data. Can we improve chatbots using this data? A
self-feeding chatbot improves itself by asking for natural language feedback when a
user is dissatisfied with its response and uses this feedback as an additional
training sample. However, user feedback in most cases contains extraneous
sequences that hinder its usefulness as a training sample. In this work, we
propose a generative adversarial model that converts noisy feedback into a
plausible natural response in a conversation. The generator's goal is to
convert the feedback into a response that answers the user's previous utterance
and to fool the discriminator which distinguishes feedback from natural
responses. We show that augmenting original training data with these modified
feedback responses improves the original chatbot performance from 69.94% to
75.96% in ranking correct responses on the Personachat dataset, a large
improvement given that the original model is already trained on 131k samples.
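Below is a minimal, self-contained sketch of the adversarial setup described in the abstract. It is an illustrative assumption, not the authors' released implementation: utterances are abstracted into fixed-size embeddings (in the actual model the generator and discriminator operate over text), and the module names FeedbackGenerator and ResponseDiscriminator, the embedding size, and the exact loss formulation are hypothetical choices made only to keep the example runnable.

```python
# Toy sketch (illustrative, not the paper's code): the generator rewrites a noisy
# feedback utterance, conditioned on the dialogue context, into something the
# discriminator cannot tell apart from a natural response.
import torch
import torch.nn as nn

EMB = 128  # assumed embedding size for this toy example


class FeedbackGenerator(nn.Module):
    """Maps (context, feedback) embeddings to a plausible response embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * EMB, 256), nn.ReLU(), nn.Linear(256, EMB))

    def forward(self, context, feedback):
        return self.net(torch.cat([context, feedback], dim=-1))


class ResponseDiscriminator(nn.Module):
    """Scores how much a response embedding looks like a natural response (vs. converted feedback)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * EMB, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, context, response):
        return self.net(torch.cat([context, response], dim=-1))


gen, disc = FeedbackGenerator(), ResponseDiscriminator()
g_opt = torch.optim.Adam(gen.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):
    # Placeholder batch: random tensors stand in for encoded context, noisy
    # feedback, and natural responses.
    context = torch.randn(32, EMB)
    feedback = torch.randn(32, EMB)
    natural = torch.randn(32, EMB)

    # Discriminator step: natural responses -> label 1, converted feedback -> label 0.
    fake = gen(context, feedback).detach()
    d_loss = bce(disc(context, natural), torch.ones(32, 1)) + \
             bce(disc(context, fake), torch.zeros(32, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: convert feedback so the discriminator judges it natural.
    g_loss = bce(disc(context, gen(context, feedback)), torch.ones(32, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```

Once converted, the generated responses are added to the original training data as extra (context, response) pairs, which is how the abstract reports the ranking improvement on Personachat.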
Related papers
- Learning from Naturally Occurring Feedback [25.266461597402056]
We propose a scalable method for extracting feedback that users naturally include when interacting with chat models.
We manually annotated conversation data to confirm the presence of naturally occurring feedback.
We apply our method to over 1M conversations to obtain hundreds of thousands of feedback samples.
arXiv Detail & Related papers (2024-07-15T17:41:34Z)
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- Can Language Models Learn to Listen? [96.01685069483025]
We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words.
Our approach autoregressively predicts the listener's response as a sequence of facial gestures, quantized using a VQ-VAE.
We show that our generated listener motion is fluent and reflective of language semantics through quantitative metrics and a qualitative user study.
arXiv Detail & Related papers (2023-08-21T17:59:02Z)
- Leveraging Implicit Feedback from Deployment Data in Dialogue [83.02878726357523]
We study improving social conversational agents by learning from natural dialogue between users and a deployed model.
We leverage implicit signals such as user response length, sentiment, and the reactions in subsequent human utterances within the collected dialogue episodes.
arXiv Detail & Related papers (2023-07-26T11:34:53Z)
- Rewarding Chatbots for Real-World Engagement with Millions of Users [1.2583983802175422]
This work investigates the development of social chatbots that prioritize user engagement to enhance retention.
The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model that can be used to reject low-scoring sample responses.
A/B testing on groups of 10,000 new daily chat users on the Chai Research platform shows that this approach increases the mean conversation length (MCL) by up to 70%.
Future work aims to use the reward model to realise a data flywheel, where the latest user conversations can be used to alternately fine-tune the language model and the reward model.
arXiv Detail & Related papers (2023-03-10T18:53:52Z)
- When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels [34.6235464256814]
Juicer is a framework to make use of both binary and free-form textual human feedback.
We find that augmenting training with model-corrected replies improves the final dialogue model.
arXiv Detail & Related papers (2022-10-28T04:57:21Z)
- MCP: Self-supervised Pre-training for Personalized Chatbots with Multi-level Contrastive Sampling [18.40883902610959]
We propose a self-supervised learning framework for capturing better representations from users' dialogue history for personalized chatbots.
Specifically, we apply contrastive sampling methods to leverage the supervised signals hidden in user dialog history.
Experimental results on two real-world datasets show that our proposed model MCP significantly improves over existing methods.
arXiv Detail & Related papers (2022-10-17T05:16:23Z)
- Jewelry Shop Conversational Chatbot [0.0]
We build a conversational agent for a jewelry shop that identifies the underlying intent of a customer's query by measuring the similarity of the input to patterns in the corpus.
Our system features an audio input interface for clients, so they may speak to it in natural language.
To gauge the system's performance, we use metrics such as recall, precision, and F1 score.
arXiv Detail & Related papers (2022-06-09T17:56:51Z)
- Training Language Models with Natural Language Feedback [51.36137482891037]
We learn from language feedback on model outputs using a three-step learning algorithm.
In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements.
Using only 100 samples of human-written feedback, our learning algorithm finetunes a GPT-3 model to roughly human-level summarization.
arXiv Detail & Related papers (2022-04-29T15:06:58Z)
- Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework includes a guiding robot and an interlocutor model that plays the role of a human.
We examine our framework in three experimental setups and evaluate the guiding robot with four different metrics to demonstrate its flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z)
- Dialogue Response Ranking Training with Large-Scale Human Feedback Data [52.12342165926226]
We leverage social media feedback data to build a large-scale training dataset for feedback prediction.
We train DialogRPT, a set of GPT-2-based models, on 133M pairs of human feedback data.
Our ranker outperforms the conventional dialogue perplexity baseline by a large margin in predicting Reddit feedback.
arXiv Detail & Related papers (2020-09-15T10:50:05Z)