Recipes for building an open-domain chatbot
- URL: http://arxiv.org/abs/2004.13637v2
- Date: Thu, 30 Apr 2020 15:36:52 GMT
- Title: Recipes for building an open-domain chatbot
- Authors: Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson,
Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau,
and Jason Weston
- Abstract summary: Good conversation requires providing engaging talking points, listening to one's partner, and displaying knowledge, empathy and personality appropriately.
We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy.
We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models and code publicly available.
- Score: 44.75975649076827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Building open-domain chatbots is a challenging area for machine learning
research. While prior work has shown that scaling neural models in the number
of parameters and the size of the data they are trained on gives improved
results, we show that other ingredients are important for a high-performing
chatbot. Good conversation requires a number of skills that an expert
conversationalist blends in a seamless way: providing engaging talking points
and listening to their partners, and displaying knowledge, empathy and
personality appropriately, while maintaining a consistent persona. We show that
large scale models can learn these skills when given appropriate training data
and choice of generation strategy. We build variants of these recipes with 90M,
2.7B and 9.4B parameter models, and make our models and code publicly
available. Human evaluations show our best models are superior to existing
approaches in multi-turn dialogue in terms of engagingness and humanness
measurements. We then discuss the limitations of this work by analyzing failure
cases of our models.
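For readers who want to try the released models, here is a minimal sketch of querying one of them through the Hugging Face transformers library. The checkpoint name facebook/blenderbot-3B (the 2.7B-parameter variant) and the decoding settings are assumptions for illustration, not details taken from the abstract:

```python
# Minimal sketch: chatting with a released BlenderBot checkpoint via
# Hugging Face transformers. The checkpoint name below is an assumption
# about the published weights (the 2.7B-parameter model from the paper).
from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration

name = "facebook/blenderbot-3B"
tokenizer = BlenderbotTokenizer.from_pretrained(name)
model = BlenderbotForConditionalGeneration.from_pretrained(name)

inputs = tokenizer("Hello, how are you today?", return_tensors="pt")
# The paper stresses that the generation strategy matters; beam search
# with a minimum response length is one of the choices it examines.
reply_ids = model.generate(**inputs, num_beams=10, min_length=20, max_length=60)
print(tokenizer.decode(reply_ids[0], skip_special_tokens=True))
```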
Related papers
- CoDi: Conversational Distillation for Grounded Question Answering [10.265241619616676]
We introduce a novel data distillation framework named CoDi.
CoDi allows us to synthesize large-scale, assistant-style datasets in a steerable and diverse manner.
We show that SLMs trained with CoDi-synthesized data achieve performance comparable to models trained on human-annotated data in standard metrics.
arXiv Detail & Related papers (2024-08-20T22:35:47Z)
- Enhancing Chat Language Models by Scaling High-quality Instructional Conversations [91.98516412612739]
We first provide a systematically designed, diverse, informative, large-scale dataset of instructional conversations, UltraChat.
Our objective is to capture the breadth of interactions that a human might have with an AI assistant.
We fine-tune a LLaMA model to create a powerful conversational model, UltraLLaMA.
arXiv Detail & Related papers (2023-05-23T16:49:14Z)
- Estimating the Personality of White-Box Language Models [0.589889361990138]
Large-scale language models, which are trained on large corpora of text, are being used in a wide range of applications everywhere.
Existing research shows that these models can and do capture human biases.
Many of these biases, especially those that could potentially cause harm, are being well-investigated.
However, studies that infer and change human personality traits inherited by these models have been scarce or non-existent.
arXiv Detail & Related papers (2022-04-25T23:53:53Z)
- A Model-Agnostic Data Manipulation Method for Persona-based Dialogue Generation [107.82729587882397]
It is expensive to scale up current persona-based dialogue datasets.
Each data sample in this task is more complex to learn with than conventional dialogue data.
We propose a data manipulation method, which is model-agnostic to be packed with any persona-based dialogue generation model.
arXiv Detail & Related papers (2022-04-21T03:49:54Z)
- ValueNet: A New Dataset for Human Value Driven Dialogue System [103.2044265617704]
We present a new large-scale human value dataset called ValueNet, which contains human attitudes on 21,374 text scenarios.
Comprehensive empirical results show that the learned value model could benefit a wide range of dialogue tasks.
ValueNet is the first large-scale text dataset for human value modeling.
arXiv Detail & Related papers (2021-12-12T23:02:52Z)
- Few-Shot Bot: Prompt-Based Learning for Dialogue Systems [58.27337673451943]
Learning to converse using only a few examples is a great challenge in conversational AI.
The current best conversational models are either good chit-chatters (e.g., BlenderBot) or goal-oriented systems (e.g., MinTL).
We propose prompt-based few-shot learning which does not require gradient-based fine-tuning but instead uses a few examples as the only source of learning.
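As a concrete illustration of this setting, the sketch below builds a few-shot dialogue prompt for a frozen language model; the example dialogues and formatting are invented for illustration and are not from the paper:

```python
# Hypothetical sketch of prompt-based few-shot dialogue: a handful of
# example exchanges are concatenated into a prompt, and a frozen language
# model completes the next turn with no gradient-based fine-tuning.
shots = [
    ("Hi! Do you like hiking?", "I love it, especially in autumn."),
    ("What's your favorite food?", "Fresh pasta, hands down."),
]
user_turn = "Have you seen any good movies lately?"

prompt = ""
for question, answer in shots:
    prompt += f"User: {question}\nBot: {answer}\n"
prompt += f"User: {user_turn}\nBot:"

# Feeding `prompt` to any autoregressive LM and sampling a continuation
# yields the bot's reply; the few shots are the only source of learning.
print(prompt)
```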
arXiv Detail & Related papers (2021-10-15T14:36:45Z)
- Low-Resource Adaptation of Open-Domain Generative Chatbots [0.0]
We show that low-parameter models can retain their general-knowledge conversational abilities while improving in a specific domain.
We propose a generic framework that accounts for variety in question types, tracks reference throughout multi-turn conversations, and removes inconsistent and potentially toxic responses.
Our framework seamlessly transitions between chatting and performing transactional tasks, which will ultimately make interactions with digital assistants more human-like.
arXiv Detail & Related papers (2021-08-13T17:40:30Z)
- Multi-Modal Open-Domain Dialogue [28.69395893943413]
Recent work in open-domain conversational agents has demonstrated that significant improvements in model engagingness and humanness metrics can be achieved via massive scaling.
We investigate combining components from state-of-the-art open-domain dialogue agents with those from state-of-the-art vision models.
We show that our best resulting model outperforms strong existing models in multi-modal dialogue while simultaneously performing as well as its predecessor.
arXiv Detail & Related papers (2020-10-02T16:20:39Z)
- Low-Resource Knowledge-Grounded Dialogue Generation [74.09352261943913]
We consider knowledge-grounded dialogue generation under a natural assumption that only limited training examples are available.
We devise a disentangled response decoder in order to isolate parameters that depend on knowledge-grounded dialogues from the entire generation model.
With only 1/8 of the training data, our model achieves state-of-the-art performance and generalizes well on out-of-domain knowledge.
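The disentangling idea can be sketched in a few lines; the module below is a hypothetical simplification (names and shapes invented) that only shows how knowledge-dependent parameters might be kept separable from the general language-model parameters so they can be trained in isolation:

```python
# Hypothetical sketch of a disentangled decoder step: knowledge-dependent
# parameters live in their own module so they can be trained while the
# general language-model parameters stay frozen. Not the paper's actual
# architecture.
import torch
import torch.nn as nn

class DisentangledDecoderStep(nn.Module):
    def __init__(self, d_model=512, vocab_size=32000):
        super().__init__()
        self.language_model = nn.Linear(d_model, d_model)   # generic fluency
        self.knowledge_proc = nn.Linear(d_model, d_model)   # grounding-specific
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, hidden, knowledge):
        mixed = self.language_model(hidden) + self.knowledge_proc(knowledge)
        return self.out(torch.tanh(mixed))

model = DisentangledDecoderStep()
# Freeze the generic parameters and train only the knowledge-dependent ones,
# which is what makes low-resource adaptation plausible.
for p in model.language_model.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```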
arXiv Detail & Related papers (2020-02-24T16:20:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.