Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk
- URL: http://arxiv.org/abs/2401.05033v1
- Date: Wed, 10 Jan 2024 09:49:10 GMT
- Title: Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk
- Authors: Dennis Ulmer, Elman Mansimov, Kaixiang Lin, Justin Sun, Xibin Gao, Yi Zhang
- Abstract summary: Large language models (LLMs) are powerful dialogue agents, but specializing them towards fulfilling a specific function can be challenging.
We propose a more effective method for data collection through LLMs engaging in a conversation in various roles.
This approach generates training data via "self-talk" of LLMs that can be refined and utilized for supervised fine-tuning.
- Score: 11.706292228586332
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are powerful dialogue agents, but specializing
them towards fulfilling a specific function can be challenging. Instruction
tuning, i.e. tuning models on instructions and sample responses generated by
humans (Ouyang et al., 2022), has proven to be an effective method to do so, yet
requires a number of data samples that a) might not be available or b) are
costly to generate. Furthermore, this cost increases when the goal is to make
the LLM follow a specific workflow within a dialogue instead of single
instructions. Inspired by the self-play technique in reinforcement learning and
the use of LLMs to simulate human agents, we propose a more effective method
for data collection through LLMs engaging in a conversation in various roles.
This approach generates training data via "self-talk" of LLMs that can be
refined and utilized for supervised fine-tuning. We introduce an automated way
to measure the (partial) success of a dialogue. This metric is used to filter
the generated conversational data that is fed back into the LLM for training. Based on
our automated and human evaluations of conversation quality, we demonstrate
that such self-talk data improves results. In addition, we examine the various
characteristics that showcase the quality of generated dialogues and how they
can be connected to their potential utility as training data.
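To make the pipeline concrete, here is a minimal, self-contained sketch of the self-talk loop as the abstract describes it: two LLM roles converse, an automated metric scores (partial) workflow success, and only passing dialogues are kept as fine-tuning data. The workflow, prompts, and threshold below are invented for illustration, and `generate` stands in for any LLM completion call; this is not the paper's exact implementation.

```python
from typing import Callable

# Illustrative workflow steps the "agent" role is supposed to cover (assumption).
WORKFLOW = ["greet", "booking id", "refund confirmed"]

def self_talk(generate: Callable[[str], str], n_turns: int = 6) -> list[tuple[str, str]]:
    """Two LLM roles (agent/client) take alternating turns; returns the transcript."""
    history: list[tuple[str, str]] = []
    for turn in range(n_turns):
        role = "agent" if turn % 2 == 0 else "client"
        transcript = "\n".join(f"{r}: {u}" for r, u in history)
        utterance = generate(
            f"You are the {role} in a customer-service chat. "
            f"The agent must cover these steps: {WORKFLOW}.\n{transcript}\n{role}:"
        )
        history.append((role, utterance))
    return history

def success(history: list[tuple[str, str]]) -> float:
    """Automated (partial) success: fraction of workflow steps the agent mentioned."""
    agent_text = " ".join(u.lower() for r, u in history if r == "agent")
    return sum(step in agent_text for step in WORKFLOW) / len(WORKFLOW)

def collect(generate: Callable[[str], str], n: int = 100, threshold: float = 2 / 3):
    """Keep only sufficiently successful dialogues as supervised fine-tuning data."""
    dialogues = (self_talk(generate) for _ in range(n))
    return [d for d in dialogues if success(d) >= threshold]
```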
Related papers
- Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion.
We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations.
Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z)
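As a hedged illustration of the hindsight-regeneration idea above: rewrite an agent turn with an LLM, relabel the counterfactual trajectory with a reward, and add it to the offline RL dataset. The prompt, the even-turn convention, and both callables are assumptions, not the authors' implementation.

```python
from typing import Callable

def augment_with_hindsight(dialogue: list[str],
                           regenerate: Callable[[str], str],
                           reward: Callable[[list[str]], float]) -> list[tuple[list[str], float]]:
    """Return (trajectory, reward) pairs including hindsight-rewritten variants."""
    data = [(dialogue, reward(dialogue))]
    for i in range(0, len(dialogue), 2):  # assume the agent speaks on even turns
        prompt = ("Rewrite this agent utterance so it better elicits information "
                  "from the user or changes their mind:\n" + dialogue[i])
        counterfactual = dialogue[:i] + [regenerate(prompt)] + dialogue[i + 1:]
        data.append((counterfactual, reward(counterfactual)))  # relabeled trajectory
    return data  # hand this to any offline RL learner
```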
- Automated test generation to evaluate tool-augmented LLMs as conversational AI agents [0.27309692684728615]
We present a test generation pipeline to evaluate conversational AI agents.
Our framework uses LLMs to generate diverse tests grounded on user-defined procedures.
Our results show that while tool-augmented LLMs perform well in single interactions, they often struggle to handle complete conversations.
arXiv Detail & Related papers (2024-09-24T09:57:43Z)
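A rough sketch of such a pipeline, under the assumption that each test is a scripted user conversation derived from a user-defined procedure; the prompts and the pass/fail check are placeholders, not the paper's framework.

```python
from typing import Callable

def generate_tests(procedure: str, llm: Callable[[str], str], n: int = 5) -> list[list[str]]:
    """Ask an LLM for n user scripts (one message per line) exercising the procedure."""
    scripts = [llm(f"Write one user message per line (variant {i}) that walks an "
                   f"agent through this procedure:\n{procedure}") for i in range(n)]
    return [s.splitlines() for s in scripts]

def run_test(agent: Callable[[str], str], user_script: list[str]) -> bool:
    """Drive the agent through a complete conversation, not a single interaction."""
    reply = ""
    for message in user_script:
        reply = agent(message)
    return "completed" in reply.lower()  # placeholder success criterion
```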
- Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue [25.89926022671521]
We generate a large-scale dataset of 100,000 paired LLM-LLM and human-LLM dialogues from the WildChat dataset.
We find relatively low alignment between simulations and human interactions, demonstrating a systematic divergence along multiple textual properties.
arXiv Detail & Related papers (2024-09-12T18:00:18Z)
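One way to quantify such divergence, sketched under strong simplifying assumptions: compute a few textual properties per dialogue and average the absolute gap across paired LLM-LLM and human-LLM dialogues. The two properties below are illustrative; the paper examines a broader set.

```python
import statistics

def properties(utterances: list[str]) -> dict[str, float]:
    """Two toy textual properties of a dialogue (illustrative only)."""
    lengths = [len(u.split()) for u in utterances]
    return {
        "mean_length": statistics.mean(lengths),
        "question_rate": sum(u.strip().endswith("?") for u in utterances) / len(utterances),
    }

def divergence(simulated: list[list[str]], human: list[list[str]]) -> dict[str, float]:
    """Mean absolute per-property gap across paired dialogues."""
    gaps: dict[str, list[float]] = {"mean_length": [], "question_rate": []}
    for sim, hum in zip(simulated, human):
        p_sim, p_hum = properties(sim), properties(hum)
        for key in gaps:
            gaps[key].append(abs(p_sim[key] - p_hum[key]))
    return {key: statistics.mean(values) for key, values in gaps.items()}
```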
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
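A hedged sketch of self-synthetic finetuning in this spirit: the student LLM writes its own task-specific input-output pairs, a crude filter prunes them, and the survivors go to any SFT trainer. Prompts and the filter are stand-ins, not SELF-GUIDE's actual mechanism.

```python
from typing import Callable

def synthesize_pairs(task: str, student: Callable[[str], str], n: int = 50) -> list[dict]:
    """Have the student model produce its own (input, output) training pairs."""
    pairs = []
    for i in range(n):
        x = student(f"Write one new, unlabeled input (#{i}) for the task: {task}")
        y = student(f"Task: {task}\nInput: {x}\nOutput:")
        if x and y and x != y:  # minimal noise filter (a stand-in for real curation)
            pairs.append({"input": x, "output": y})
    return pairs  # fine-tune the same student model on these pairs
```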
- Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs [12.615896145500393]
Self-prompt tuned LLMs can automatically generate expert role prompts for any given question.
We extensively evaluate self-prompt tuned LLMs on widely used NLP benchmarks and open-ended question tests.
arXiv Detail & Related papers (2024-07-12T05:26:24Z)
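The underlying role-play pattern can be shown in two chained calls; note that self-prompt tuning finetunes the model to produce the expert prompt itself, whereas this sketch fakes that behavior with a second request.

```python
from typing import Callable

def self_prompted_answer(question: str, llm: Callable[[str], str]) -> str:
    """First generate an expert persona for the question, then answer as that expert."""
    role = llm(f"In one sentence, describe the ideal expert to answer:\n{question}")
    return llm(f"{role}\nStaying in this expert role, answer:\n{question}")
```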
- Show, Don't Tell: Aligning Language Models with Demonstrated Feedback [54.10302745921713]
Demonstration ITerated Task Optimization (DITTO) directly aligns language model outputs to a user's demonstrated behaviors.
We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts.
arXiv Detail & Related papers (2024-06-02T23:13:56Z)
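A minimal sketch of the comparison data such demonstration-based alignment builds on: the user's demonstrations are treated as preferred over the current model's own samples, and the pairs feed a preference-optimization trainer (e.g. DPO), iterating as the model updates. Field names and the single-sample setup are assumptions.

```python
from typing import Callable

def demo_preference_pairs(prompts: list[str], demos: list[str],
                          model: Callable[[str], str]) -> list[dict]:
    """Build (chosen=demonstration, rejected=model sample) comparison pairs."""
    pairs = []
    for prompt, demo in zip(prompts, demos):
        sample = model(prompt)  # the current policy's own attempt
        pairs.append({"prompt": prompt, "chosen": demo, "rejected": sample})
    return pairs  # train with a preference objective, resample, repeat
```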
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN).
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z)
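One SPIN iteration can be sketched similarly, with the twist that the opponent is the model itself: the human-written SFT response is preferred over the previous iteration's generation. The data layout and trainer are placeholders, not the paper's exact objective.

```python
from typing import Callable

def spin_pairs(sft_data: list[dict], model: Callable[[str], str]) -> list[dict]:
    """Each example is {"prompt": ..., "response": ...} with a human-written response."""
    pairs = []
    for example in sft_data:
        synthetic = model(example["prompt"])  # opponent = current model's own output
        pairs.append({"prompt": example["prompt"],
                      "chosen": example["response"],
                      "rejected": synthetic})
    return pairs  # optimize with a DPO-style objective, then repeat with the new model
```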
- Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions [19.365615476223635]
Conversational question-answering systems aim to create interactive search systems that retrieve information by interacting with users.
Existing work uses human annotators to play the roles of the questioner (student) and the answerer (teacher).
We propose a simulation framework that employs zero-shot learner LLMs for simulating teacher-student interactions.
arXiv Detail & Related papers (2023-12-05T17:38:02Z)
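The zero-shot simulation can be pictured as two prompted roles, where only the teacher sees the source passage; the prompts below are illustrative, not the paper's templates.

```python
from typing import Callable

def simulate_qa(passage: str, topic: str, llm: Callable[[str], str],
                n_rounds: int = 3) -> list[tuple[str, str]]:
    """Student asks about a topic it cannot see; teacher answers from the passage."""
    qa: list[tuple[str, str]] = []
    for _ in range(n_rounds):
        history = " ".join(f"Q: {q} A: {a}" for q, a in qa)
        question = llm(f"You are a curious student asking about '{topic}'. "
                       f"{history}\nAsk your next question:")
        answer = llm(f"You are the teacher. Ground your answer in this passage:\n"
                     f"{passage}\nQuestion: {question}\nAnswer:")
        qa.append((question, answer))
    return qa
```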
- Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z)
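The "imagined conversations" recipe can be caricatured in a few lines: prompt an LLM for complete goal-directed dialogues, attach a crude outcome label, and pass the batch to an offline RL algorithm. The success check here is a toy stand-in for the paper's setup.

```python
from typing import Callable

def outcome_reward(dialogue: str) -> float:
    """Toy outcome label: did the conversation reach an agreement? (assumption)"""
    return 1.0 if "agreed" in dialogue.lower() else 0.0

def imagined_dataset(goal: str, llm: Callable[[str], str],
                     n: int = 100) -> list[tuple[str, float]]:
    """Generate labeled rollouts for offline RL without talking to real humans."""
    rollouts = [llm(f"Write a complete human-agent dialogue (sample {i}) in which "
                    f"the agent tries to: {goal}") for i in range(n)]
    return [(r, outcome_reward(r)) for r in rollouts]
```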
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
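An LLM-Augmenter-style control loop, sketched with placeholder modules: retrieve evidence, draft an answer with the black-box LLM, verify the draft against the evidence, and feed automated feedback back into the prompt until the check passes.

```python
from typing import Callable

def grounded_answer(question: str,
                    llm: Callable[[str], str],
                    retrieve: Callable[[str], str],
                    verify: Callable[[str, str], tuple[bool, str]],
                    max_tries: int = 3) -> str:
    """Iteratively revise a draft using retrieved evidence and automated feedback."""
    evidence = retrieve(question)
    feedback, draft = "", ""
    for _ in range(max_tries):
        draft = llm(f"Evidence: {evidence}\nQuestion: {question}\n{feedback}Answer:")
        ok, critique = verify(draft, evidence)  # plug-and-play fact check
        if ok:
            return draft
        feedback = f"Feedback on your previous answer: {critique}\n"
    return draft  # best effort after max_tries rounds
```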