DuetSim: Building User Simulator with Dual Large Language Models for Task-Oriented Dialogues
- URL: http://arxiv.org/abs/2405.13028v1
- Date: Thu, 16 May 2024 06:24:31 GMT
- Title: DuetSim: Building User Simulator with Dual Large Language Models for Task-Oriented Dialogues
- Authors: Xiang Luo, Zhiwen Tang, Jin Wang, Xuejie Zhang,
- Abstract summary: This paper introduces DuetSim, a novel framework designed to address the intricate demands of task-oriented dialogues by leveraging large language models.
DuetSim stands apart from conventional approaches by employing two LLMs in tandem: one dedicated to response generation and the other focused on verification.
We validate the efficacy of our method through extensive experiments conducted on the MultiWOZ dataset, highlighting improvements in response quality and correctness.
- Score: 7.765092134290888
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: User Simulators play a pivotal role in training and evaluating task-oriented dialogue systems. Traditional user simulators typically rely on human-engineered agendas, resulting in generated responses that often lack diversity and spontaneity. Although large language models (LLMs) exhibit a remarkable capacity for generating coherent and contextually appropriate utterances, they may fall short when tasked with generating responses that effectively guide users towards their goals, particularly in dialogues with intricate constraints and requirements. This paper introduces DuetSim, a novel framework designed to address the intricate demands of task-oriented dialogues by leveraging LLMs. DuetSim stands apart from conventional approaches by employing two LLMs in tandem: one dedicated to response generation and the other focused on verification. This dual LLM approach empowers DuetSim to produce responses that not only exhibit diversity but also demonstrate accuracy and are preferred by human users. We validate the efficacy of our method through extensive experiments conducted on the MultiWOZ dataset, highlighting improvements in response quality and correctness, largely attributed to the incorporation of the second LLM. Our code is accessible at: https://github.com/suntea233/DuetSim.
Related papers
- Multi-trait User Simulation with Adaptive Decoding for Conversational Task Assistants [11.092533523906834]
This paper introduces Multi-Trait Adaptive Decoding (mTAD), a method that generates diverse user profiles at decoding-time.
By analyzing real-world dialogues from the Conversational Task Assistant domain, we identify key conversational traits.
We develop a framework to generate profile-aware dialogues that enhance conversational diversity.
arXiv Detail & Related papers (2024-10-16T09:40:34Z) - SimulBench: Evaluating Language Models with Creative Simulation Tasks [20.233111652638637]
We introduce SimulBench, a benchmark designed to evaluate large language models (LLMs) across a diverse collection of creative simulation scenarios.
A major challenge is to develop an evaluation framework for testing different LLMs fairly while preserving the multi-round interactive nature of simulation tasks between users and AI.
arXiv Detail & Related papers (2024-09-11T21:53:20Z) - LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds.
Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without multiple simulation engines.
We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states w.r.t. history information.
arXiv Detail & Related papers (2024-06-24T03:36:29Z) - How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation [14.646529557978512]
We analyze the limitations of using Large Language Models in constructing user simulators for Conversational Recommender System.
Data leakage, which occurs in conversational history and the user simulator's replies, results in inflated evaluation results.
We propose SimpleUserSim, employing a straightforward strategy to guide the topic toward the target items.
arXiv Detail & Related papers (2024-03-25T04:21:06Z) - Plan-Grounded Large Language Models for Dual Goal Conversational
Settings [7.694972908311347]
Training Large Language Models to follow user instructions has been shown to supply the LLM with ample capacity to converse fluently while being aligned with humans.
Yet, it is not completely clear how an LLM can lead a plan-grounded conversation in mixed-initiative settings.
We propose a novel LLM that grounds the dialogue on a procedural plan, can take the dialogue initiative, and enforces guardrails on the system's behavior.
arXiv Detail & Related papers (2024-02-01T22:56:39Z) - LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language
Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs)
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z) - Unlocking the Potential of User Feedback: Leveraging Large Language
Model as User Simulator to Enhance Dialogue System [65.93577256431125]
We propose an alternative approach called User-Guided Response Optimization (UGRO) to combine it with a smaller task-oriented dialogue model.
This approach uses LLM as annotation-free user simulator to assess dialogue responses, combining them with smaller fine-tuned end-to-end TOD models.
Our approach outperforms previous state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-06-16T13:04:56Z) - Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue
Questions with LLMs [59.74002011562726]
We propose a novel linguistic cue-based chain-of-thoughts (textitCue-CoT) to provide a more personalized and engaging response.
We build a benchmark with in-depth dialogue questions, consisting of 6 datasets in both Chinese and English.
Empirical results demonstrate our proposed textitCue-CoT method outperforms standard prompting methods in terms of both textithelpfulness and textitacceptability on all datasets.
arXiv Detail & Related papers (2023-05-19T16:27:43Z) - Check Your Facts and Try Again: Improving Large Language Models with
External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z) - Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward
Decomposition [64.06167416127386]
We propose Multi-Agent Dialog Policy Learning, which regards both the system and the user as the dialog agents.
Two agents interact with each other and are jointly learned simultaneously.
Results show that our method can successfully build a system policy and a user policy simultaneously.
arXiv Detail & Related papers (2020-04-08T04:51:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.