Related papers: User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue

User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue

URL: http://arxiv.org/abs/2309.13233v1
Date: Sat, 23 Sep 2023 02:04:57 GMT
Title: User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue
Authors: Sam Davidson, Salvatore Romeo, Raphael Shu, James Gung, Arshit Gupta, Saab Mansour, Yi Zhang
Abstract summary: We propose a novel user simulator built using recently developed large pretrained language models (LLMs) Unlike previous work, which sought to maximize goal success rate (GSR) as the primary metric of simulator performance, our goal is a system which achieves a GSR similar to that observed in human interactions with TOD systems.
Score: 10.336443286833145
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: One of the major impediments to the development of new task-oriented dialogue (TOD) systems is the need for human evaluation at multiple stages and iterations of the development process. In an effort to move toward automated evaluation of TOD, we propose a novel user simulator built using recently developed large pretrained language models (LLMs). In order to increase the linguistic diversity of our system relative to the related previous work, we do not fine-tune the LLMs used by our system on existing TOD datasets; rather we use in-context learning to prompt the LLMs to generate robust and linguistically diverse output with the goal of simulating the behavior of human interlocutors. Unlike previous work, which sought to maximize goal success rate (GSR) as the primary metric of simulator performance, our goal is a system which achieves a GSR similar to that observed in human interactions with TOD systems. Using this approach, our current simulator is effectively able to interact with several TOD systems, especially on single-intent conversational goals, while generating lexically and syntactically diverse output relative to previous simulators that rely upon fine-tuned models. Finally, we collect a Human2Bot dataset of humans interacting with the same TOD systems with which we experimented in order to better quantify these achievements.

Related papers

Large Language Models as User-Agents for Evaluating Task-Oriented-Dialogue Systems [6.8738526619759535]
offline datasets have been used to evaluate task-oriented dialogue (TOD) models. User-agents, which are context-aware, can simulate the variability and unpredictability of human conversations.
arXiv Detail & Related papers (2024-11-15T06:05:45Z)
Simulating Task-Oriented Dialogues with State Transition Graphs and Large Language Models [16.94819621353007]
SynTOD is a new synthetic data generation approach for developing end-to-end Task-Oriented Dialogue (TOD) systems. It generates diverse, structured conversations through random walks and response simulation using large language models. In our experiments, using graph-guided response simulations leads to significant improvements in intent classification, slot filling and response relevance.
arXiv Detail & Related papers (2024-04-23T06:23:34Z)
Reliable LLM-based User Simulator for Task-Oriented Dialogue Systems [2.788542465279969]
This paper introduces DAUS, a Domain-Aware User Simulator. We fine-tune DAUS on real examples of task-oriented dialogues. Results on two relevant benchmarks showcase significant improvements in terms of user goal fulfillment.
arXiv Detail & Related papers (2024-02-20T20:57:47Z)
SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems [53.94772445896213]
Large Language Model (LLM)-based multi-agent systems have demonstrated promising performance in simulating human society. We propose SpeechAgents, a multi-modal LLM based multi-agent system designed for simulating human communication.
arXiv Detail & Related papers (2024-01-08T15:01:08Z)
DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models [76.79929883963275]
DIALIGHT is a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems. It features a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level. Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses.
arXiv Detail & Related papers (2024-01-04T11:27:48Z)
Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks. We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
Enhancing Large Language Model Induced Task-Oriented Dialogue Systems Through Look-Forward Motivated Goals [76.69419538047813]
ProToD approach anticipates the future dialogue actions and incorporates the goal-oriented reward signal to enhance ToD systems. We present a novel evaluation method that assesses ToD systems based on goal-driven dialogue simulations. Empirical experiments conducted on the MultiWoZ 2.1 dataset demonstrate that our model can achieve superior performance using only 10% of the data.
arXiv Detail & Related papers (2023-09-16T10:56:00Z)
Unlocking the Potential of User Feedback: Leveraging Large Language Model as User Simulator to Enhance Dialogue System [65.93577256431125]
We propose an alternative approach called User-Guided Response Optimization (UGRO) to combine it with a smaller task-oriented dialogue model. This approach uses LLM as annotation-free user simulator to assess dialogue responses, combining them with smaller fine-tuned end-to-end TOD models. Our approach outperforms previous state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-06-16T13:04:56Z)
Is MultiWOZ a Solved Task? An Interactive TOD Evaluation Framework with User Simulator [37.590563896382456]
We propose an interactive evaluation framework for Task-Oriented Dialogue (TOD) systems. We first build a goal-oriented user simulator based on pre-trained models and then use the user simulator to interact with the dialogue system to generate dialogues. Experimental results show that RL-based TOD systems trained by our proposed user simulator can achieve nearly 98% inform and success rates.
arXiv Detail & Related papers (2022-10-26T07:41:32Z)
SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching [81.45928589522032]
We parameterize modular task-oriented dialog systems using a Transformer-based auto-regressive language model. We pre-train, on heterogeneous dialog corpora, a task-grounded response generation model. Experiments show that SOLOIST creates new state-of-the-art on well-studied task-oriented dialog benchmarks.
arXiv Detail & Related papers (2020-05-11T17:58:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.