Metaphorical User Simulators for Evaluating Task-oriented Dialogue
Systems
- URL: http://arxiv.org/abs/2204.00763v5
- Date: Thu, 2 Nov 2023 18:59:38 GMT
- Title: Metaphorical User Simulators for Evaluating Task-oriented Dialogue
Systems
- Authors: Weiwei Sun and Shuyu Guo and Shuo Zhang and Pengjie Ren and Zhumin
Chen and Maarten de Rijke and Zhaochun Ren
- Abstract summary: Task-oriented dialogue systems (TDSs) are assessed mainly in an offline setting or through human evaluation.
We propose a metaphorical user simulator for end-to-end TDS evaluation, where we define a simulator to be metaphorical if it simulates user's analogical thinking in interactions with systems.
We also propose a tester-based evaluation framework to generate variants, i.e., dialogue systems with different capabilities.
- Score: 80.77917437785773
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Task-oriented dialogue systems (TDSs) are assessed mainly in an offline
setting or through human evaluation. The evaluation is often limited to
single-turn or is very time-intensive. As an alternative, user simulators that
mimic user behavior allow us to consider a broad set of user goals to generate
human-like conversations for simulated evaluation. Employing existing user
simulators to evaluate TDSs is challenging as user simulators are primarily
designed to optimize dialogue policies for TDSs and have limited evaluation
capabilities. Moreover, the evaluation of user simulators is an open challenge.
In this work, we propose a metaphorical user simulator for end-to-end TDS
evaluation, where we define a simulator to be metaphorical if it simulates
user's analogical thinking in interactions with systems. We also propose a
tester-based evaluation framework to generate variants, i.e., dialogue systems
with different capabilities. Our user simulator constructs a metaphorical user
model that assists the simulator in reasoning by referring to prior knowledge
when encountering new items. We estimate the quality of simulators by checking
the simulated interactions between simulators and variants. Our experiments are
conducted using three TDS datasets. The proposed user simulator demonstrates
better consistency with manual evaluation than an agenda-based simulator and a
seq2seq model on three datasets; our tester framework demonstrates efficiency
and has been tested on multiple tasks, such as conversational recommendation
and e-commerce dialogues.
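The abstract's notion of "consistency with manual evaluation" can be illustrated with a minimal sketch: score each system variant once with a simulator and once with human judges, then check whether the two orderings agree. Spearman rank correlation is used here as an assumption; the abstract does not name the metric. The function names, variant names, and all scores below are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, List


def ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (std_x * std_y)


def simulator_consistency(sim_scores: Dict[str, float],
                          human_scores: Dict[str, float]) -> float:
    """Spearman correlation between simulator and human scores per variant."""
    variants = sorted(sim_scores)
    sim = [sim_scores[v] for v in variants]
    human = [human_scores[v] for v in variants]
    return pearson(ranks(sim), ranks(human))


if __name__ == "__main__":
    # Hypothetical variants and made-up scores, for illustration only.
    human = {"full": 4.2, "weak_nlu": 3.1, "weak_policy": 2.5, "weak_nlg": 3.6}
    simulated = {"full": 0.81, "weak_nlu": 0.55, "weak_policy": 0.60, "weak_nlg": 0.70}
    print(round(simulator_consistency(simulated, human), 3))
```

A value near 1 means the simulator orders the variants much as the human judges do; a value near 0 or below suggests the simulated evaluation is not a reliable proxy for manual evaluation.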
Related papers
- Promptable Closed-loop Traffic Simulation [57.36568236100507]
ProSim is a multimodal promptable closed-loop traffic simulation framework.
ProSim rolls out a traffic scenario in a closed-loop manner, modeling each agent's interaction with other traffic participants.
To support research on promptable traffic simulation, we create ProSim-Instruct-520k, a multimodal prompt-scenario paired driving dataset.
(2024-09-09T17:59:15Z)
- A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems [14.646529557978512]
Conversational Recommender System (CRS) leverages real-time feedback from users to dynamically model their preferences.
Large Language Models (LLMs) have marked the onset of a new epoch in computational capabilities.
We introduce a Controllable, scalable, and human-Involved (CSHI) simulator framework that manages the behavior of user simulators.
(2024-05-13T03:02:56Z)
- How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation [14.646529557978512]
We analyze the limitations of using Large Language Models to construct user simulators for Conversational Recommender Systems.
Data leakage, which occurs in conversational history and the user simulator's replies, results in inflated evaluation results.
We propose SimpleUserSim, employing a straightforward strategy to guide the topic toward the target items.
(2024-03-25T04:21:06Z)
- Reliable LLM-based User Simulator for Task-Oriented Dialogue Systems [2.788542465279969]
This paper introduces DAUS, a Domain-Aware User Simulator.
We fine-tune DAUS on real examples of task-oriented dialogues.
Results on two relevant benchmarks showcase significant improvements in terms of user goal fulfillment.
(2024-02-20T20:57:47Z)
- UniSim: A Neural Closed-Loop Sensor Simulator [76.79818601389992]
We present UniSim, a neural sensor simulator that takes a single recorded log captured by a sensor-equipped vehicle.
UniSim builds neural feature grids to reconstruct both the static background and dynamic actors in the scene.
We incorporate learnable priors for dynamic objects, and leverage a convolutional network to complete unseen regions.
(2023-08-03T17:56:06Z)
- Adversarial learning of neural user simulators for dialogue policy optimisation [14.257597015289512]
Reinforcement learning based dialogue policies are typically trained in interaction with a user simulator.
Current data-driven simulators are trained to accurately model the user behaviour in a dialogue corpus.
We propose an alternative method using adversarial learning, with the aim to simulate realistic user behaviour with more variation.
(2023-06-01T16:17:16Z)
- Is MultiWOZ a Solved Task? An Interactive TOD Evaluation Framework with User Simulator [37.590563896382456]
We propose an interactive evaluation framework for Task-Oriented Dialogue (TOD) systems.
We first build a goal-oriented user simulator based on pre-trained models and then use the user simulator to interact with the dialogue system to generate dialogues.
Experimental results show that RL-based TOD systems trained by our proposed user simulator can achieve nearly 98% inform and success rates.
(2022-10-26T07:41:32Z)
- Synthetic Data-Based Simulators for Recommender Systems: A Survey [55.60116686945561]
This survey aims at providing a comprehensive overview of the recent trends in the field of modeling and simulation.
We start with the motivation behind the development of frameworks implementing the simulations -- simulators.
We provide a new consistent classification of existing simulators based on their functionality, approbation, and industrial effectiveness.
(2022-06-22T19:33:21Z)
- HandoverSim: A Simulation Framework and Benchmark for Human-to-Robot Object Handovers [60.45158007016316]
"HandoverSim" is a simulation benchmark for human-to-robot object handovers.
We leverage a recent motion capture dataset of hand grasping of objects.
We create training and evaluation environments for the receiver with standardized protocols and metrics.
(2022-05-19T17:59:00Z)
- Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition [64.06167416127386]
We propose Multi-Agent Dialog Policy Learning, which regards both the system and the user as the dialog agents.
Two agents interact with each other and are jointly learned simultaneously.
Results show that our method can successfully build a system policy and a user policy simultaneously.
(2020-04-08T04:51:40Z)