Reliable LLM-based User Simulator for Task-Oriented Dialogue Systems
- URL: http://arxiv.org/abs/2402.13374v1
- Date: Tue, 20 Feb 2024 20:57:47 GMT
- Title: Reliable LLM-based User Simulator for Task-Oriented Dialogue Systems
- Authors: Ivan Sekulić, Silvia Terragni, Victor Guimarães, Nghia Khau, Bruna
Guedes, Modestas Filipavicius, André Ferreira Manso, Roland Mathis
- Abstract summary: This paper introduces DAUS, a Domain-Aware User Simulator.
We fine-tune DAUS on real examples of task-oriented dialogues.
Results on two relevant benchmarks showcase significant improvements in terms of user goal fulfillment.
- Score: 2.788542465279969
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the realm of dialogue systems, user simulation techniques have emerged as
a game-changer, redefining the evaluation and enhancement of task-oriented
dialogue (TOD) systems. These methods are crucial for replicating real user
interactions, enabling applications like synthetic data augmentation, error
detection, and robust evaluation. However, existing approaches often rely on
rigid rule-based methods or on annotated data. This paper introduces DAUS, a
Domain-Aware User Simulator. Leveraging large language models, we fine-tune
DAUS on real examples of task-oriented dialogues. Results on two relevant
benchmarks showcase significant improvements in terms of user goal fulfillment.
Notably, we have observed that fine-tuning enhances the simulator's coherence
with user goals, effectively mitigating hallucinations -- a major source of
inconsistencies in simulator responses.
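As a concrete illustration of this fine-tuning setup, the minimal Python sketch below serializes an annotated task-oriented dialogue into (prompt, completion) pairs: for each user turn, the model is conditioned on the user goal plus the dialogue history and trained to reproduce the user's actual utterance. The prompt template and the Turn structure are illustrative assumptions, not the exact DAUS recipe.

# Sketch: turn annotated TOD dialogues into supervised fine-tuning pairs
# for a user simulator. Field names and prompt format are assumptions.
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str  # "user" or "system"
    text: str

def build_examples(goal: str, turns: list[Turn]) -> list[dict]:
    # For every user turn, the context is the goal plus the dialogue so
    # far; the training target is the user's actual utterance.
    examples = []
    for i, turn in enumerate(turns):
        if turn.speaker != "user":
            continue
        history = "\n".join(f"{t.speaker}: {t.text}" for t in turns[:i])
        prompt = f"User goal: {goal}\n{history}\nuser:"
        examples.append({"prompt": prompt, "completion": " " + turn.text})
    return examples

dialogue = [
    Turn("user", "I need a cheap hotel in the north."),
    Turn("system", "The Alpha Lodge is a cheap hotel in the north."),
    Turn("user", "Great, book it for two nights, please."),
]
for ex in build_examples("find and book a cheap hotel in the north", dialogue):
    print(repr(ex["prompt"]), "->", repr(ex["completion"]))

Keeping the goal in every prompt is the kind of conditioning that, per the abstract, ties the simulator's responses to the user goal and helps mitigate hallucinations.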
Related papers
- RMTBench: Benchmarking LLMs Through Multi-Turn User-Centric Role-Playing [111.06936588273868]
RMTBench is a comprehensive user-centric bilingual role-playing benchmark featuring 80 diverse characters and over 8,000 dialogue rounds.
Our benchmark constructs dialogues based on explicit user motivations rather than character descriptions, ensuring alignment with practical user applications.
By shifting focus from character background to user intention fulfillment, RMTBench bridges the gap between academic evaluation and practical deployment requirements.
arXiv Detail & Related papers (2025-07-27T16:49:47Z)
- Goal Alignment in LLM-Based User Simulators for Conversational AI [14.771856490513194]
User simulators are essential to conversational AI, enabling scalable agent development and evaluation through simulated interactions.
We introduce User Goal State Tracking (UGST), a novel framework that tracks user goal progression throughout conversations.
We present a three-stage methodology for developing user simulators that can autonomously track goal progression and reason to generate goal-aligned responses.
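The core idea of tracking goal progression turn by turn can be sketched generically; the snippet below is a hypothetical tracker in this spirit and does not reproduce UGST's actual stages or schema.

# Hypothetical goal-progression tracker, loosely inspired by the idea of
# user goal state tracking; not the actual UGST framework.
class GoalStateTracker:
    def __init__(self, subgoals: list[str]):
        self.status = {g: "pending" for g in subgoals}

    def mark_fulfilled(self, subgoal: str) -> None:
        # A real system would infer fulfillment from the dialogue itself.
        if subgoal in self.status:
            self.status[subgoal] = "fulfilled"

    def progress(self) -> float:
        done = sum(s == "fulfilled" for s in self.status.values())
        return done / len(self.status)

tracker = GoalStateTracker(["find restaurant", "book table", "get address"])
tracker.mark_fulfilled("find restaurant")
print(tracker.progress())  # 0.33...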
arXiv Detail & Related papers (2025-07-27T07:07:12Z)
- Search-Based Interaction For Conversation Recommendation via Generative Reward Model Based Simulated User [117.82681846559909]
Conversational recommendation systems (CRSs) use multi-turn interaction to capture user preferences and provide personalized recommendations.
We propose a generative reward model based simulated user, named GRSU, for automatic interaction with CRSs.
arXiv Detail & Related papers (2025-04-29T06:37:30Z)
- Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles [37.43150003866563]
We introduce the User Simulator with Implicit Profiles (USP), a framework that infers implicit user profiles from human-machine interactions to simulate personalized and realistic dialogues.
USP outperforms strong baselines in terms of authenticity and diversity while maintaining comparable consistency.
arXiv Detail & Related papers (2025-02-26T09:26:54Z)
- Dynamic benchmarking framework for LLM-based conversational data capture [0.0]
This paper introduces a benchmarking framework to assess large language models (LLMs) used for conversational data capture.
It integrates generative agent simulation to evaluate performance on key dimensions: information extraction, context awareness, and adaptive engagement.
Results show that adaptive strategies improve data extraction accuracy, especially when handling ambiguous responses.
arXiv Detail & Related papers (2025-02-04T15:47:47Z)
- Large Language Models as User-Agents for Evaluating Task-Oriented-Dialogue Systems [6.8738526619759535]
Offline datasets have been used to evaluate task-oriented dialogue (TOD) models.
User-agents, which are context-aware, can simulate the variability and unpredictability of human conversations.
arXiv Detail & Related papers (2024-11-15T06:05:45Z)
- Unlocking the Potential of User Feedback: Leveraging Large Language Model as User Simulator to Enhance Dialogue System [65.93577256431125]
We propose an alternative approach called User-Guided Response Optimization (UGRO), which combines an LLM-based user simulator with a smaller task-oriented dialogue model.
The LLM serves as an annotation-free user simulator that assesses dialogue responses, and these assessments are combined with a smaller fine-tuned end-to-end TOD model.
Our approach outperforms previous state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-06-16T13:04:56Z)
- In-Context Learning User Simulators for Task-Oriented Dialog Systems [1.7086737326992172]
This paper presents a novel application of large language models in user simulation for task-oriented dialog systems.
By harnessing the power of these models, the proposed approach generates diverse utterances based on user goals and limited dialog examples.
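As a rough sketch of this in-context approach, the hypothetical template below assembles a prompt from a user goal, a few example dialogues, and the conversation so far; the paper's actual prompt wording is not reproduced here.

# Hypothetical in-context-learning prompt builder for a user simulator.
def icl_prompt(goal: str, examples: list[str], history: list[str]) -> str:
    parts = ["You are a user talking to a task-oriented assistant."]
    for ex in examples:
        parts.append("Example dialogue:\n" + ex)
    parts.append("Your goal: " + goal)
    parts.append("Dialogue so far:\n" + "\n".join(history))
    parts.append("user:")  # the LLM completes the next user utterance
    return "\n\n".join(parts)

print(icl_prompt(
    goal="book a table for 4 at an Italian restaurant",
    examples=["user: I need a taxi to the station.\nsystem: When would you like to leave?"],
    history=["system: Hello, how can I help you today?"],
))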
arXiv Detail & Related papers (2023-06-01T15:06:11Z)
- Is MultiWOZ a Solved Task? An Interactive TOD Evaluation Framework with User Simulator [37.590563896382456]
We propose an interactive evaluation framework for Task-Oriented Dialogue (TOD) systems.
We first build a goal-oriented user simulator based on pre-trained models and then use the user simulator to interact with the dialogue system to generate dialogues.
Experimental results show that RL-based TOD systems trained by our proposed user simulator can achieve nearly 98% inform and success rates.
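The simulator-system interaction loop underlying such frameworks can be sketched generically; both callables below are toy stand-ins for the pre-trained simulator and the dialogue system under evaluation.

# Generic interaction loop between a user simulator and a dialogue
# system; real models would replace the lambda stand-ins below.
from typing import Callable

def interact(simulator: Callable[[list[str]], str],
             system: Callable[[list[str]], str],
             max_turns: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_turns):
        user_utt = simulator(history)
        history.append("user: " + user_utt)
        if user_utt.strip().lower() == "[done]":  # simulator signals goal met
            break
        history.append("system: " + system(history))
    return history

# Toy demo: the simulator asks once, then signals completion.
demo = interact(
    simulator=lambda h: "I need a cheap hotel." if not h else "[DONE]",
    system=lambda h: "The Alpha Lodge is available. Shall I book it?",
)
print("\n".join(demo))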
arXiv Detail & Related papers (2022-10-26T07:41:32Z)
- Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems [80.77917437785773]
Task-oriented dialogue systems (TDSs) are assessed mainly in an offline setting or through human evaluation.
We propose a metaphorical user simulator for end-to-end TDS evaluation, where we define a simulator to be metaphorical if it simulates a user's analogical thinking in interactions with systems.
We also propose a tester-based evaluation framework to generate variants, i.e., dialogue systems with different capabilities.
arXiv Detail & Related papers (2022-04-02T05:11:03Z)
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
- Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce the Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z)
- Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation [114.48767388174218]
This paper presents an empirical analysis on different types of dialog systems composed of different modules in different settings.
Our results show that a pipeline dialog system trained using fine-grained supervision signals at different component levels often obtains better performance than the systems that use joint or end-to-end models trained on coarse-grained labels.
arXiv Detail & Related papers (2020-05-15T05:20:06Z)
- Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition [64.06167416127386]
We propose Multi-Agent Dialog Policy Learning, which regards both the system and the user as the dialog agents.
Two agents interact with each other and are jointly learned simultaneously.
Results show that our method can successfully build a system policy and a user policy simultaneously.
arXiv Detail & Related papers (2020-04-08T04:51:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.