Unlocking the Potential of User Feedback: Leveraging Large Language
Model as User Simulator to Enhance Dialogue System
- URL: http://arxiv.org/abs/2306.09821v2
- Date: Thu, 19 Oct 2023 16:51:06 GMT
- Title: Unlocking the Potential of User Feedback: Leveraging Large Language
Model as User Simulator to Enhance Dialogue System
- Authors: Zhiyuan Hu, Yue Feng, Anh Tuan Luu, Bryan Hooi, Aldo Lipani
- Abstract summary: We propose User-Guided Response Optimization (UGRO), an approach that combines a large language model (LLM) with a smaller task-oriented dialogue (TOD) model.
The LLM serves as an annotation-free user simulator that assesses dialogue responses, and its satisfaction feedback is used to optimize the smaller fine-tuned end-to-end TOD model.
Our approach outperforms previous state-of-the-art (SOTA) results.
- Score: 65.93577256431125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue systems and large language models (LLMs) have gained considerable
attention. However, directly using LLMs as task-oriented dialogue (TOD) models
has been found to underperform smaller task-specific models. Nonetheless, it is
crucial to acknowledge the significant potential of LLMs and to explore improved
ways of leveraging their impressive abilities. Motivated by this goal, we
propose an alternative approach, User-Guided Response Optimization (UGRO), which
combines an LLM with a smaller TOD model. The LLM serves as an annotation-free
user simulator that assesses dialogue responses, and it is paired with a smaller
fine-tuned end-to-end TOD model. Using the satisfaction feedback generated by
the LLM, UGRO further optimizes the supervised fine-tuned TOD model.
Specifically, the TOD model takes the dialogue history as input and, guided by
the user simulator's feedback, generates high-satisfaction responses that meet
the user's requirements. Empirical experiments on two TOD benchmarks validate
the effectiveness of our method; the results demonstrate that our approach
outperforms previous state-of-the-art (SOTA) results.
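
As a concrete illustration of the loop the abstract describes, here is a minimal sketch in which an LLM user simulator scores candidate responses from the TOD model. The prompt wording, the 1-5 rating scale, the helper names, and the best-of-n selection step are illustrative assumptions, not the paper's exact method; in UGRO proper, the satisfaction feedback further optimizes the fine-tuned TOD model rather than only selecting among candidates.

```python
from typing import Callable, List

# Type aliases for the two components UGRO combines (names are ours):
# the TOD model maps a dialogue history to a candidate system response;
# the user simulator maps (history, response) to a satisfaction score.
TodModel = Callable[[str], str]
UserSimulator = Callable[[str, str], float]

# Hypothetical prompt; the paper's actual wording may differ.
SATISFACTION_PROMPT = (
    "You are the user in this task-oriented dialogue.\n"
    "Dialogue history:\n{history}\n"
    "System response: {response}\n"
    "Rate your satisfaction with the response from 1 (poor) to 5 (perfect). "
    "Answer with a single number."
)

def make_simulator(llm: Callable[[str], str]) -> UserSimulator:
    """Wrap a raw LLM call into an annotation-free satisfaction scorer."""
    def simulate(history: str, response: str) -> float:
        reply = llm(SATISFACTION_PROMPT.format(history=history, response=response))
        digits = [ch for ch in reply if ch.isdigit()]
        return float(digits[0]) if digits else 1.0  # fall back to lowest score
    return simulate

def ugro_step(tod_model: TodModel, simulator: UserSimulator,
              history: str, n: int = 4):
    """Sample n candidates and keep the highest-satisfaction one.

    During training, the (candidate, score) pairs would instead drive a
    reward-weighted update of the supervised fine-tuned TOD model.
    """
    candidates: List[str] = [tod_model(history) for _ in range(n)]
    scores = [simulator(history, c) for c in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best], scores[best]
```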
Related papers
- Developing a Tutoring Dialog Dataset to Optimize LLMs for Educational Use [1.2277343096128712]
Large language models (LLMs) have shown promise for scalable educational applications.
Our study explores the use of smaller, more affordable LLMs for one-on-one tutoring in the context of solving reading comprehension problems.
arXiv Detail & Related papers (2024-10-25T00:40:21Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Aligning Large Language Models via Fine-grained Supervision [20.35000061196631]
Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful, toxic, or fail to align with user expectations.
Current approaches focus on using reinforcement learning with human feedback to improve model alignment.
We propose a method to enhance LLM alignment through fine-grained token-level supervision.
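The summary does not spell out the training objective; one common way to realize token-level supervision, shown below as a hedged sketch rather than the paper's exact method, is to weight each token's cross-entropy loss by a per-token reward.

```python
import torch
import torch.nn.functional as F

def token_weighted_nll(logits: torch.Tensor, targets: torch.Tensor,
                       token_rewards: torch.Tensor) -> torch.Tensor:
    """Cross-entropy where each target token carries its own weight.

    logits:        (batch, seq, vocab) model outputs
    targets:       (batch, seq) gold token ids
    token_rewards: (batch, seq) per-token supervision signal
    """
    per_token = F.cross_entropy(
        logits.transpose(1, 2),  # cross_entropy expects (batch, vocab, seq)
        targets,
        reduction="none",
    )                            # -> (batch, seq) per-token loss
    return (token_rewards * per_token).mean()

# Toy check with random tensors.
logits = torch.randn(2, 5, 100)
targets = torch.randint(0, 100, (2, 5))
rewards = torch.rand(2, 5)
print(token_weighted_nll(logits, targets, rewards))
```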
arXiv Detail & Related papers (2024-06-04T20:21:45Z) - Show, Don't Tell: Aligning Language Models with Demonstrated Feedback [54.10302745921713]
Demonstration ITerated Task Optimization (DITTO) directly aligns language model outputs to a user's demonstrated behaviors.
We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts.
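A minimal sketch of the pair-construction step behind this idea: each user demonstration is treated as preferred over the current model's own samples, and the resulting pairs can feed any preference-tuning loss (e.g., DPO). Sampling settings and DITTO's iterated re-sampling schedule are elided assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # the user's demonstration
    rejected: str  # a sample from the current model

def build_demo_preferences(model: Callable[[str], str],
                           demos: List[Tuple[str, str]],
                           samples_per_demo: int = 4) -> List[PreferencePair]:
    """Treat each (prompt, demonstration) as preferred over model samples.

    The resulting pairs can be fed to a preference-tuning loss, and the
    whole procedure repeated as the model improves.
    """
    pairs: List[PreferencePair] = []
    for prompt, demo in demos:
        for _ in range(samples_per_demo):
            pairs.append(PreferencePair(prompt, chosen=demo,
                                        rejected=model(prompt)))
    return pairs
```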
arXiv Detail & Related papers (2024-06-02T23:13:56Z) - Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that, when fine-tuned from Zephyr-7B-SFT and Llama-3-8B-Instruct, Self-Exploring Language Models (SELM) significantly boost performance on instruction-following benchmarks.
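As a hedged sketch of the flavor of such an objective (not SELM's exact bilevel derivation), the snippet below adds a simple optimism bonus to a standard DPO loss, nudging the policy toward responses the current preference data marks as promising.

```python
import torch
import torch.nn.functional as F

def dpo_with_optimism(pi_logps_w, pi_logps_l, ref_logps_w, ref_logps_l,
                      beta: float = 0.1, alpha: float = 0.01) -> torch.Tensor:
    """DPO loss plus a simple optimism bonus (illustrative only).

    All inputs are summed log-probabilities of whole responses, shape
    (batch,); *_w are chosen responses, *_l are rejected ones.
    """
    margin = (pi_logps_w - ref_logps_w) - (pi_logps_l - ref_logps_l)
    dpo = -F.logsigmoid(beta * margin)
    # Optimism: also raise the policy's log-prob on responses the current
    # preference data marks as high reward, encouraging exploration there.
    bonus = -alpha * pi_logps_w
    return (dpo + bonus).mean()

# Toy check with random log-probabilities.
b = 8
print(dpo_with_optimism(torch.randn(b), torch.randn(b),
                        torch.randn(b), torch.randn(b)))
```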
arXiv Detail & Related papers (2024-05-29T17:59:07Z) - SLMRec: Empowering Small Language Models for Sequential Recommendation [38.51895517016953]
The sequential recommendation task involves predicting the next item a user is likely to interact with, given their past interactions.
Recent research demonstrates the great impact of LLMs on sequential recommendation systems.
Due to the huge size of LLMs, it is inefficient and impractical to deploy an LLM-based model on real-world platforms.
arXiv Detail & Related papers (2024-05-28T07:12:06Z) - Reliable LLM-based User Simulator for Task-Oriented Dialogue Systems [2.788542465279969]
This paper introduces DAUS, a Domain-Aware User Simulator.
We fine-tune DAUS on real examples of task-oriented dialogues.
Results on two relevant benchmarks showcase significant improvements in terms of user goal fulfillment.
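A minimal sketch of what fine-tuning a user simulator on real dialogues can look like: each annotated dialogue is unrolled into (goal + history → next user utterance) examples. The field names and prompt format are illustrative assumptions, not DAUS's actual data schema.

```python
from typing import Dict, List

def make_user_sim_examples(dialogue: Dict) -> List[Dict[str, str]]:
    """Turn one annotated TOD dialogue into supervised examples:
    given the user goal and the history so far, predict the next
    user utterance."""
    examples: List[Dict[str, str]] = []
    history: List[str] = []
    for turn in dialogue["turns"]:
        if turn["speaker"] == "user":
            examples.append({
                "prompt": (f"Goal: {dialogue['goal']}\n"
                           + "\n".join(history)
                           + "\nUser:"),
                "completion": " " + turn["text"],
            })
        history.append(f"{turn['speaker'].capitalize()}: {turn['text']}")
    return examples

# Toy dialogue in the assumed schema.
demo = {
    "goal": "book a cheap hotel in the north for 2 nights",
    "turns": [
        {"speaker": "user", "text": "I need a cheap hotel in the north."},
        {"speaker": "system", "text": "Sure, for how many nights?"},
        {"speaker": "user", "text": "Two nights, please."},
    ],
}
for ex in make_user_sim_examples(demo):
    print(ex["prompt"], "->", ex["completion"])
```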
arXiv Detail & Related papers (2024-02-20T20:57:47Z) - Are Large Language Models Good Prompt Optimizers? [65.48910201816223]
We conduct a study to uncover the actual mechanism of LLM-based Prompt Optimization.
Our findings reveal that the LLMs struggle to identify the true causes of errors during reflection, tending to be biased by their own prior knowledge.
We introduce a new "Automatic Behavior Optimization" paradigm, which directly optimizes the target model's behavior in a more controllable manner.
arXiv Detail & Related papers (2024-02-03T09:48:54Z) - User Simulation with Large Language Models for Evaluating Task-Oriented
Dialogue [10.336443286833145]
We propose a novel user simulator built using recently developed large pretrained language models (LLMs).
Unlike previous work, which sought to maximize goal success rate (GSR) as the primary metric of simulator performance, our goal is a system which achieves a GSR similar to that observed in human interactions with TOD systems.
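For concreteness, a small sketch of the goal success rate (GSR) computation this entry refers to; the slot-matching criterion and the reference human GSR value are illustrative assumptions.

```python
from typing import Dict, List

def goal_success(goal: Dict[str, str], final_state: Dict[str, str]) -> bool:
    """A goal succeeds when every constrained slot ends up with the
    requested value (the slot semantics are assumed for illustration)."""
    return all(final_state.get(slot) == value for slot, value in goal.items())

def goal_success_rate(dialogues: List[Dict]) -> float:
    """GSR over a batch of simulated dialogues, each carrying its goal
    and the dialogue-final belief state."""
    wins = sum(goal_success(d["goal"], d["final_state"]) for d in dialogues)
    return wins / len(dialogues)

# The paper's point: a good simulator should drive GSR toward the human
# level rather than toward 1.0.
human_gsr = 0.82  # hypothetical reference value, not from the paper
sim = [{"goal": {"area": "north"}, "final_state": {"area": "north"}},
       {"goal": {"area": "south"}, "final_state": {"area": "north"}}]
print(goal_success_rate(sim), "vs human", human_gsr)
```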
arXiv Detail & Related papers (2023-09-23T02:04:57Z) - Enhancing Large Language Model Induced Task-Oriented Dialogue Systems
Through Look-Forward Motivated Goals [76.69419538047813]
The ProToD approach anticipates future dialogue actions and incorporates a goal-oriented reward signal to enhance ToD systems.
We present a novel evaluation method that assesses ToD systems based on goal-driven dialogue simulations.
Empirical experiments conducted on the MultiWOZ 2.1 dataset demonstrate that our model can achieve superior performance using only 10% of the data.
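One way to read the look-forward idea, as a hedged sketch rather than ProToD's actual reward definition: credit anticipated future goal progress, discounted, on top of the current turn's progress.

```python
from typing import List

def look_forward_reward(progress_now: float,
                        predicted_future_progress: List[float],
                        gamma: float = 0.9) -> float:
    """Goal-oriented reward that also credits anticipated future progress
    from the predicted next dialogue actions (illustrative formulation)."""
    future = sum(gamma ** (t + 1) * p
                 for t, p in enumerate(predicted_future_progress))
    return progress_now + future

print(look_forward_reward(0.2, [0.3, 0.5]))  # 0.2 + 0.9*0.3 + 0.81*0.5
```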
arXiv Detail & Related papers (2023-09-16T10:56:00Z)