Prompt-Based Monte-Carlo Tree Search for Goal-Oriented Dialogue Policy Planning
- URL: http://arxiv.org/abs/2305.13660v2
- Date: Thu, 19 Oct 2023 22:31:18 GMT
- Title: Prompt-Based Monte-Carlo Tree Search for Goal-Oriented Dialogue Policy Planning
- Authors: Xiao Yu, Maximillian Chen, Zhou Yu
- Abstract summary: GDP-Zero is an approach using Open-Loop MCTS to perform goal-oriented dialogue policy planning without any model training.
We evaluate GDP-Zero on the goal-oriented task PersuasionForGood, and find that its responses are preferred over ChatGPT up to 59.32% of the time.
- Score: 22.753613264491918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planning for goal-oriented dialogue often requires simulating future dialogue
interactions and estimating task progress. Many approaches thus consider
training neural networks to perform look-ahead search algorithms such as A*
search and Monte Carlo Tree Search (MCTS). However, this training often
requires abundant annotated data, which creates challenges when faced with
noisy annotations or low-resource settings. We introduce GDP-Zero, an approach
using Open-Loop MCTS to perform goal-oriented dialogue policy planning without
any model training. GDP-Zero prompts a large language model to act as a policy
prior, value function, user simulator, and system model during the tree search.
We evaluate GDP-Zero on the goal-oriented task PersuasionForGood, and find that
its responses are preferred over ChatGPT up to 59.32% of the time, and are
rated more persuasive than ChatGPT during interactive evaluations.
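The abstract describes four prompted roles (policy prior, value function, user simulator, system model) driving an Open-Loop MCTS. The sketch below illustrates that general structure only; it is not the authors' implementation. Every name in it (`ACTIONS`, `POSITIVE_RATE`, `policy_prior`, `system_model`, `user_simulator`, `value_function`, `plan`) is hypothetical, the four role functions are toy stubs standing in for LLM prompts, and the PUCT-style selection rule is an assumption rather than something stated in the abstract. The open-loop aspect is that tree nodes store statistics per action sequence and each simulation regenerates the dialogue turns instead of caching them.

```python
import math
import random

# Hypothetical dialogue acts; a real action set is task-specific.
ACTIONS = ["logical_appeal", "emotional_appeal", "propose_donation"]
# Toy numbers so the stubbed user reacts differently to each act.
POSITIVE_RATE = {"logical_appeal": 0.3, "emotional_appeal": 0.7,
                 "propose_donation": 0.2}


def policy_prior(history):
    """Stub for an LLM prompted to rate next dialogue acts (uniform here)."""
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}


def system_model(history, action):
    """Stub for an LLM prompted to realize a dialogue act as a system turn."""
    return ("system", action)


def user_simulator(history, rng):
    """Stub for an LLM prompted to reply as the user (random here)."""
    _, last_action = history[-1]
    positive = rng.random() < POSITIVE_RATE[last_action]
    return ("user", "positive" if positive else "negative")


def value_function(history):
    """Stub for an LLM prompted to estimate task progress, in [-1, 1]."""
    replies = [text for speaker, text in history if speaker == "user"]
    score = sum((r == "positive") - (r == "negative") for r in replies)
    return score / max(1, len(replies))


class Node:
    """Statistics for one action sequence; no dialogue state is stored."""

    def __init__(self, prior):
        self.prior = prior
        self.visits = {a: 0 for a in prior}
        self.q = {a: 0.0 for a in prior}
        self.children = {}


def select_action(node, c_puct):
    """PUCT-style choice: mean value plus a prior-weighted exploration bonus."""
    total = sum(node.visits.values())

    def score(a):
        bonus = c_puct * node.prior[a] * math.sqrt(total + 1) / (1 + node.visits[a])
        return node.q[a] + bonus

    return max(node.prior, key=score)


def simulate(node, history, rng, depth, c_puct=1.0):
    """Run one simulation from `node` and return the backed-up value."""
    if depth == 0:
        return value_function(history)
    action = select_action(node, c_puct)
    # Open loop: both turns are regenerated on every visit, not cached.
    history = history + [system_model(history, action)]
    history = history + [user_simulator(history, rng)]
    child = node.children.get(action)
    if child is None:
        # Expand a new leaf and evaluate it with the value function.
        node.children[action] = Node(policy_prior(history))
        value = value_function(history)
    else:
        value = simulate(child, history, rng, depth - 1, c_puct)
    # Back up: update the visit count and the running mean value.
    node.visits[action] += 1
    node.q[action] += (value - node.q[action]) / node.visits[action]
    return value


def plan(history, n_simulations=200, max_depth=3, seed=0):
    """Return the most-visited root action and the root node."""
    rng = random.Random(seed)
    root = Node(policy_prior(history))
    for _ in range(n_simulations):
        simulate(root, list(history), rng, max_depth)
    best = max(ACTIONS, key=lambda a: root.visits[a])
    return best, root


if __name__ == "__main__":
    best_action, root = plan([])
    print(best_action, root.visits)
```

In a setup like the one the abstract describes, each stub would be replaced by a prompt to a large language model, and no model training would be involved; the search itself only needs the visit counts and mean values kept in each node.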
Related papers
- Controllable Conversations: Planning-Based Dialogue Agent with Large Language Models [52.7201882529976]
Planning-based Conversational Agents (PCA) is a dialogue framework aimed at enhancing controllability of LLM-driven agents.
We propose a dataset comprising SOP-annotated multi-scenario dialogues, generated using a semi-automated role-playing system with GPT-4o.
We also propose a novel method that integrates Chain of Thought reasoning with supervised fine-tuning for SOP prediction and utilizes Monte Carlo Tree Search for optimal action planning during dialogues.
arXiv Detail & Related papers (2024-07-04T12:23:02Z)
- Response Enhanced Semi-supervised Dialogue Query Generation [40.17161986495854]
We propose a semi-supervised learning framework -- SemiDQG -- to improve model performance with unlabeled conversations.
We first apply a similarity-based query selection strategy to select high-quality RA-generated pseudo queries.
We adopt the REINFORCE algorithm to further enhance QP, with RA-provided rewards as fine-grained training signals.
arXiv Detail & Related papers (2023-12-20T02:19:54Z)
- A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding [55.37338324658501]
Zero-shot dialogue understanding aims to enable a dialogue system to track the user's needs without any training data.
In this work, we investigate the understanding ability of ChatGPT for zero-shot dialogue understanding tasks.
arXiv Detail & Related papers (2023-04-09T15:28:36Z)
- Stabilized In-Context Learning with Pre-trained Language Models for Few Shot Dialogue State Tracking [57.92608483099916]
Large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks.
For more complex tasks such as dialogue state tracking (DST), designing prompts that reliably convey the desired intent is nontrivial.
We introduce a saliency model to limit dialogue text length, allowing us to include more exemplars per query.
arXiv Detail & Related papers (2023-02-12T15:05:10Z)
- KILDST: Effective Knowledge-Integrated Learning for Dialogue State Tracking using Gazetteer and Speaker Information [3.342637296393915]
Dialogue State Tracking (DST) is core research in dialogue systems and has received much attention.
A new problem formulation is needed for dialogue between users, as a step toward conversational AI that extracts and recommends information from such dialogue.
We introduce a new task: DST from dialogue between users about scheduling an event (DST-S).
The DST-S task is much more challenging because the model must track the dialogue between users and determine who suggested a schedule and who agreed to it.
arXiv Detail & Related papers (2023-01-18T07:11:56Z)
- Is MultiWOZ a Solved Task? An Interactive TOD Evaluation Framework with User Simulator [37.590563896382456]
We propose an interactive evaluation framework for Task-Oriented Dialogue (TOD) systems.
We first build a goal-oriented user simulator based on pre-trained models and then use the user simulator to interact with the dialogue system to generate dialogues.
Experimental results show that RL-based TOD systems trained by our proposed user simulator can achieve nearly 98% inform and success rates.
arXiv Detail & Related papers (2022-10-26T07:41:32Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- Few-Shot Bot: Prompt-Based Learning for Dialogue Systems [58.27337673451943]
Learning to converse using only a few examples is a great challenge in conversational AI.
The current best conversational models are either good chit-chatters (e.g., BlenderBot) or goal-oriented systems (e.g., MinTL).
We propose prompt-based few-shot learning which does not require gradient-based fine-tuning but instead uses a few examples as the only source of learning.
arXiv Detail & Related papers (2021-10-15T14:36:45Z)
- Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System [49.39150449455407]
HDNO is an option framework for designing latent dialogue acts to avoid designing specific dialogue act representations.
We test HDNO on MultiWoz 2.0 and MultiWoz 2.1, two multi-domain dialogue datasets, in comparison with a word-level E2E model trained with RL, LaRL, and HDSA.
arXiv Detail & Related papers (2020-06-11T20:55:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.