Planning with Large Language Models for Conversational Agents
- URL: http://arxiv.org/abs/2407.03884v1
- Date: Thu, 4 Jul 2024 12:23:02 GMT
- Title: Planning with Large Language Models for Conversational Agents
- Authors: Zhigen Li, Jianxiang Peng, Yanmeng Wang, Tianhao Shen, Minghui Zhang, Linxi Su, Shang Wu, Yihang Wu, Yuqian Wang, Ye Wang, Wei Hu, Jianfeng Li, Shaojun Wang, Jing Xiao, Deyi Xiong,
- Abstract summary: Controllability and proactivity are crucial properties of autonomous conversational agents (CAs)
We propose a new framework for planning-based conversational agents powered by large language models (LLMs)
Experiment results show that LLMs finetuned on PCA-D can significantly improve the performance and generalize to unseen domains.
- Score: 51.12859325330882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Controllability and proactivity are crucial properties of autonomous conversational agents (CAs). Controllability requires the CAs to follow the standard operating procedures (SOPs), such as verifying identity before activating credit cards. Proactivity requires the CAs to guide the conversation towards the goal during user uncooperation, such as persuasive dialogue. Existing research cannot be unified with controllability, proactivity, and low manual annotation. To bridge this gap, we propose a new framework for planning-based conversational agents (PCA) powered by large language models (LLMs), which only requires humans to define tasks and goals for the LLMs. Before conversation, LLM plans the core and necessary SOP for dialogue offline. During the conversation, LLM plans the best action path online referring to the SOP, and generates responses to achieve process controllability. Subsequently, we propose a semi-automatic dialogue data creation framework and curate a high-quality dialogue dataset (PCA-D). Meanwhile, we develop multiple variants and evaluation metrics for PCA, e.g., planning with Monte Carlo Tree Search (PCA-M), which searches for the optimal dialogue action while satisfying SOP constraints and achieving the proactive of the dialogue. Experiment results show that LLMs finetuned on PCA-D can significantly improve the performance and generalize to unseen domains. PCA-M outperforms other CoT and ToT baselines in terms of conversation controllability, proactivity, task success rate, and overall logical coherence, and is applicable in industry dialogue scenarios. The dataset and codes are available at XXXX.
Related papers
- Unsupervised Extraction of Dialogue Policies from Conversations [3.102576158218633]
We show how Large Language Models can be instrumental in extracting dialogue policies from datasets.
We then propose a novel method for generating dialogue policies utilizing a controllable and interpretable graph-based methodology.
arXiv Detail & Related papers (2024-06-21T14:57:25Z) - Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training [33.57497419019826]
Action-Based Contrastive Self-Training allows for sample-efficient dialogue policy learning in multi-turn conversation.
ACT demonstrates substantial conversation modeling improvements over standard approaches to supervised fine-tuning and DPO.
arXiv Detail & Related papers (2024-05-31T22:44:48Z) - Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z) - Plug-and-Play Policy Planner for Large Language Model Powered Dialogue
Agents [121.46051697742608]
We introduce a new dialogue policy planning paradigm to strategize dialogue problems with a tunable language model plug-in named PPDPP.
Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data.
PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications.
arXiv Detail & Related papers (2023-11-01T03:20:16Z) - Leveraging Explicit Procedural Instructions for Data-Efficient Action
Prediction [5.448684866061922]
Task-oriented dialogues often require agents to enact complex, multi-step procedures in order to meet user requests.
Large language models have found success automating these dialogues in constrained environments, but their widespread deployment is limited by the substantial quantities of task-specific data required for training.
This paper presents a data-efficient solution to constructing dialogue systems, leveraging explicit instructions derived from agent guidelines.
arXiv Detail & Related papers (2023-06-06T18:42:08Z) - Prompting and Evaluating Large Language Models for Proactive Dialogues:
Clarification, Target-guided, and Non-collaboration [72.04629217161656]
This work focuses on three aspects of proactive dialogue systems: clarification, target-guided, and non-collaborative dialogues.
To trigger the proactivity of LLMs, we propose the Proactive Chain-of-Thought prompting scheme.
arXiv Detail & Related papers (2023-05-23T02:49:35Z) - Controllable Mixed-Initiative Dialogue Generation through Prompting [50.03458333265885]
Mixed-initiative dialogue tasks involve repeated exchanges of information and conversational control.
Agents gain control by generating responses that follow particular dialogue intents or strategies, prescribed by a policy planner.
Standard approach has been fine-tuning pre-trained language models to perform generation conditioned on these intents.
We instead prompt large language models as a drop-in replacement to fine-tuning on conditional generation.
arXiv Detail & Related papers (2023-05-06T23:11:25Z) - In-Context Learning for Few-Shot Dialogue State Tracking [55.91832381893181]
We propose an in-context (IC) learning framework for few-shot dialogue state tracking (DST)
A large pre-trained language model (LM) takes a test instance and a few annotated examples as input, and directly decodes the dialogue states without any parameter updates.
This makes the LM more flexible and scalable compared to prior few-shot DST work when adapting to new domains and scenarios.
arXiv Detail & Related papers (2022-03-16T11:58:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.