Related papers: Planning Like Human: A Dual-process Framework for Dialogue Planning

URL: http://arxiv.org/abs/2406.05374v1
Date: Sat, 8 Jun 2024 06:52:47 GMT
Title: Planning Like Human: A Dual-process Framework for Dialogue Planning
Authors: Tao He, Lizi Liao, Yixin Cao, Yuanxing Liu, Ming Liu, Zerui Chen, Bing Qin
Abstract summary: We propose the Dual-Process Dialogue Planning (DPDP) framework to enhance dialogue planning in Large Language Models (LLMs). Inspired by the dual-process theory in psychology, the framework embodies two modes of thinking: intuitive (fast) and analytical (slow). Our empirical evaluations affirm DPDP's superiority in achieving both high-quality dialogues and operational efficiency, outpacing existing methods.
Score: 31.995557540062553
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In proactive dialogue, the challenge lies not just in generating responses but in steering conversations toward predetermined goals, a task where Large Language Models (LLMs) typically struggle due to their reactive nature. Traditional approaches to enhance dialogue planning in LLMs, ranging from elaborate prompt engineering to the integration of policy networks, either face efficiency issues or deliver suboptimal performance. Inspired by the dual-process theory in psychology, which identifies two distinct modes of thinking, intuitive (fast) and analytical (slow), we propose the Dual-Process Dialogue Planning (DPDP) framework. DPDP embodies this theory through two complementary planning systems: an instinctive policy model for familiar contexts and a deliberative Monte Carlo Tree Search (MCTS) mechanism for complex, novel scenarios. This dual strategy is further coupled with a novel two-stage training regimen: offline Reinforcement Learning for robust initial policy model formation followed by MCTS-enhanced on-the-fly learning, which ensures a dynamic balance between efficiency and strategic depth. Our empirical evaluations across diverse dialogue tasks affirm DPDP's superiority in achieving both high-quality dialogues and operational efficiency, outpacing existing methods.
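
Illustrative sketch (not the authors' implementation): the abstract describes a fast policy model for familiar contexts and a slow, deliberative search for novel ones. The Python below shows one way such a dispatch could look. The confidence threshold, the function names, and the toy simulator are all assumptions made for illustration; the abstract does not say how DPDP decides when to switch. The slow path is a flat Monte Carlo rollout search, used as a simplified stand-in for MCTS.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical simulator signature: (state, action, rng) -> estimated return.
Simulator = Callable[[str, str, random.Random], float]


def slow_plan(state: str, actions: List[str], simulate: Simulator,
              n_rollouts: int = 20, seed: int = 0) -> str:
    """Deliberative path: pick the action with the best average simulated return.

    This is flat Monte Carlo search, a simplified stand-in for MCTS.
    """
    rng = random.Random(seed)

    def mean_return(action: str) -> float:
        total = sum(simulate(state, action, rng) for _ in range(n_rollouts))
        return total / n_rollouts

    return max(actions, key=mean_return)


def dual_process_act(state: str, policy_probs: Dict[str, float],
                     simulate: Simulator,
                     threshold: float = 0.7) -> Tuple[str, str]:
    """Return (action, mode), where mode is "fast" or "slow".

    Assumption: the fast policy is trusted when its top action probability
    reaches `threshold`; otherwise the slow planner is consulted.
    """
    best = max(policy_probs, key=policy_probs.get)
    if policy_probs[best] >= threshold:
        return best, "fast"
    return slow_plan(state, list(policy_probs), simulate), "slow"


def toy_simulate(state: str, action: str, rng: random.Random) -> float:
    """Toy simulator: a fixed base reward per strategy plus small noise."""
    base = {"ask": 0.2, "persuade": 0.9, "empathize": 0.5}
    return base[action] + rng.uniform(-0.1, 0.1)


# Confident policy output: the fast path answers directly.
confident = dual_process_act(
    "s0", {"ask": 0.9, "persuade": 0.05, "empathize": 0.05}, toy_simulate)

# Uncertain policy output: the slow path searches over simulated outcomes.
uncertain = dual_process_act(
    "s0", {"ask": 0.4, "persuade": 0.3, "empathize": 0.3}, toy_simulate)
```

In this sketch the expensive search runs only when the policy is unsure, which is the efficiency versus depth trade-off the abstract refers to. A real system would replace `toy_simulate` with dialogue simulation and the flat search with a tree search.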

Related papers

A General Highly Accurate Online Planning Method Integrating Large Language Models into Nested Rollout Policy Adaptation for Dialogue Tasks [16.400192943577743]
In goal-oriented dialogue tasks, the main challenge is to steer the interaction towards a given goal within a limited number of turns. Existing approaches either rely on elaborate prompt engineering, or integrate policy networks and pre-trained policy models. We present Nested Rollout Policy Adaptation for Goal-oriented Dialogue (NRPA-GD), a novel dialogue policy planning method.
arXiv Detail & Related papers (2025-11-17T02:48:37Z)
PRINCIPLES: Synthetic Strategy Memory for Proactive Dialogue Agents [16.819463022406627]
We propose PRINCIPLES: a synthetic strategy memory for proactive dialogue agents. PRINCIPLES is derived through offline self-play simulations and serves as reusable knowledge that guides strategy planning. We evaluate PRINCIPLES in both emotional support and persuasion domains, demonstrating consistent improvements over strong baselines.
arXiv Detail & Related papers (2025-09-22T07:53:59Z)
Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning [75.04643265875072]
Large reasoning models (LRMs) have demonstrated strong performance on complex reasoning tasks, but often suffer from overthinking. Inspired by the dual process theory in cognitive science, we propose Adaptive Cognition Policy Optimization (ACPO). ACPO enables LRMs to achieve efficient reasoning through adaptive cognitive allocation and dynamic system switch.
arXiv Detail & Related papers (2025-05-22T07:15:08Z)
A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems [93.8285345915925]
Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making. With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems. We categorize existing methods along two dimensions: (1) Regimes, which define the stage at which reasoning is achieved; and (2) Architectures, which determine the components involved in the reasoning process.
arXiv Detail & Related papers (2025-04-12T01:27:49Z)
2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization [3.674552982566341]
2D-Curri-DPO is a novel framework employing a two-dimensional curriculum that jointly models Prompt Complexity (PC) and Pairwise Distinguishability. Our approach achieves state-of-the-art performance on challenging test sets like UltraFeedback.
arXiv Detail & Related papers (2025-04-10T15:32:00Z)
Simulation-Free Hierarchical Latent Policy Planning for Proactive Dialogues [31.92843134331582]
We introduce a novel dialogue policy planning framework, LDPP. It fully automates the process from mining policies in dialogue records to learning policy planning. Our experiments demonstrate that LDPP outperforms existing methods on two proactive scenarios.
arXiv Detail & Related papers (2024-12-19T07:06:01Z)
Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm. HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies. HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation [69.5677514160986]
We investigate non-collaborative dialogue agents, which are expected to engage in strategic conversations with diverse users. This poses two main challenges for existing dialogue agents. We propose Trip to enhance the capability in tailored strategic planning, incorporating a user-aware strategic planning module and a population-based training paradigm.
arXiv Detail & Related papers (2024-03-11T14:38:16Z)
Target-constrained Bidirectional Planning for Generation of Target-oriented Proactive Dialogue [11.338393954848632]
We focus on effective dialogue planning for target-oriented dialogue generation. Inspired by decision-making theories in cognitive science, we propose a novel target-constrained bidirectional planning approach. Our algorithms significantly outperform various baseline models.
arXiv Detail & Related papers (2024-03-10T02:14:24Z)
Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents [121.46051697742608]
We introduce a new dialogue policy planning paradigm to strategize dialogue problems with a tunable language model plug-in named PPDPP. Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data. PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications.
arXiv Detail & Related papers (2023-11-01T03:20:16Z)
Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs). This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks. Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts.
arXiv Detail & Related papers (2023-09-22T15:41:34Z)
JoTR: A Joint Transformer and Reinforcement Learning Framework for Dialog Policy Learning [53.83063435640911]
Dialogue policy learning (DPL) is a crucial component of dialogue modelling. We introduce a novel framework, JoTR, to generate flexible dialogue actions. Unlike traditional methods, JoTR formulates a word-level policy that allows for a more dynamic and adaptable dialogue action generation.
arXiv Detail & Related papers (2023-09-01T03:19:53Z)
Multi-Stage Coarse-to-Fine Contrastive Learning for Conversation Intent Induction [34.25242109800481]
This paper presents our solution to Track 2 of Intent Induction from Conversations for Task-Oriented Dialogue at the Eleventh Dialogue System Technology Challenge (DSTC11). The essence of intention clustering lies in distinguishing the representation of different dialogue utterances. In the released DSTC11 evaluation results, our proposed system ranked first on both subtasks of this track.
arXiv Detail & Related papers (2023-03-09T04:51:27Z)
Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System [49.39150449455407]
HDNO is an option framework for designing latent dialogue acts to avoid designing specific dialogue act representations. We test HDNO on MultiWoz 2.0 and MultiWoz 2.1, datasets of multi-domain dialogues, in comparison with a word-level E2E model trained with RL, LaRL and HDSA.
arXiv Detail & Related papers (2020-06-11T20:55:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.