Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation
- URL: http://arxiv.org/abs/2403.06769v3
- Date: Sun, 22 Sep 2024 11:34:19 GMT
- Title: Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation
- Authors: Tong Zhang, Chen Huang, Yang Deng, Hongru Liang, Jia Liu, Zujie Wen, Wenqiang Lei, Tat-Seng Chua
- Abstract summary: We investigate non-collaborative dialogue agents, which are expected to engage in strategic conversations with diverse users.
This poses two main challenges for existing dialogue agents: integrating user-specific characteristics into strategic planning, and training strategic planners that generalize to diverse users.
We propose Trip to enhance the capability of tailored strategic planning, incorporating a user-aware strategic planning module and a population-based training paradigm.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate non-collaborative dialogue agents, which are expected to engage in strategic conversations with diverse users, for securing a mutual agreement that leans favorably towards the system's objectives. This poses two main challenges for existing dialogue agents: 1) The inability to integrate user-specific characteristics into the strategic planning, and 2) The difficulty of training strategic planners that can be generalized to diverse users. To address these challenges, we propose Trip to enhance the capability of tailored strategic planning, incorporating a user-aware strategic planning module and a population-based training paradigm. Through experiments on benchmark non-collaborative dialogue tasks, we demonstrate the effectiveness of Trip in catering to diverse users.
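The population-based training idea in the abstract can be illustrated with a minimal toy sketch: instead of optimizing a strategy planner against a single simulated user, each training episode samples a user from a diverse population, so the planner is rewarded for strategies that work across user types. All names and the numeric setup below are illustrative assumptions, not Trip's actual architecture or API.

```python
import random

class SimulatedUser:
    """Toy simulated user with a fixed persuadability trait in [0, 1]."""
    def __init__(self, persuadability):
        self.persuadability = persuadability

    def respond(self, strategy_strength):
        # The user agrees when the strategy's strength exceeds their
        # resistance (1 - persuadability); a stand-in for a learned simulator.
        return strategy_strength >= (1.0 - self.persuadability)

def train_planner(population, episodes, seed=0):
    """Estimate each candidate strategy's success rate across the user
    population and return the empirically best one."""
    rng = random.Random(seed)
    strengths = [0.2, 0.5, 0.8]          # candidate strategy "strengths"
    wins = {s: 0 for s in strengths}
    plays = {s: 0 for s in strengths}
    for _ in range(episodes):
        user = rng.choice(population)    # diversity: resample the user each episode
        s = rng.choice(strengths)
        plays[s] += 1
        if user.respond(s):
            wins[s] += 1
    return max(strengths, key=lambda s: wins[s] / max(plays[s], 1))

population = [SimulatedUser(p) for p in (0.1, 0.4, 0.7, 0.9)]
best = train_planner(population, episodes=3000)
```

In this toy setup the planner converges on the strategy that succeeds for the widest slice of the population, which is the core intuition behind training against diversified user simulations rather than a single user profile.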
Related papers
- Rapport-Driven Virtual Agent: Rapport Building Dialogue Strategy for Improving User Experience at First Meeting [3.059886686838972]
This study aims to establish human-agent rapport through small talk by using a rapport-building strategy.
We implemented this strategy for the virtual agents based on dialogue strategies by prompting a large language model (LLM).
arXiv Detail & Related papers (2024-06-14T08:47:15Z)
- Planning Like Human: A Dual-process Framework for Dialogue Planning [31.995557540062553]
We propose the Dual-Process Dialogue Planning (DPDP) framework to enhance dialogue planning in Large Language Models (LLMs).
Inspired by dual-process theory in psychology, the framework embodies two modes of thinking: intuitive (fast) and analytical (slow).
Our empirical evaluations affirm DPDP's superiority in achieving both high-quality dialogues and operational efficiency, outpacing existing methods.
arXiv Detail & Related papers (2024-06-08T06:52:47Z)
- Target-constrained Bidirectional Planning for Generation of Target-oriented Proactive Dialogue [11.338393954848632]
We focus on effective dialogue planning for target-oriented dialogue generation.
Inspired by decision-making theories in cognitive science, we propose a novel target-constrained bidirectional planning approach.
Our algorithms significantly outperform various baseline models.
arXiv Detail & Related papers (2024-03-10T02:14:24Z)
- Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents [121.46051697742608]
We introduce a new dialogue policy planning paradigm to strategize dialogue problems with a tunable language model plug-in named PPDPP.
Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data.
PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications.
arXiv Detail & Related papers (2023-11-01T03:20:16Z)
- Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs).
This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks.
Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts.
arXiv Detail & Related papers (2023-09-22T15:41:34Z)
- Investigating Reinforcement Learning for Communication Strategies in a Task-Initiative Setting [8.680676599607123]
We analyze the trade-offs between initial presentation and subsequent followup as a function of user clarification strategy.
We find surprising advantages to coherence-based representations of dialogue strategy, which bring minimal data requirements, explainable choices, and strong audit capabilities.
arXiv Detail & Related papers (2023-08-03T00:10:23Z) - Decision-Oriented Dialogue for Human-AI Collaboration [62.367222979251444]
We describe a class of tasks called decision-oriented dialogues, in which AI assistants such as large language models (LMs) must collaborate with one or more humans via natural language to help them make complex decisions.
We formalize three domains in which users face everyday decisions: (1) choosing an assignment of reviewers to conference papers, (2) planning a multi-step itinerary in a city, and (3) negotiating travel plans for a group of friends.
For each task, we build a dialogue environment where agents receive a reward based on the quality of the final decision they reach.
arXiv Detail & Related papers (2023-05-31T17:50:02Z)
- Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning [81.79431311952656]
We propose MultiESC, a novel system for providing emotional support.
For strategy planning, we propose lookaheads to estimate the future user feedback after using particular strategies.
For user state modeling, MultiESC focuses on capturing users' subtle emotional expressions and understanding their emotion causes.
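The lookahead idea in this summary can be sketched as a one-step planner: score each candidate support strategy by the user feedback it is predicted to elicit, then pick the best. The feedback table and function names below are toy assumptions for illustration, not MultiESC's learned model.

```python
def lookahead_plan(state, strategies, feedback_model):
    """Pick the strategy whose predicted future user feedback is highest."""
    return max(strategies, key=lambda s: feedback_model(state, s))

# Toy predicted-feedback table keyed by (user_emotion, strategy).
FEEDBACK = {
    ("anxious", "reassurance"): 0.8,
    ("anxious", "question"): 0.4,
    ("sad", "reassurance"): 0.5,
    ("sad", "self-disclosure"): 0.7,
}

def feedback_model(state, strategy):
    # Unknown (state, strategy) pairs get a neutral score of 0.0.
    return FEEDBACK.get((state, strategy), 0.0)

choice = lookahead_plan("anxious", ["reassurance", "question"], feedback_model)
# choice == "reassurance"
```

Swapping the table for a learned estimate of user feedback, and extending the search beyond one step, recovers the general shape of lookahead strategy planning described above.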
arXiv Detail & Related papers (2022-10-09T12:23:47Z)
- Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness [116.804536884437]
We propose an opposite behavior aware framework for policy learning in goal-oriented dialogues.
We estimate the opposite agent's policy from its behavior and use this estimate to improve the target agent by regarding it as part of the target policy.
arXiv Detail & Related papers (2020-04-21T03:13:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.