Adaptive Dialog Policy Learning with Hindsight and User Modeling
- URL: http://arxiv.org/abs/2005.03299v1
- Date: Thu, 7 May 2020 07:43:43 GMT
- Title: Adaptive Dialog Policy Learning with Hindsight and User Modeling
- Authors: Yan Cao, Keting Lu, Xiaoping Chen, Shiqi Zhang
- Abstract summary: We develop algorithm LHUA that, for the first time, enables dialog agents to adaptively learn with hindsight from both simulated and real users.
Experimental results suggest that, in success rate and policy quality, LHUA outperforms competitive baselines from the literature.
- Score: 10.088347529930129
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning methods have been used to compute dialog policies from
language-based interaction experiences. Efficiency is of particular importance
in dialog policy learning, because of the considerable cost of interacting with
people, and the very poor user experience from low-quality conversations.
Aiming at improving the efficiency of dialog policy learning, we develop
algorithm LHUA (Learning with Hindsight, User modeling, and Adaptation) that,
for the first time, enables dialog agents to adaptively learn with hindsight
from both simulated and real users. Simulation and hindsight provide the dialog
agent with more experience and more (positive) reinforcements respectively.
Experimental results suggest that, in success rate and policy quality, LHUA
outperforms competitive baselines from the literature, including its
no-simulation, no-adaptation, and no-hindsight counterparts.
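The abstract does not reproduce LHUA's algorithm, but the hindsight component it builds on can be illustrated with a minimal sketch of hindsight relabeling for goal-conditioned dialog RL (in the spirit of hindsight experience replay). The names `Transition` and `relabel_with_hindsight` are illustrative, not from the paper.

```python
# Minimal sketch of hindsight relabeling for goal-conditioned dialog RL.
# A failed episode is re-conditioned on the goal it actually achieved,
# turning it into a positive training example -- this is how hindsight
# yields "more (positive) reinforcements".
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class Transition:
    state: str      # dialog state (e.g. a belief-state summary)
    action: str     # system dialog act
    reward: float   # task reward (success bonus at episode end)
    next_state: str
    goal: str       # the user goal the episode was conditioned on

def relabel_with_hindsight(episode: List[Transition],
                           achieved_goal: str,
                           success_reward: float = 1.0) -> List[Transition]:
    """Re-condition a (possibly failed) episode on the goal it actually
    achieved, so its final transition becomes a positive example."""
    relabeled = [replace(t, goal=achieved_goal, reward=0.0)
                 for t in episode[:-1]]
    relabeled.append(replace(episode[-1], goal=achieved_goal,
                             reward=success_reward))
    return relabeled

# Usage: a dialog that failed its original goal ("book_flight") still
# yields positive reinforcement for the goal it did complete.
episode = [
    Transition("s0", "request(date)", 0.0, "s1", "book_flight"),
    Transition("s1", "inform(hotel)", -1.0, "s2", "book_flight"),
]
extra = relabel_with_hindsight(episode, achieved_goal="book_hotel")
print(extra[-1].reward)  # 1.0
```

The relabeled transitions can be added to the replay buffer alongside the originals, which is where the extra experience from simulated users compounds with the extra positive signal from hindsight.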
Related papers
- Enabling Real-Time Conversations with Minimal Training Costs [61.80370154101649]
This paper presents a new duplex decoding approach that enhances large language models with duplex ability, requiring minimal training.
Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
arXiv Detail & Related papers (2024-09-18T06:27:26Z)
- Data Augmentation Integrating Dialogue Flow and Style to Adapt Spoken Dialogue Systems to Low-Resource User Groups [1.7725414095035827]
This study addresses the interaction challenges encountered by spoken dialogue systems (SDSs) when engaging with users who exhibit distinct conversational behaviors.
We propose a novel data augmentation framework to enhance SDS performance for user groups with limited resources.
arXiv Detail & Related papers (2024-08-20T03:33:04Z)
- Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents [121.46051697742608]
We introduce a new dialogue policy planning paradigm to strategize dialogue problems with a tunable language model plug-in named PPDPP.
Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data.
PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications.
arXiv Detail & Related papers (2023-11-01T03:20:16Z)
- Improving Conversational Recommendation Systems via Counterfactual Data Simulation [73.4526400381668]
Conversational recommender systems (CRSs) aim to provide recommendation services via natural language conversations.
Existing CRS approaches often suffer from the issue of insufficient training due to the scarcity of training data.
We propose a CounterFactual data simulation approach for CRS, named CFCRS, to alleviate the issue of data scarcity in CRSs.
arXiv Detail & Related papers (2023-06-05T12:48:56Z)
- Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models [115.7508325840751]
The recent success of large language models (LLMs) has shown great potential to develop more powerful conversational recommender systems (CRSs).
In this paper, we embark on an investigation into the utilization of ChatGPT for conversational recommendation, revealing the inadequacy of the existing evaluation protocol.
We propose an interactive Evaluation approach based on LLMs named iEvaLM that harnesses LLM-based user simulators.
arXiv Detail & Related papers (2023-05-22T15:12:43Z)
- Few-Shot Structured Policy Learning for Multi-Domain and Multi-Task Dialogues [0.716879432974126]
Graph neural networks (GNNs) show a remarkable superiority by reaching a success rate above 80% with only 50 dialogues, when learning from simulated experts.
We suggest to concentrate future research efforts on bridging the gap between human data, simulators and automatic evaluators in dialogue frameworks.
arXiv Detail & Related papers (2023-02-22T08:18:49Z)
- What Does The User Want? Information Gain for Hierarchical Dialogue Policy Optimisation [3.1433893853959605]
Dialogue policy optimisation via reinforcement learning (RL) is susceptible to sample inefficiency and instability.
We propose the usage of an intrinsic reward based on information gain to address this issue.
Our algorithm, which we call FeudalGain, achieves state-of-the-art results in most environments of the PyDial framework.
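The information-gain idea behind FeudalGain can be sketched simply: reward the agent for how much a turn reduces its uncertainty about the user's goal. The sketch below measures that as the drop in entropy of a belief distribution between turns; the helper names are illustrative, not the paper's API.

```python
# Hedged sketch of an information-gain intrinsic reward for dialogue
# policy learning: the reward is the reduction in entropy of the agent's
# belief over the user's goal from one turn to the next.
import math
from typing import Dict

def entropy(belief: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a belief distribution over user goals."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0.0)

def information_gain_reward(belief_before: Dict[str, float],
                            belief_after: Dict[str, float]) -> float:
    """Intrinsic reward: how much this turn reduced goal uncertainty."""
    return entropy(belief_before) - entropy(belief_after)

# A clarifying question that sharpens the belief earns a positive reward.
before = {"cheap_hotel": 0.5, "luxury_hotel": 0.5}
after = {"cheap_hotel": 0.9, "luxury_hotel": 0.1}
r = information_gain_reward(before, after)
print(round(r, 3))  # 0.368
```

Such an intrinsic term can be added to the sparse task reward, giving the policy dense feedback even before the dialogue succeeds or fails.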
arXiv Detail & Related papers (2021-09-15T07:21:26Z)
- Automatic Curriculum Learning With Over-repetition Penalty for Dialogue Policy Learning [8.744026064255337]
We propose a novel framework, Automatic Curriculum Learning-based Deep Q-Network (ACL-DQN), to realize the dialogue policy for automatic curriculum learning.
The teacher model arranges a meaningful ordered curriculum and automatically adjusts it by monitoring the learning progress of the dialogue agent.
Experiments show that ACL-DQN improves the effectiveness and stability of dialogue policy learning by a statistically significant margin.
arXiv Detail & Related papers (2020-12-28T02:44:49Z)
- Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems [58.724629408229205]
We demonstrate how traditional supervised learning and a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art RL-based methods.
Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
arXiv Detail & Related papers (2020-09-21T12:04:18Z)
- Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition [64.06167416127386]
We propose Multi-Agent Dialog Policy Learning, which regards both the system and the user as the dialog agents.
The two agents interact with each other, and their policies are learned jointly.
Results show that our method can successfully build a system policy and a user policy simultaneously.
arXiv Detail & Related papers (2020-04-08T04:51:40Z)
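The joint system/user setup with role-aware reward decomposition described above can be sketched in a toy, stateless form: two agents pick dialog acts, each is scored by a reward tailored to its own role, and both are updated simultaneously from the same exchange. The environment, reward decomposition, and names below are illustrative, not the paper's.

```python
# Hedged sketch of joint system/user policy learning with role-aware
# rewards, in a deliberately tiny stateless setting: one Q-value per
# action for each agent, epsilon-greedy selection, simultaneous updates.
import random

random.seed(0)  # fixed seed for reproducibility

ACTIONS = ["inform", "request", "end"]

def role_reward(role: str, sys_act: str, usr_act: str) -> float:
    """Decomposed reward: each role is scored for its own behavior."""
    if role == "system":
        return 1.0 if sys_act == "inform" and usr_act == "request" else 0.0
    return 1.0 if usr_act == "request" else 0.0

def train(episodes: int = 500, lr: float = 0.1, eps: float = 0.2):
    q_sys = {a: 0.0 for a in ACTIONS}
    q_usr = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        # epsilon-greedy action selection for both agents
        sys_act = (random.choice(ACTIONS) if random.random() < eps
                   else max(q_sys, key=q_sys.get))
        usr_act = (random.choice(ACTIONS) if random.random() < eps
                   else max(q_usr, key=q_usr.get))
        # both agents are updated simultaneously from the same exchange
        q_sys[sys_act] += lr * (role_reward("system", sys_act, usr_act)
                                - q_sys[sys_act])
        q_usr[usr_act] += lr * (role_reward("user", sys_act, usr_act)
                                - q_usr[usr_act])
    return q_sys, q_usr

q_sys, q_usr = train()
print(max(q_usr, key=q_usr.get))  # the user policy converges on "request"
```

Even in this toy, the coupling is visible: the system's "inform" only pays off once the user policy has learned to "request", so the two policies shape each other's learning signal, which is the point of training them together.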
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.