Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness
- URL: http://arxiv.org/abs/2004.09731v1
- Date: Tue, 21 Apr 2020 03:13:44 GMT
- Title: Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness
- Authors: Zheng Zhang, Lizi Liao, Xiaoyan Zhu, Tat-Seng Chua, Zitao Liu, Yan
Huang, Minlie Huang
- Abstract summary: We propose an opposite-behavior-aware framework for policy learning in goal-oriented dialogues.
We estimate the opposite agent's policy from its behavior and use this estimate to improve the target agent by treating the estimate as part of the target policy.
- Score: 116.804536884437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing approaches to goal-oriented dialogue policy learning use
reinforcement learning, which focuses on the target agent's policy and simply
treats the opposite agent's policy as part of the environment. In real-world
scenarios, however, the behavior of an opposite agent often exhibits certain
patterns or is governed by hidden policies, which the target agent can infer and
exploit to facilitate its own decision making. This strategy is common in human
mental simulation: imagining a specific action and its probable results before
actually taking it. We therefore propose an opposite-behavior-aware framework
for policy learning in goal-oriented dialogues. We estimate the opposite agent's
policy from its behavior and use this estimate to improve the target agent by
regarding it as part of the target policy. We evaluate our model on both
cooperative and competitive dialogue tasks, showing superior performance over
state-of-the-art baselines.
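Below is a minimal, illustrative sketch of the general idea in the abstract: an opponent model is fit to the opposite agent's observed behavior, and its predicted action distribution is fed to the target policy as an additional input. This is not the authors' implementation; all module names and dimensions are assumptions.

```python
# Hedged sketch of the opposite-behavior-aware framework described above.
# Not the authors' implementation; names and dimensions are illustrative.
import torch
import torch.nn as nn

class OpponentModel(nn.Module):
    """Estimates the opposite agent's policy pi_o(a_o | s) from its logged behavior."""
    def __init__(self, state_dim, opp_action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, opp_action_dim),
        )

    def forward(self, state):
        return self.net(state)  # logits over the opponent's actions

class OpponentAwarePolicy(nn.Module):
    """Target policy conditioned on the state and the estimated opponent behavior."""
    def __init__(self, state_dim, opp_action_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + opp_action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, opp_probs):
        return self.net(torch.cat([state, opp_probs], dim=-1))  # target action logits

def opponent_model_loss(opp_model, states, opp_actions):
    """Supervised estimation of the opponent policy from observed (state, action) pairs."""
    return nn.functional.cross_entropy(opp_model(states), opp_actions)

# During target-policy optimization (e.g. any policy-gradient method), the
# estimate is treated as part of the target policy's input:
#   opp_probs = opp_model(state).softmax(dim=-1)
#   action_logits = target_policy(state, opp_probs)
```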
Related papers
- Learning Control Policies for Variable Objectives from Offline Data [2.7174376960271154]
We introduce a conceptual extension for model-based policy search methods, called variable objective policy (VOP).
We demonstrate that by altering the objectives passed as input to the policy, users gain the freedom to adjust its behavior or re-balance optimization targets at runtime, as sketched below.
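As a hedged illustration of the objective-conditioned idea above (not the paper's implementation; names and dimensions are assumptions), the policy can simply take the objective weights as an extra input so the trade-off is adjustable at runtime:

```python
# Minimal sketch of a variable objective policy: conditioning on objective
# weights lets users re-balance optimization targets without retraining.
# Illustrative only; not taken from the paper.
import torch
import torch.nn as nn

class VariableObjectivePolicy(nn.Module):
    def __init__(self, state_dim, n_objectives, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_objectives, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, objective_weights):
        return self.net(torch.cat([state, objective_weights], dim=-1))

# The same trained policy, steered toward different objectives at runtime:
policy = VariableObjectivePolicy(state_dim=8, n_objectives=2, action_dim=3)
state = torch.randn(1, 8)
logits_obj1 = policy(state, torch.tensor([[1.0, 0.0]]))  # weight objective 1
logits_obj2 = policy(state, torch.tensor([[0.0, 1.0]]))  # weight objective 2
```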
arXiv Detail & Related papers (2023-08-11T13:33:59Z)
- On the Value of Myopic Behavior in Policy Reuse [67.37788288093299]
Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence.
In this work, we present a framework called Selective Myopic bEhavior Control (SMEC).
SMEC adaptively aggregates the sharable short-term behaviors of prior policies and the long-term behaviors of the task policy, leading to coordinated decisions.
arXiv Detail & Related papers (2023-05-28T03:59:37Z)
- Imitating Opponent to Win: Adversarial Policy Imitation Learning in Two-player Competitive Games [0.0]
Adversarial policies adopted by an adversary agent can cause a target RL agent to perform poorly in a multi-agent environment.
In existing studies, adversarial policies are directly trained based on experiences of interacting with the victim agent.
We design a new effective adversarial policy learning algorithm that overcomes this shortcoming.
arXiv Detail & Related papers (2022-10-30T18:32:02Z)
- Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy [83.61404191470126]
We propose a new solution named I-Pro that can learn a Proactive policy in the Interactive setting.
Specifically, we learn the trade-off via a learned goal weight, which consists of four factors.
The experimental results demonstrate I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.
arXiv Detail & Related papers (2022-04-07T14:11:31Z)
- Influencing Long-Term Behavior in Multiagent Reinforcement Learning [59.98329270954098]
We propose a principled framework for considering the limiting policies of other agents as time approaches infinity.
Specifically, we develop a new optimization objective that maximizes each agent's average reward by directly accounting for the impact of its behavior on the limiting set of policies that other agents will take on.
Thanks to our farsighted evaluation, we demonstrate better long-term performance than state-of-the-art baselines in various domains.
arXiv Detail & Related papers (2022-03-07T17:32:35Z)
- Contrastive Explanations for Comparing Preferences of Reinforcement Learning Agents [16.605295052893986]
In complex tasks where the reward function is not straightforward, multiple reinforcement learning (RL) policies can be trained by adjusting the impact of individual objectives on the reward function.
In this work we compare the behavior of two policies trained on the same task, but with different preferences over objectives.
We propose a method for distinguishing differences in behavior that stem from different abilities from those that are a consequence of the two RL agents' opposing preferences.
arXiv Detail & Related papers (2021-12-17T11:57:57Z)
- Supervised Off-Policy Ranking [145.3039527243585]
Off-policy evaluation (OPE) leverages data generated by other policies to evaluate a target policy.
We propose supervised off-policy ranking, which learns a policy scoring model by correctly ranking training policies with known performance (see the sketch below).
Our method outperforms strong baseline OPE methods in terms of both rank correlation and the performance gap between the truly best policy and the best of the top three ranked policies.
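A hedged sketch of the supervised ranking idea (not the paper's implementation; representing each policy by a fixed feature vector is an assumption made here for illustration):

```python
# Train a scoring model so that policies with higher known performance
# receive higher scores, via a pairwise logistic ranking loss.
import torch
import torch.nn as nn

class PolicyScorer(nn.Module):
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, policy_feats):
        return self.net(policy_feats).squeeze(-1)  # one score per policy

def pairwise_ranking_loss(scores, true_perf):
    """For every pair where policy i truly outperforms policy j, push score_i above score_j."""
    s_diff = scores.unsqueeze(1) - scores.unsqueeze(0)        # [i, j] = score_i - score_j
    p_diff = true_perf.unsqueeze(1) - true_perf.unsqueeze(0)  # [i, j] = perf_i - perf_j
    mask = (p_diff > 0).float()                               # pairs where i truly beats j
    return (mask * nn.functional.softplus(-s_diff)).sum() / mask.sum().clamp(min=1.0)

# Usage: scores = PolicyScorer(feat_dim=16)(policy_features)
#        loss = pairwise_ranking_loss(scores, known_performance)
```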
arXiv Detail & Related papers (2021-07-03T07:01:23Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
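A hedged sketch of the imagined-subgoal idea summarized above (not the paper's implementation; how the subgoal guides the policy is simplified, and names and dimensions are assumptions):

```python
# A separate high-level policy predicts an intermediate subgoal between the
# current state and the final goal; the goal-conditioned policy is then guided
# by that imagined subgoal. Illustrative only.
import torch
import torch.nn as nn

class HighLevelPolicy(nn.Module):
    """Predicts an imagined subgoal given the current state and the final goal."""
    def __init__(self, state_dim, goal_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, goal_dim),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

class GoalConditionedPolicy(nn.Module):
    def __init__(self, state_dim, goal_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

# Simplified guidance signal: acting toward the imagined subgoal should agree
# with acting toward the distant final goal, which eases long-horizon learning.
#   subgoal = high_level(state, goal)
#   guidance = mse_loss(policy(state, subgoal.detach()), policy(state, goal))
```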
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
- Informative Policy Representations in Multi-Agent Reinforcement Learning via Joint-Action Distributions [17.129962954873587]
In multi-agent reinforcement learning, the inherent non-stationarity of the environment caused by other agents' actions poses significant difficulties for an agent trying to learn a good policy independently.
We propose a general method to learn representations of other agents' policies via the joint-action distributions sampled in interactions.
We empirically demonstrate that our method outperforms existing work in multi-agent tasks when facing unseen agents.
arXiv Detail & Related papers (2021-06-10T15:09:33Z)
- Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning [9.014110264448371]
We propose a novel unsupervised learning approach named goal-conditioned policy with intrinsic motivation (GPIM).
GPIM jointly learns both an abstract-level policy and a goal-conditioned policy.
Experiments on various robotic tasks demonstrate the effectiveness and efficiency of our proposed GPIM method.
arXiv Detail & Related papers (2021-04-11T16:26:10Z)