Multi-Action Dialog Policy Learning from Logged User Feedback
- URL: http://arxiv.org/abs/2302.13505v1
- Date: Mon, 27 Feb 2023 04:01:28 GMT
- Title: Multi-Action Dialog Policy Learning from Logged User Feedback
- Authors: Shuo Zhang, Junzhou Zhao, Pinghui Wang, Tianxiang Wang, Zi Liang, Jing
Tao, Yi Huang, Junlan Feng
- Abstract summary: Multi-action dialog policy generates multiple atomic dialog actions per turn.
Due to data limitations, existing policy models generalize poorly toward unseen dialog flows.
We propose BanditMatch to improve multi-action dialog policy learning with explicit and implicit turn-level user feedback.
- Score: 28.4271696269512
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-action dialog policy, which generates multiple atomic dialog actions
per turn, has been widely applied in task-oriented dialog systems to provide
expressive and efficient system responses. Existing policy models usually
imitate action combinations from the labeled multi-action dialog examples. Due
to data limitations, they generalize poorly toward unseen dialog flows. While
reinforcement learning-based methods are proposed to incorporate the service
ratings from real users and user simulators as external supervision signals,
they suffer from sparse and less credible dialog-level rewards. To cope with
this problem, we explore improving multi-action dialog policy learning with
explicit and implicit turn-level user feedback received for historical
predictions (i.e., logged user feedback), which is cost-efficient to collect
and faithful to real-world scenarios. The task is challenging since the logged user
feedback provides only partial label feedback limited to the particular
historical dialog actions predicted by the agent. To fully exploit such
feedback information, we propose BanditMatch, which addresses the task from a
feedback-enhanced semi-supervised learning perspective with a hybrid objective
of semi-supervised learning and bandit learning. BanditMatch integrates
pseudo-labeling methods to better explore the action space through constructing
full label feedback. Extensive experiments show that our BanditMatch
outperforms the state-of-the-art methods by generating more concise and
informative responses. The source code and the appendix of this paper can be
obtained from https://github.com/ShuoZhangXJTU/BanditMatch.
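The abstract frames BanditMatch as a hybrid of bandit learning on logged,
partial turn-level feedback and pseudo-label-based semi-supervised learning.
Below is a minimal sketch of how two such loss terms could be combined in one
training step for a multi-label (multi-action) policy; the tensor layout, the
inverse-propensity bandit term, and the confidence-thresholded pseudo-labeling
are illustrative assumptions of this sketch, not the paper's actual
formulation (the released code at the repository above is authoritative).

```python
import torch
import torch.nn.functional as F


def hybrid_loss(logits_logged, logged_actions, logged_reward, logging_prob,
                logits_weak, logits_strong, threshold=0.95, alpha=1.0):
    """Sketch of a hybrid bandit + semi-supervised objective for a
    multi-label dialog-action policy. All argument names are hypothetical:

    logits_logged:  (B, A) policy logits for turns with logged feedback
    logged_actions: (B, A) multi-hot action combination the deployed policy took
    logged_reward:  (B,)   turn-level user feedback for that combination
    logging_prob:   (B,)   propensity of the logged combination under the old policy
    logits_weak, logits_strong: (B, A) logits for two views of unlabeled turns
    """
    # Bandit branch: feedback exists only for the historically chosen action
    # combination, so reweight its reward by an inverse-propensity estimate.
    log_prob_new = -F.binary_cross_entropy_with_logits(
        logits_logged, logged_actions, reduction="none").sum(dim=-1)
    prob_new = log_prob_new.exp()
    bandit_loss = -(logged_reward * prob_new / logging_prob.clamp(min=1e-3)).mean()

    # Semi-supervised branch: pseudo-label confident per-action predictions on
    # the weak view and fit the strong view to them (constructed "full" labels).
    with torch.no_grad():
        probs_weak = torch.sigmoid(logits_weak)
        confident = ((probs_weak > threshold) | (probs_weak < 1 - threshold)).float()
        pseudo = (probs_weak > 0.5).float()
    ssl_loss = (F.binary_cross_entropy_with_logits(
        logits_strong, pseudo, reduction="none") * confident).sum()
    ssl_loss = ssl_loss / confident.sum().clamp(min=1.0)

    return bandit_loss + alpha * ssl_loss
```

In the paper's terms, the pseudo-labels would play the role of constructed
full label feedback for actions the logged policy never tried; in this sketch
they simply gate a per-action consistency loss.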
Related papers
- In-Context Learning User Simulators for Task-Oriented Dialog Systems [1.7086737326992172]
This paper presents a novel application of large language models in user simulation for task-oriented dialog systems.
By harnessing the power of these models, the proposed approach generates diverse utterances based on user goals and limited dialog examples.
arXiv Detail & Related papers (2023-06-01T15:06:11Z)
- SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding [68.94808536012371]
We propose a tree-structured pre-trained conversation model, which learns dialog representations from limited labeled dialogs and large-scale unlabeled dialog corpora.
Our method can achieve new state-of-the-art results on the DialoGLUE benchmark consisting of seven datasets and four popular dialog understanding tasks.
arXiv Detail & Related papers (2022-09-14T13:42:50Z)
- "Think Before You Speak": Improving Multi-Action Dialog Policy by Planning Single-Action Dialogs [33.78889030078026]
Multi-action dialog policy (MADP) generates multiple atomic dialog actions per turn.
We propose Planning Enhanced Dialog Policy (PEDP), a novel multi-task learning framework that learns single-action dialog dynamics.
Our fully supervised learning-based method achieves a task success rate of 90.6%, a 3% improvement over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-25T07:55:53Z)
- HERALD: An Annotation Efficient Method to Detect User Disengagement in Social Conversations [38.95985439093335]
Existing work on detecting user disengagement typically requires hand-labeling many dialog samples.
We propose HERALD, an efficient annotation framework that reframes the training data annotation process as a denoising problem.
Our experiments show that HERALD improves annotation efficiency significantly and achieves 86% user disengagement detection accuracy in two dialog corpora.
arXiv Detail & Related papers (2021-06-01T01:09:55Z)
- Alexa Conversations: An Extensible Data-driven Approach for Building Task-oriented Dialogue Systems [21.98135285833616]
Traditional goal-oriented dialogue systems rely on various components such as natural language understanding, dialogue state tracking, policy learning and response generation.
We present a new approach for building goal-oriented dialogue systems that is scalable, as well as data efficient.
arXiv Detail & Related papers (2021-04-19T07:09:27Z)
- Dialogue History Matters! Personalized Response Selection in Multi-turn Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching.
Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information.
We evaluate our model on two large datasets with user identification, i.e., the personalized Ubuntu dialogue corpus (P-Ubuntu) and the personalized Weibo dataset (P-Weibo).
arXiv Detail & Related papers (2021-03-17T09:42:11Z)
- Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension [49.92173751203827]
In multi-turn dialog, utterances do not always take the full form of sentences.
We propose to improve the response generation performance by examining the model's ability to answer a reading comprehension question.
arXiv Detail & Related papers (2020-12-14T10:58:01Z)
- Dialog Simulation with Realistic Variations for Training Goal-Oriented Conversational Systems [14.206866126142002]
Goal-oriented dialog systems enable users to complete specific goals like requesting information about a movie or booking a ticket.
We propose an approach for automatically creating a large corpus of annotated dialogs from a few thoroughly annotated sample dialogs and the dialog schema.
We achieve an 18-50% relative accuracy improvement on a held-out test set compared to a baseline dialog generation approach.
arXiv Detail & Related papers (2020-11-16T19:39:15Z)
- Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback [62.997667081978825]
We present a novel approach for considering user feedback and evaluate it using three distinct strategies.
Despite the limited amount of feedback returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-16T07:32:51Z)
- Towards Conversational Recommendation over Multi-Type Dialogs [78.52354759386296]
We propose a new task of conversational recommendation over multi-type dialogs, where the bots can proactively and naturally lead a conversation from a non-recommendation dialog to a recommendation dialog.
To facilitate the study of this task, we create a human-to-human Chinese dialog dataset, DuRecDial (about 10k dialogs, 156k utterances).
In each dialog, the recommender proactively leads a multi-type dialog to approach recommendation targets and then makes multiple recommendations with rich interaction behavior.
arXiv Detail & Related papers (2020-05-08T11:01:21Z)
- Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition [64.06167416127386]
We propose Multi-Agent Dialog Policy Learning, which regards both the system and the user as the dialog agents.
The two agents interact with each other and are trained jointly.
Results show that our method can successfully build a system policy and a user policy simultaneously.
arXiv Detail & Related papers (2020-04-08T04:51:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.