Refine and Imitate: Reducing Repetition and Inconsistency in Persuasion
Dialogues via Reinforcement Learning and Human Demonstration
- URL: http://arxiv.org/abs/2012.15375v1
- Date: Thu, 31 Dec 2020 00:02:51 GMT
- Title: Refine and Imitate: Reducing Repetition and Inconsistency in Persuasion
Dialogues via Reinforcement Learning and Human Demonstration
- Authors: Weiyan Shi, Yu Li, Saurav Sahay, Zhou Yu
- Abstract summary: We propose to apply reinforcement learning to refine an MLE-based language model without user simulators.
We distill sentence-level information about repetition, inconsistency and task relevance through rewards.
Experiments show that our model outperforms previous state-of-the-art dialogue models on both automatic metrics and human evaluation results.
- Score: 45.14559188965439
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the recent success of large-scale language models on various
downstream NLP tasks, the repetition and inconsistency problems still persist
in dialogue response generation. Previous approaches have attempted to avoid
repetition by penalizing the language model's undesirable behaviors in the loss
function. However, these methods focus on token-level information and can lead
to incoherent responses and uninterpretable behaviors. To alleviate these
issues, we propose to apply reinforcement learning to refine an MLE-based
language model without user simulators, and distill sentence-level information
about repetition, inconsistency and task relevance through rewards. In
addition, to better accomplish the dialogue task, the model learns from human
demonstration to imitate intellectual activities such as persuasion, and
selects the most persuasive responses. Experiments show that our model
outperforms previous state-of-the-art dialogue models on both automatic metrics
and human evaluation results on a donation persuasion task, and generates more
diverse, consistent and persuasive conversations according to the user
feedback.
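The sentence-level repetition signal mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the n-gram Jaccard overlap measure, the function names, the threshold, and the reward values are all hypothetical choices made for illustration only.

```python
from typing import List, Set, Tuple


def _ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def repetition_score(candidate: str, history: List[str], n: int = 2) -> float:
    """Highest Jaccard overlap of n-grams between the candidate sentence
    and any earlier system sentence (assumed measure, not from the paper)."""
    cand = _ngrams(candidate.lower().split(), n)
    if not cand:
        return 0.0
    best = 0.0
    for prev in history:
        prev_grams = _ngrams(prev.lower().split(), n)
        union = cand | prev_grams
        if union:
            best = max(best, len(cand & prev_grams) / len(union))
    return best


def sentence_reward(candidate: str, history: List[str],
                    threshold: float = 0.5,
                    penalty: float = -1.0,
                    bonus: float = 1.0) -> float:
    """Hypothetical sentence-level reward: penalize a candidate that
    largely repeats an earlier sentence, reward it otherwise."""
    if repetition_score(candidate, history) >= threshold:
        return penalty
    return bonus


history = ["would you like to donate to save the children"]
repeated = sentence_reward("would you like to donate to save the children", history)
fresh = sentence_reward("thank you so much for your time", history)
```

A reward of this shape is computed per sentence rather than per token, which is the distinction the abstract draws against loss-based token-level penalties.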
Related papers
- Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z)
- Controllable Mixed-Initiative Dialogue Generation through Prompting [50.03458333265885]
Mixed-initiative dialogue tasks involve repeated exchanges of information and conversational control.
Agents gain control by generating responses that follow particular dialogue intents or strategies, prescribed by a policy planner.
The standard approach has been to fine-tune pre-trained language models to perform generation conditioned on these intents.
We instead prompt large language models as a drop-in replacement to fine-tuning on conditional generation.
arXiv Detail & Related papers (2023-05-06T23:11:25Z)
- Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
arXiv Detail & Related papers (2023-02-06T10:28:16Z)
- Improving a sequence-to-sequence nlp model using a reinforcement learning policy algorithm [0.0]
Current neural network models of dialogue generation show great promise for generating answers for chatty agents.
But they are short-sighted in that they predict utterances one at a time while disregarding their impact on future outcomes.
This work marks a preliminary step toward developing a neural conversational model based on the long-term success of dialogues.
arXiv Detail & Related papers (2022-12-28T22:46:57Z)
- Towards Robust Online Dialogue Response Generation [62.99904593650087]
We argue that inconsistent chatbot responses can be caused by a discrepancy between training and real-world testing.
We propose a hierarchical sampling-based method consisting of both utterance-level sampling and semi-utterance-level sampling.
arXiv Detail & Related papers (2022-03-07T06:51:41Z)
- Learning from Perturbations: Diverse and Informative Dialogue Generation with Inverse Adversarial Training [10.17868476063421]
We propose the Inverse Adversarial Training (IAT) algorithm for training neural dialogue systems.
IAT encourages the model to be sensitive to perturbations in the dialogue history and therefore to learn from them.
We show that our approach can better model dialogue history and generate more diverse and consistent responses.
arXiv Detail & Related papers (2021-05-31T17:28:37Z)
- DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances [18.199473005335093]
This paper presents DialogBERT, a novel conversational response generation model that enhances previous PLM-based dialogue models.
To efficiently capture the discourse-level coherence among utterances, we propose two training objectives, including masked utterance regression.
Experiments on three multi-turn conversation datasets show that our approach remarkably outperforms the baselines.
arXiv Detail & Related papers (2020-12-03T09:06:23Z)
- Group-wise Contrastive Learning for Neural Dialogue Generation [29.749195182401344]
We introduce contrastive learning into dialogue generation, where the model explicitly perceives the difference between the well-chosen positive and negative utterances.
To manage the multi-mapping relations prevalent in human conversation, we augment contrastive dialogue learning with group-wise dual sampling.
arXiv Detail & Related papers (2020-09-16T08:28:30Z)
- Ranking Enhanced Dialogue Generation [77.8321855074999]
How to effectively utilize the dialogue history is a crucial problem in multi-turn dialogue generation.
Previous works usually employ various neural network architectures to model the history.
This paper proposes a Ranking Enhanced Dialogue generation framework.
arXiv Detail & Related papers (2020-08-13T01:49:56Z)