Counterfactual Off-Policy Training for Neural Response Generation
- URL: http://arxiv.org/abs/2004.14507v2
- Date: Fri, 9 Oct 2020 07:47:45 GMT
- Title: Counterfactual Off-Policy Training for Neural Response Generation
- Authors: Qingfu Zhu, Weinan Zhang, Ting Liu, William Yang Wang
- Abstract summary: We propose to explore potential responses by counterfactual reasoning.
Training on the counterfactual responses under the adversarial learning framework helps to explore the high-reward area of the potential response space.
An empirical study on the DailyDialog dataset shows that our approach significantly outperforms the HRED model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-domain dialogue generation suffers from the data insufficiency problem
due to the vast size of potential responses. In this paper, we propose to
explore potential responses by counterfactual reasoning. Given an observed
response, the counterfactual reasoning model automatically infers the outcome
of an alternative policy that could have been taken. The resulting
counterfactual response synthesized in hindsight is of higher quality than the
response synthesized from scratch. Training on the counterfactual responses
under the adversarial learning framework helps to explore the high-reward area
of the potential response space. An empirical study on the DailyDialog dataset
shows that our approach significantly outperforms the HRED model as well as the
conventional adversarial learning approaches.
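For intuition, the counterfactual step can be sketched with the Gumbel-Max structural causal model (Oberst & Sontag, 2019) that this line of work builds on: infer noise (a "scenario") under which the observed policy produces the observed token, then replay that same noise under an alternative policy's logits. The sketch below is a minimal illustration under that assumption, with per-token categorical policies; all function names are hypothetical and this is not the authors' released code.

```python
import numpy as np

def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) variate."""
    return -np.log(-np.log(rng.uniform(1e-12, 1.0)))

def posterior_gumbel_noise(logits, observed, rng):
    """Sample noise u such that argmax(logits + u) == observed.

    Top-down Gumbel construction: draw the max at the observed index,
    then draw every other coordinate from a Gumbel truncated below it.
    """
    m = logits.max()
    log_z = m + np.log(np.exp(logits - m).sum())   # stable logsumexp
    max_gumbel = log_z + sample_gumbel(rng)        # value of the maximum
    noise = np.empty_like(logits)
    for i, phi in enumerate(logits):
        if i == observed:
            g = max_gumbel
        else:
            # Gumbel with location phi, truncated below max_gumbel
            g = -np.log(np.exp(-max_gumbel) + np.exp(-(phi + sample_gumbel(rng))))
        noise[i] = g - phi                         # standard-Gumbel noise term
    return noise

def counterfactual_token(obs_logits, obs_token, alt_logits, rng):
    """Token the alternative policy would have emitted in the same scenario."""
    u = posterior_gumbel_noise(obs_logits, obs_token, rng)
    return int(np.argmax(alt_logits + u))

rng = np.random.default_rng(0)
obs_logits = np.log(np.array([0.7, 0.2, 0.1]))  # policy that produced the data
alt_logits = np.log(np.array([0.3, 0.6, 0.1]))  # current (alternative) policy
# Counterfactual token under the alternative policy (stochastic in the scenario).
print(counterfactual_token(obs_logits, obs_token=0, alt_logits=alt_logits, rng=rng))
```

Training then scores such counterfactual responses with the adversarial discriminator's reward, which is what steers exploration toward the high-reward area of the response space described above.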
Related papers
- PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities.
We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework.
We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
arXiv: 2023-09-19
- Promoting Open-domain Dialogue Generation through Learning Pattern Information between Contexts and Responses
This paper improves the quality of generated responses by learning implicit pattern information between contexts and responses in the training samples.
It also designs a response-aware mechanism for mining this pattern information, so that the generated replies are more diverse and closer to human replies.
arXiv: 2023-09-06
- Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery
We conduct the first study of spurious correlations in open-domain response generation models, based on CGDIALOG, a corpus curated in our work.
Inspired by causal discovery algorithms, we propose a novel model-agnostic method for training and inference of response generation models.
arXiv: 2023-03-02
- Pneg: Prompt-based Negative Response Generation for Dialogue Response Selection Task
In retrieval-based dialogue systems, a response selection model acts as a ranker to select the most appropriate response among several candidates.
Recent studies have shown that leveraging adversarial responses as negative training samples is useful for improving the discriminating power of the selection model.
This paper proposes a simple but efficient method for generating adversarial negative responses leveraging a large-scale language model.
arXiv: 2022-10-31
- A Systematic Evaluation of Response Selection for Open Domain Dialogue
We curated a dataset in which responses produced by multiple response generators for the same dialogue context are manually annotated as appropriate (positive) or inappropriate (negative).
We conduct a systematic evaluation of state-of-the-art response selection methods and demonstrate that using multiple positive candidates and using manually verified hard negative candidates both bring significant improvements over adversarial training data, e.g., gains of 3% and 13% in Recall@1, respectively (a minimal Recall@k sketch follows this list).
arXiv: 2022-08-08
- Stateful Offline Contextual Policy Evaluation and Learning
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv: 2021-10-19
- Learning from Perturbations: Diverse and Informative Dialogue Generation with Inverse Adversarial Training
We propose Inverse Adversarial Training (IAT) algorithm for training neural dialogue systems.
IAT encourages the model to be sensitive to perturbations in the dialogue history and thereby to learn from them.
We show that our approach can better model dialogue history and generate more diverse and consistent responses.
arXiv: 2021-05-31
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues
We introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experimental results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection (a joint-loss sketch follows this list).
arXiv: 2020-09-14
- A Controllable Model of Grounded Response Generation
Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process.
We propose a framework that we call controllable grounded response generation (CGRG).
We show that, using this framework, a transformer-based model with a novel inductive attention mechanism, trained on a conversation-like Reddit dataset, outperforms strong generation baselines.
arXiv: 2020-05-01
- Posterior-GAN: Towards Informative and Coherent Response Generation with Posterior Generative Adversarial Network
We propose a novel encoder-decoder based generative adversarial learning framework, Posterior Generative Adversarial Network (Posterior-GAN).
Experimental results demonstrate that our method effectively boosts the informativeness and coherence of generated responses on both automatic and human evaluation.
arXiv: 2020-03-04
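The "Systematic Evaluation" entry above reports Recall@1 gains; for reference, Recall@k counts a dialogue context as a hit when the ground-truth response is ranked among the top k candidates. A minimal sketch, with an illustrative function name and made-up scores:

```python
import numpy as np

def recall_at_k(candidate_scores, true_index, k=1):
    """1 if the ground-truth candidate is ranked in the top k, else 0."""
    top_k = np.argsort(candidate_scores)[::-1][:k]
    return int(true_index in top_k)

# Hypothetical ranker scores for 10 candidates; the true response is index 3.
scores = np.array([0.11, 0.42, 0.08, 0.93, 0.27, 0.35, 0.18, 0.64, 0.22, 0.50])
print(recall_at_k(scores, true_index=3, k=1))  # 1: true response ranked first
# Corpus-level Recall@k is the mean of this indicator over all contexts.
```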
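The "Context-Response Matching" entry trains the selection model jointly with four auxiliary self-supervised tasks. One common instantiation of such a multi-task objective is a weighted sum of per-task losses, sketched below with hypothetical loss values; the paper's actual task weighting may differ.

```python
def joint_loss(selection_loss, aux_losses, weights=None):
    """Multi-task objective: selection loss plus weighted auxiliary losses."""
    weights = weights or [1.0] * len(aux_losses)
    return selection_loss + sum(w * l for w, l in zip(weights, aux_losses))

# Hypothetical per-batch losses for the four auxiliary tasks.
aux = {"next_session": 0.42, "restoration": 0.31,
       "incoherence": 0.18, "consistency": 0.25}
print(joint_loss(0.87, list(aux.values())))  # approximately 2.03 (0.87 + 1.16)
```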