A Systematic Evaluation of Response Selection for Open Domain Dialogue
- URL: http://arxiv.org/abs/2208.04379v1
- Date: Mon, 8 Aug 2022 19:33:30 GMT
- Title: A Systematic Evaluation of Response Selection for Open Domain Dialogue
- Authors: Behnam Hedayatnia, Di Jin, Yang Liu, Dilek Hakkani-Tur
- Abstract summary: We curated a dataset where responses from multiple response generators produced for the same dialog context are manually annotated as appropriate (positive) and inappropriate (negative).
We conduct a systematic evaluation of state-of-the-art methods for response selection, and demonstrate that both strategies of using multiple positive candidates and using manually verified hard negative candidates can bring significant performance improvements compared to using adversarial training data, e.g., increases of 3% and 13% in Recall@1, respectively.
- Score: 36.88551817451512
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress on neural approaches for language processing has triggered a
resurgence of interest in building intelligent open-domain chatbots. However,
even the state-of-the-art neural chatbots cannot produce satisfying responses
for every turn in a dialog. A practical solution is to generate multiple
response candidates for the same context, and then perform response
ranking/selection to determine which candidate is the best. Previous work in
response selection typically trains response rankers using synthetic data that
is formed from existing dialogs by using a ground truth response as the single
appropriate response and constructing inappropriate responses via random
selection or using adversarial methods. In this work, we curated a dataset
where responses from multiple response generators produced for the same dialog
context are manually annotated as appropriate (positive) and inappropriate
(negative). We argue that such training data better matches the actual use
case, enabling models to learn to rank responses effectively. With this
new dataset, we conduct a systematic evaluation of state-of-the-art methods for
response selection, and demonstrate that both strategies of using multiple
positive candidates and using manually verified hard negative candidates can
bring significant performance improvements compared to using adversarial
training data, e.g., increases of 3% and 13% in Recall@1, respectively.
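To make the evaluation setup concrete, below is a minimal sketch (not the authors' code) of the ranking-and-selection procedure the abstract describes: every candidate response is scored against the dialog context, the top-scoring candidate is selected, and Recall@1 is the fraction of contexts whose selected candidate is annotated as appropriate. The `score_response` callable and the toy length-based scorer are hypothetical placeholders for any trained ranker (e.g., a cross-encoder over context-candidate pairs).

```python
# Minimal sketch of response ranking and Recall@1 evaluation (illustrative only;
# not the authors' implementation). `score_response` stands in for any trained
# ranker that maps a (context, candidate) pair to a relevance score.
from typing import Callable, List, Sequence, Tuple


def rank_candidates(
    context: str,
    candidates: Sequence[str],
    score_response: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score every candidate against the dialog context and sort best-first."""
    scored = [(cand, score_response(context, cand)) for cand in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def recall_at_1(
    examples: Sequence[Tuple[str, List[str], List[int]]],
    score_response: Callable[[str, str], float],
) -> float:
    """Fraction of contexts whose top-ranked candidate is labeled appropriate (1)."""
    hits = 0
    for context, candidates, labels in examples:
        top_candidate = rank_candidates(context, candidates, score_response)[0][0]
        hits += labels[candidates.index(top_candidate)]
    return hits / len(examples)


if __name__ == "__main__":
    # Toy scorer that prefers longer candidates -- for illustration only.
    toy_scorer = lambda ctx, cand: float(len(cand))
    data = [
        ("How was your weekend?",
         ["Good, I went hiking with friends.", "Weekend."],
         [1, 0]),
    ]
    print(f"Recall@1 = {recall_at_1(data, toy_scorer):.2f}")
```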
Related papers
- PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems [59.1250765143521]
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities.
We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework.
We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
arXiv Detail & Related papers (2023-09-19T08:27:09Z)
- Reranking Overgenerated Responses for End-to-End Task-Oriented Dialogue Systems [71.33737787564966]
End-to-end (E2E) task-oriented dialogue (ToD) systems are prone to falling into the so-called 'likelihood trap'.
We propose a reranking method which aims to select high-quality items from the lists of responses initially overgenerated by the system.
Our methods improve a state-of-the-art E2E ToD system by 2.4 BLEU, 3.2 ROUGE, and 2.8 METEOR scores, achieving new peak results.
arXiv Detail & Related papers (2022-11-07T15:59:49Z)
- Pneg: Prompt-based Negative Response Generation for Dialogue Response Selection Task [27.513992470527427]
In retrieval-based dialogue systems, a response selection model acts as a ranker to select the most appropriate response among several candidates.
Recent studies have shown that leveraging adversarial responses as negative training samples is useful for improving the discriminating power of the selection model.
This paper proposes a simple but efficient method for generating adversarial negative responses leveraging a large-scale language model.
arXiv Detail & Related papers (2022-10-31T11:49:49Z)
- Sparse and Dense Approaches for the Full-rank Retrieval of Responses for Dialogues [11.726528038065764]
We focus on the more realistic task of full-rank retrieval of responses, where the candidate pool size $n$ can reach millions of responses.
Our findings based on three different information-seeking dialogue datasets reveal that a learned response expansion technique is a solid baseline for sparse retrieval.
We find the best performing method overall to be dense retrieval with intermediate training, followed by fine-tuning on the target conversational data.
arXiv Detail & Related papers (2022-04-22T08:15:15Z)
- Saying No is An Art: Contextualized Fallback Responses for Unanswerable Dialogue Queries [3.593955557310285]
Most dialogue systems rely on hybrid approaches for generating a set of ranked responses.
We design a neural approach which generates fallback responses that are contextually aware of the user query.
Our simple approach makes use of rules over dependency parses and a text-to-text transformer fine-tuned on synthetic data of question-response pairs.
arXiv Detail & Related papers (2020-12-03T12:34:22Z)
- Generating Dialogue Responses from a Semantic Latent Space [75.18449428414736]
We propose an alternative to end-to-end classification over the vocabulary.
We learn the pair relationship between the prompts and responses as a regression task on a latent space.
Human evaluation showed that learning the task on a continuous space can generate responses that are both relevant and informative.
arXiv Detail & Related papers (2020-10-04T19:06:16Z)
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experimental results indicate that the proposed auxiliary self-supervised tasks bring significant improvements for multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
- Counterfactual Off-Policy Training for Neural Response Generation [94.76649147381232]
We propose to explore potential responses by counterfactual reasoning.
Training on the counterfactual responses under the adversarial learning framework helps to explore the high-reward area of the potential response space.
An empirical study on the DailyDialog dataset shows that our approach significantly outperforms the HRED model.
arXiv Detail & Related papers (2020-04-29T22:46:28Z)
- Evaluating Dialogue Generation Systems via Response Selection [42.56640173047927]
We propose a method to construct response selection test sets with well-chosen false candidates.
We demonstrate that evaluating systems via response selection with the test sets developed by our method correlates more strongly with human evaluation.
arXiv Detail & Related papers (2020-04-29T16:21:50Z)