Synthesizing Adversarial Negative Responses for Robust Response Ranking
and Evaluation
- URL: http://arxiv.org/abs/2106.05894v1
- Date: Thu, 10 Jun 2021 16:20:55 GMT
- Title: Synthesizing Adversarial Negative Responses for Robust Response Ranking
and Evaluation
- Authors: Prakhar Gupta, Yulia Tsvetkov, Jeffrey P. Bigham
- Abstract summary: Open-domain neural dialogue models have achieved high performance in response ranking and evaluation tasks.
Over-reliance on content similarity makes the models less sensitive to the presence of inconsistencies.
We propose approaches for automatically creating adversarial negative training data.
- Score: 34.52276336319678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-domain neural dialogue models have achieved high performance in response
ranking and evaluation tasks. These tasks are formulated as a binary
classification of responses given in a dialogue context, and models generally
learn to make predictions based on context-response content similarity.
However, over-reliance on content similarity makes the models less sensitive to
the presence of inconsistencies, incorrect time expressions and other factors
important for response appropriateness and coherence. We propose approaches for
automatically creating adversarial negative training data to help ranking and
evaluation models learn features beyond content similarity. We propose
mask-and-fill and keyword-guided approaches that generate negative examples for
training more robust dialogue systems. These generated adversarial responses
have high content similarity with the contexts but are either incoherent,
inappropriate or not fluent. Our approaches are fully data-driven and can be
easily incorporated in existing models and datasets. Experiments on
classification, ranking and evaluation tasks across multiple datasets
demonstrate that our approaches outperform strong baselines in providing
informative negative examples for training dialogue systems.
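To make the two generation strategies named above concrete, here is a minimal, self-contained sketch, not the authors' released implementation: it masks and refills tokens of a gold response with an off-the-shelf masked LM that never sees the dialogue context (mask-and-fill), and splices salient context words into an unrelated response (a crude stand-in for keyword-guided generation). The model choice, masking rate, stopword list, and helper names are assumptions made for the example.

```python
# Illustrative sketch only (not the paper's code): create adversarial negative
# responses that keep high lexical overlap with the context/gold response but
# may become incoherent or inappropriate.
import random
from transformers import pipeline  # assumes the `transformers` package is installed

unmasker = pipeline("fill-mask", model="roberta-base")  # model choice is an assumption
MASK = unmasker.tokenizer.mask_token  # "<mask>" for RoBERTa

def mask_and_fill(response: str, mask_rate: float = 0.3, seed: int = 0) -> str:
    """Mask a fraction of the tokens in a gold response and refill them with a
    masked LM that is NOT conditioned on the dialogue context."""
    random.seed(seed)
    tokens = response.split()
    positions = random.sample(range(len(tokens)), max(1, int(mask_rate * len(tokens))))
    for pos in positions:
        masked = " ".join(tokens[:pos] + [MASK] + tokens[pos + 1:])
        candidates = [p["token_str"].strip() for p in unmasker(masked, top_k=5)]
        # Prefer a fluent filler that differs from the original token, so the
        # rewritten response drifts away from the gold one.
        tokens[pos] = next((c for c in candidates if c.lower() != tokens[pos].lower()), candidates[0])
    return " ".join(tokens)

def keyword_guided_negative(context: str, unrelated_response: str, num_keywords: int = 2) -> str:
    """Crude stand-in for keyword-guided generation: overwrite a few tokens of an
    unrelated response with salient context words, raising content similarity
    while leaving the response incoherent with the context."""
    stop = {"the", "a", "an", "is", "it", "i", "you", "and", "to", "of", "in", "what", "oh", "nice"}
    words = [w.strip(".,?!").lower() for w in context.split()]
    keywords = sorted({w for w in words if w and w not in stop}, key=len, reverse=True)[:num_keywords]
    tokens = unrelated_response.split()
    for i, kw in enumerate(keywords):
        if 2 * i < len(tokens):
            tokens[2 * i] = kw
    return " ".join(tokens)

context = "A: I just adopted a puppy last week. B: Oh nice, what breed is it?"
gold = "It is a golden retriever and it loves playing in the park."
print(mask_and_fill(gold))                                     # fluent and similar, possibly incoherent
print(keyword_guided_negative(context, "My meeting ran late so I skipped lunch today."))
```

Both functions produce responses with high content overlap but no guarantee of coherence, which is the property the paper wants in its negative training examples.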
Related papers
- Emphasising Structured Information: Integrating Abstract Meaning Representation into LLMs for Enhanced Open-Domain Dialogue Evaluation [26.330012489735456]
This paper proposes an effective framework for open-domain dialogue evaluation.
It combines domain-specific language models (SLMs), enhanced with Abstract Meaning Representation (AMR) knowledge, with Large Language Models (LLMs).
Experimental results on open-domain dialogue evaluation tasks demonstrate the superiority of our method compared to a wide range of state-of-the-art baselines.
arXiv Detail & Related papers (2024-04-01T14:11:45Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- DialAug: Mixing up Dialogue Contexts in Contrastive Learning for Robust Conversational Modeling [3.3578533367912025]
We propose a framework that incorporates augmented versions of a dialogue context into the learning objective.
We show that our proposed augmentation method outperforms previous data augmentation approaches.
arXiv Detail & Related papers (2022-04-15T23:39:41Z)
- DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations [46.942369532632604]
We propose a Dialogue Evaluation metric that relies on AMR-based semantic manipulations for incoherent data generation.
Our experiments show that DEAM achieves higher correlations with human judgments compared to baseline methods.
arXiv Detail & Related papers (2022-03-18T03:11:35Z)
- Towards Robust Online Dialogue Response Generation [62.99904593650087]
We argue that this can be caused by a discrepancy between training and real-world testing.
We propose a hierarchical sampling-based method consisting of both utterance-level sampling and semi-utterance-level sampling.
arXiv Detail & Related papers (2022-03-07T06:51:41Z)
- WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue [17.663449579168297]
We simulate a dialogue in which an agent and a user (modelled similarly to the agent, with a supervised learning objective) interact with each other.
The agent uses dynamic blocking to generate ranked diverse responses and exploration-exploitation to select among the Top-K responses.
Empirical studies on two benchmarks indicate that our model significantly outperforms baselines in response quality and leads to successful conversations.
arXiv Detail & Related papers (2021-08-01T08:00:45Z)
- Enhancing Dialogue Generation via Multi-Level Contrastive Learning [57.005432249952406]
We propose a multi-level contrastive learning paradigm to model the fine-grained quality of the responses with respect to the query.
A Rank-aware Calibration (RC) network is designed to construct the multi-level contrastive optimization objectives.
We build a Knowledge Inference (KI) component to capture the keyword knowledge from the reference during training and exploit such information to encourage the generation of informative words.
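The summary above only names the components of this approach; as a generic, hedged sketch of the underlying idea rather than the paper's network, a multi-level ranking objective can be expressed as margin constraints between responses of adjacent quality levels (the level names, margin, and toy scores below are assumptions):

```python
# Generic multi-level ranking objective (an illustration of the idea, not the
# paper's method): responses of graded quality are scored by a model and each
# level is pushed above the next by a margin.
import torch
import torch.nn.functional as F

def multi_level_ranking_loss(scores_by_level: list, margin: float = 0.2) -> torch.Tensor:
    """scores_by_level[0] holds scores for the best responses, the last entry the worst."""
    loss = torch.tensor(0.0)
    for better, worse in zip(scores_by_level, scores_by_level[1:]):
        target = torch.ones_like(better)  # +1 means `better` should outrank `worse`
        loss = loss + F.margin_ranking_loss(better, worse, target, margin=margin)
    return loss

# Toy usage: gold > plausible-but-flawed > random, as three quality levels.
gold, plausible, random_neg = torch.tensor([0.9, 0.8]), torch.tensor([0.6, 0.7]), torch.tensor([0.1, 0.2])
print(multi_level_ranking_loss([gold, plausible, random_neg]))
```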
arXiv Detail & Related papers (2020-09-19T02:41:04Z)
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
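A minimal sketch of the multi-task pattern this entry describes, with the four named auxiliary tasks as placeholder loss terms; the task weights and the stand-in loss values are assumptions, not the paper's configuration:

```python
# Minimal multi-task pattern (weights and loss values are placeholders): the main
# response-selection loss is combined with weighted auxiliary self-supervised losses.
import torch

def joint_loss(main_loss, auxiliary_losses, weights):
    total = main_loss
    for task, loss in auxiliary_losses.items():
        total = total + weights.get(task, 1.0) * loss
    return total

loss = joint_loss(
    main_loss=torch.tensor(0.52),  # response selection (a cross-entropy term in practice)
    auxiliary_losses={
        "next_session_prediction": torch.tensor(0.31),
        "utterance_restoration": torch.tensor(0.44),
        "incoherence_detection": torch.tensor(0.27),
        "consistency_discrimination": torch.tensor(0.35),
    },
    weights={"next_session_prediction": 0.5, "utterance_restoration": 0.5,
             "incoherence_detection": 1.0, "consistency_discrimination": 1.0},
)
print(loss)  # scalar to backpropagate through the shared PLM encoder in a real setup
```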
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
- Speaker Sensitive Response Evaluation Model [17.381658875470638]
We propose an automatic evaluation model based on the similarity of the generated response with the conversational context.
We learn the model parameters from an unlabeled conversation corpus.
We show that our model can be applied to movie dialogues without any additional training.
arXiv Detail & Related papers (2020-06-12T08:59:10Z)
- Learning an Unreferenced Metric for Online Dialogue Evaluation [53.38078951628143]
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances.
We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
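As a simplified stand-in rather than the paper's trained metric, an unreferenced score can be read off pre-trained sentence embeddings of the context and the candidate response alone, with no gold reference needed; the encoder choice is an assumption. (The main paper above argues that raw content-similarity signals like this are insufficient on their own, which is exactly why this is only a sketch.)

```python
# Simplified unreferenced scorer (not the paper's model): embed the context and the
# candidate response with a pre-trained encoder and use their similarity as the score.
from sentence_transformers import SentenceTransformer, util  # assumes the package is installed

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # encoder choice is an assumption

def unreferenced_score(context: str, response: str) -> float:
    ctx_emb, resp_emb = encoder.encode([context, response], convert_to_tensor=True)
    return util.cos_sim(ctx_emb, resp_emb).item()

context = "A: I can't believe it snowed in April. B: I know, my flight got cancelled."
print(unreferenced_score(context, "That's rough, did you manage to rebook it?"))      # on-topic
print(unreferenced_score(context, "Bananas are an excellent source of potassium."))   # off-topic
```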
arXiv Detail & Related papers (2020-05-01T20:01:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.