Recipes for Safety in Open-domain Chatbots
- URL: http://arxiv.org/abs/2010.07079v3
- Date: Wed, 4 Aug 2021 21:33:39 GMT
- Title: Recipes for Safety in Open-domain Chatbots
- Authors: Jing Xu, Da Ju, Margaret Li, Y-Lan Boureau, Jason Weston, Emily Dinan
- Abstract summary: We introduce a new human-and-model-in-the-loop framework for both training safer models and for evaluating them.
We conduct experiments comparing these methods and find that our new techniques are safer than existing models as measured by automatic and human evaluations, while maintaining usability metrics such as engagingness.
We then discuss the limitations of this work by analyzing failure cases of our models.
- Score: 32.31067267979087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Models trained on large unlabeled corpora of human interactions will learn
patterns and mimic behaviors therein, which include offensive or otherwise
toxic behavior and unwanted biases. We investigate a variety of methods to
mitigate these issues in the context of open-domain generative dialogue models.
We introduce a new human-and-model-in-the-loop framework for both training
safer models and for evaluating them, as well as a novel method to distill
safety considerations inside generative models without the use of an external
classifier at deployment time. We conduct experiments comparing these methods
and find our new techniques are (i) safer than existing models as measured by
automatic and human evaluations while (ii) maintaining usability metrics such
as engagingness relative to the state of the art. We then discuss the
limitations of this work by analyzing failure cases of our models.
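As an illustration of the distillation idea above, the sketch below shows one way safety can be baked into a generative model's training data rather than enforced by a classifier at deployment: an external safety classifier is applied only while constructing the fine-tuning targets, and flagged targets are replaced with a canned safe deflection, so the resulting generator needs no classifier in the serving loop. This is a minimal sketch of that setup; the `safety_classifier` callable and the canned response text are illustrative placeholders, not the paper's released implementation.
```python
# Sketch of "baked-in" safety for a generative dialogue model (assumptions noted below):
# a safety classifier is used only at training-data construction time; flagged
# (context, response) pairs have their target replaced with a safe deflection,
# so the fine-tuned generator deflects on its own at inference with no classifier.

from typing import Callable, List, Tuple

# Hypothetical canned deflection; the actual wording is an assumption, not taken from the paper.
SAFE_RESPONSE = "I'd rather not talk about that. Can we chat about something else?"

def bake_in_safety(
    pairs: List[Tuple[str, str]],                 # (context, response) training pairs
    safety_classifier: Callable[[str], bool],     # hypothetical: returns True if text is flagged unsafe
) -> List[Tuple[str, str]]:
    """Rewrite training targets so the generator itself learns to deflect unsafe turns."""
    baked: List[Tuple[str, str]] = []
    for context, response in pairs:
        if safety_classifier(context) or safety_classifier(response):
            # Unsafe context or target: teach the model to produce a safe deflection instead.
            baked.append((context, SAFE_RESPONSE))
        else:
            baked.append((context, response))
    return baked
```
The rewritten pairs would then be used for ordinary sequence-to-sequence fine-tuning; at inference time the model runs as-is, which is what allows the external classifier to be dropped from the deployment stack.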
Related papers
- Science based AI model certification for new operational environments with application in traffic state estimation [1.2186759689780324]
The expanding role of Artificial Intelligence (AI) in diverse engineering domains highlights the challenges associated with deploying AI models in new operational environments.
This paper proposes a science-based certification methodology to assess the viability of employing pre-trained data-driven models in new operational environments.
arXiv Detail & Related papers (2024-05-13T16:28:00Z)
- Operationalizing Specifications, In Addition to Test Sets for Evaluating Constrained Generative Models [17.914521288548844]
We argue that the scale of generative models could be exploited to raise the abstraction level at which evaluation itself is conducted.
Our recommendations are based on leveraging specifications as a powerful instrument to evaluate generation quality.
arXiv Detail & Related papers (2022-11-19T06:39:43Z)
- Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
- Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence [62.826466543958624]
We look at the standardization gap and the validation gap in topic model evaluation.
Recent models relying on neural components surpass classical topic models according to these metrics.
We use automatic coherence along with the two most widely accepted human judgment tasks, namely, topic rating and word intrusion.
arXiv Detail & Related papers (2021-07-05T17:58:52Z)
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA requires only a handful of pre-collected experience data and therefore does not involve human interaction with the target policy during evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
- Evaluating the Safety of Deep Reinforcement Learning Models using Semi-Formal Verification [81.32981236437395]
We present a semi-formal verification approach for decision-making tasks based on interval analysis.
Our method obtains results on standard benchmarks comparable to those of formal verifiers.
Our approach makes it possible to efficiently evaluate safety properties for decision-making models in practical applications.
arXiv Detail & Related papers (2020-10-19T11:18:06Z)
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations.
We provide intuition for the effectiveness of the framework through a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
- Regularizers for Single-step Adversarial Training [49.65499307547198]
We propose three types of regularizers that help to learn robust models using single-step adversarial training methods.
The regularizers mitigate the effect of gradient masking by harnessing properties that differentiate a robust model from a pseudo-robust model.
arXiv Detail & Related papers (2020-02-03T09:21:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.