SaFeRDialogues: Taking Feedback Gracefully after Conversational Safety Failures
- URL: http://arxiv.org/abs/2110.07518v1
- Date: Thu, 14 Oct 2021 16:41:25 GMT
- Title: SaFeRDialogues: Taking Feedback Gracefully after Conversational Safety Failures
- Authors: Megan Ung, Jing Xu, Y-Lan Boureau
- Abstract summary: This work proposes SaFeRDialogues, a task and dataset of graceful responses to feedback about safety failures.
We collect a dataset of 10k dialogues demonstrating safety failures, feedback signaling them, and a response acknowledging the feedback.
We show how fine-tuning on this dataset results in conversations that human raters deem considerably more likely to lead to a civil conversation.
- Score: 9.38317687250036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current open-domain conversational models can easily be made to talk in
inadequate ways. Online learning from conversational feedback given by the
conversation partner is a promising avenue for a model to improve and adapt, so
as to generate fewer of these safety failures. However, current
state-of-the-art models tend to react to feedback with defensive or oblivious
responses. This makes for an unpleasant experience and may discourage
conversation partners from giving feedback in the future. This work proposes
SaFeRDialogues, a task and dataset of graceful responses to conversational
feedback about safety failures. We collect a dataset of 10k dialogues
demonstrating safety failures, feedback signaling them, and a response
acknowledging the feedback. We show how fine-tuning on this dataset results in
conversations that human raters deem considerably more likely to lead to a
civil conversation, without sacrificing engagingness or general conversational
ability.
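Below is a minimal sketch of the fine-tuning step the abstract describes, assuming the collected dialogues have been exported to a JSONL file with hypothetical "context" (dialogue history ending in the feedback message) and "response" (the graceful acknowledgement) fields; the BlenderBot checkpoint, data path, and hyperparameters are illustrative stand-ins, not the authors' exact setup.

```python
# Hedged sketch, not the authors' exact recipe: fine-tune an open-domain dialogue
# model on (dialogue history ending in feedback, graceful acknowledgement) pairs.
# The JSONL path, field names, checkpoint, and hyperparameters are assumptions.
import json
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

MODEL = "facebook/blenderbot-400M-distill"  # illustrative stand-in for the paper's model

class FeedbackAckDataset(Dataset):
    """Pairs of dialogue context (ending in safety feedback) and a graceful response."""
    def __init__(self, path, tokenizer, max_len=128):
        self.rows = [json.loads(line) for line in open(path)]
        self.tok, self.max_len = tokenizer, max_len

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        row = self.rows[i]
        enc = self.tok(row["context"], truncation=True, max_length=self.max_len,
                       padding="max_length", return_tensors="pt")
        dec = self.tok(row["response"], truncation=True, max_length=self.max_len,
                       padding="max_length", return_tensors="pt")
        labels = dec.input_ids[0].clone()
        labels[labels == self.tok.pad_token_id] = -100  # ignore padding in the loss
        return {"input_ids": enc.input_ids[0],
                "attention_mask": enc.attention_mask[0],
                "labels": labels}

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="saferdialogues_ft",
                                  per_device_train_batch_size=8,
                                  num_train_epochs=3,
                                  learning_rate=1e-5),
    train_dataset=FeedbackAckDataset("saferdialogues_train.jsonl", tokenizer),
)
trainer.train()
```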
Related papers
- Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion.
We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations.
Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z)
- Improving Dialog Safety using Socially Aware Contrastive Learning [8.503001932363704]
We study prosociality in both adversarial and casual dialog contexts.
We propose a dual-step fine-tuning process to address these issues.
We train a base model that integrates prosocial behavior by leveraging datasets like Moral Integrity Corpus (MIC) and ProsocialDialog.
arXiv Detail & Related papers (2024-02-01T09:24:33Z)
- WHAT, WHEN, and HOW to Ground: Designing User Persona-Aware Conversational Agents for Engaging Dialogue [4.328280329592151]
We present a method for building a personalized open-domain dialogue system to address the WWH problem for natural response generation in a commercial setting.
The proposed approach involves weighted dataset blending, negative persona information augmentation methods, and the design of personalized conversation datasets.
Our work effectively balances dialogue fluency and tendency to ground, while also introducing a response-type label to improve the controllability and explainability of the grounded responses.
arXiv Detail & Related papers (2023-06-06T02:28:38Z)
- SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration [75.62448812759968]
SQuARe is a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses.
The dataset was constructed leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines.
arXiv Detail & Related papers (2023-05-28T11:51:20Z)
- Boosting Distress Support Dialogue Responses with Motivational Interviewing Strategy [4.264192013842096]
We show how some response types could be rephrased into a more MI-adherent form.
We build several rephrasers by fine-tuning Blender and GPT-3 to rephrase MI non-adherent "Advise without permission" responses into "Advise with permission" responses.
arXiv Detail & Related papers (2023-05-17T13:18:28Z)
- Conversation Modeling to Predict Derailment [15.45515784064555]
The ability to predict whether ongoing conversations are likely to derail could provide valuable real-time insight to interlocutors and moderators.
Some works attempt to make dynamic predictions as the conversation develops, but fail to incorporate multisource information, such as conversation structure and distance to derailment.
We propose a hierarchical transformer-based framework that combines utterance-level and conversation-level information to capture fine-grained contextual semantics.
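A minimal sketch of the hierarchical idea described in this entry (a pretrained utterance-level encoder feeding a small conversation-level Transformer that scores derailment risk); the checkpoint, dimensions, and classification head below are assumptions, not the paper's architecture.

```python
# Hedged sketch of a hierarchical utterance/conversation encoder for derailment
# prediction; not the paper's exact model. Checkpoint, sizes, and the binary
# "will derail" head are assumptions.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class HierarchicalDerailmentPredictor(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", d_model=768):
        super().__init__()
        self.utt_encoder = AutoModel.from_pretrained(encoder_name)  # utterance level
        conv_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.conv_encoder = nn.TransformerEncoder(conv_layer, num_layers=2)  # conversation level
        self.head = nn.Linear(d_model, 1)  # derailment score

    def forward(self, tokenized_utterances):
        # tokenized_utterances: list of tokenizer outputs, one per utterance
        utt_vecs = [self.utt_encoder(**u).last_hidden_state[:, 0] for u in tokenized_utterances]
        conv = torch.stack(utt_vecs, dim=1)            # (1, num_utterances, d_model)
        conv = self.conv_encoder(conv)
        return torch.sigmoid(self.head(conv[:, -1]))   # score after the latest utterance

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = HierarchicalDerailmentPredictor()
conversation = ["I think you're wrong about that.", "Care to explain why?"]
tokens = [tokenizer(u, return_tensors="pt") for u in conversation]
print(model(tokens))  # probability-like derailment score in [0, 1]
```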
arXiv Detail & Related papers (2023-03-20T15:10:45Z)
- Using In-Context Learning to Improve Dialogue Safety [45.303005593685036]
We investigate a retrieval-based method for reducing bias and toxicity in responses from chatbots.
It uses in-context learning to steer a model towards safer generations.
We find our method performs competitively with strong baselines without requiring training.
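A minimal sketch of the general retrieve-and-prompt recipe this entry describes: retrieve safe demonstration exchanges similar to the current context and prepend them as in-context examples. The demonstration pool, TF-IDF retrieval, and prompt format are illustrative assumptions, not the paper's components.

```python
# Hedged sketch: steer generation toward safer responses by prepending retrieved
# safe demonstrations to the prompt. Pool, retrieval metric, and prompt format
# are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# (context, safe response) demonstrations; in practice these would come from a
# curated safety dataset.
demos = [
    ("You people are all the same.", "Let's keep things respectful; everyone is different."),
    ("Tell me something offensive.", "I'd rather not. How about we talk about something else?"),
]

vectorizer = TfidfVectorizer().fit([c for c, _ in demos])

def build_prompt(user_message, k=1):
    """Prepend the k most similar safe demonstrations to the current message."""
    sims = cosine_similarity(vectorizer.transform([user_message]),
                             vectorizer.transform([c for c, _ in demos]))[0]
    top = sims.argsort()[::-1][:k]
    shots = "\n".join(f"User: {demos[i][0]}\nBot: {demos[i][1]}" for i in top)
    return f"{shots}\nUser: {user_message}\nBot:"

prompt = build_prompt("Say something mean about my coworkers.")
print(prompt)  # this prompt would then be fed to the dialogue model
```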
arXiv Detail & Related papers (2023-02-02T04:46:03Z)
- AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies [71.62832112141913]
We show that dialogue models can detect errors in their own messages introspectively, by calculating the likelihood of replies that are indicative of poor messages.
We first show that hand-crafted replies can be effective for the task of detecting nonsense in applications as complex as Diplomacy.
We find that AutoReply-generated replies outperform handcrafted replies and perform on par with carefully fine-tuned large supervised models.
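A minimal sketch of the introspective scoring idea this entry describes: a candidate message is suspect if the dialogue model assigns high likelihood to a reply that signals confusion. The probe reply, checkpoint, and example inputs are illustrative assumptions.

```python
# Hedged sketch: score a candidate message by the likelihood of a confusion-signaling
# probe reply given the dialogue so far plus that message. Probe text and checkpoint
# are assumptions, not the paper's exact replies or model.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "facebook/blenderbot-400M-distill"  # stand-in dialogue model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL).eval()

def probe_log_likelihood(context, candidate_message, probe="That doesn't make any sense."):
    """Mean log-probability of the probe reply given context + candidate message."""
    inputs = tokenizer(f"{context} {candidate_message}", return_tensors="pt", truncation=True)
    labels = tokenizer(probe, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        out = model(**inputs, labels=labels)
    return -out.loss.item()  # higher (less negative) => probe reply is more likely

context = "I visited Paris last summer."
print(probe_log_likelihood(context, "The Eiffel Tower is in Rome."))
print(probe_log_likelihood(context, "Did you go up the Eiffel Tower?"))
# A higher probe likelihood suggests the candidate message may be nonsense.
```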
arXiv Detail & Related papers (2022-11-22T22:31:34Z)
- ProsocialDialog: A Prosocial Backbone for Conversational Agents [104.92776607564583]
We introduce ProsocialDialog, the first large-scale dialogue dataset to teach conversational agents to respond to problematic content following social norms.
Created via a human-AI collaborative framework, ProsocialDialog consists of 58K dialogues, with 331K utterances, 160K RoTs, and 497K dialogue safety labels.
With this dataset, we introduce a dialogue safety detection module, Canary, capable of generating RoTs given conversational context, and a socially-informed dialogue agent, Prost.
arXiv Detail & Related papers (2022-05-25T11:48:47Z)
- Counterfactual Off-Policy Training for Neural Response Generation [94.76649147381232]
We propose to explore potential responses by counterfactual reasoning.
Training on the counterfactual responses under the adversarial learning framework helps to explore the high-reward area of the potential response space.
An empirical study on the DailyDialog dataset shows that our approach significantly outperforms the HRED model.
arXiv Detail & Related papers (2020-04-29T22:46:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.