SaFeRDialogues: Taking Feedback Gracefully after Conversational Safety Failures
- URL: http://arxiv.org/abs/2110.07518v1
- Date: Thu, 14 Oct 2021 16:41:25 GMT
- Title: SaFeRDialogues: Taking Feedback Gracefully after Conversational Safety Failures
- Authors: Megan Ung, Jing Xu, Y-Lan Boureau
- Abstract summary: This work proposes SaFeRDialogues, a task and dataset of graceful responses to feedback about safety failures.
We collect a dataset of 10k dialogues demonstrating safety failures, feedback signaling them, and a response acknowledging the feedback.
We show how fine-tuning on this dataset results in conversations that human raters deem considerably more likely to lead to a civil conversation.
- Score: 9.38317687250036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current open-domain conversational models can easily be made to talk in
inadequate ways. Online learning from conversational feedback given by the
conversation partner is a promising avenue for a model to improve and adapt, so
as to generate fewer of these safety failures. However, current
state-of-the-art models tend to react to feedback with defensive or oblivious
responses. This makes for an unpleasant experience and may discourage
conversation partners from giving feedback in the future. This work proposes
SaFeRDialogues, a task and dataset of graceful responses to conversational
feedback about safety failures. We collect a dataset of 10k dialogues
demonstrating safety failures, feedback signaling them, and a response
acknowledging the feedback. We show how fine-tuning on this dataset results in
conversations that human raters deem considerably more likely to lead to a
civil conversation, without sacrificing engagingness or general conversational
ability.
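Below is a minimal sketch of the fine-tuning step the abstract describes, assuming the collected dialogues have been exported to a JSONL file with hypothetical "context" (dialogue history ending in the feedback message) and "response" (the graceful acknowledgement) fields; the BlenderBot checkpoint, data path, and hyperparameters are illustrative stand-ins, not the authors' exact setup.

```python
# Hedged sketch, not the authors' exact recipe: fine-tune an open-domain dialogue
# model on (dialogue history ending in feedback, graceful acknowledgement) pairs.
# The JSONL path, field names, checkpoint, and hyperparameters are assumptions.
import json
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

MODEL = "facebook/blenderbot-400M-distill"  # illustrative stand-in for the paper's model

class FeedbackAckDataset(Dataset):
    """Pairs of dialogue context (ending in safety feedback) and a graceful response."""
    def __init__(self, path, tokenizer, max_len=128):
        self.rows = [json.loads(line) for line in open(path)]
        self.tok, self.max_len = tokenizer, max_len

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        row = self.rows[i]
        enc = self.tok(row["context"], truncation=True, max_length=self.max_len,
                       padding="max_length", return_tensors="pt")
        dec = self.tok(row["response"], truncation=True, max_length=self.max_len,
                       padding="max_length", return_tensors="pt")
        labels = dec.input_ids[0].clone()
        labels[labels == self.tok.pad_token_id] = -100  # ignore padding in the loss
        return {"input_ids": enc.input_ids[0],
                "attention_mask": enc.attention_mask[0],
                "labels": labels}

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="saferdialogues_ft",
                                  per_device_train_batch_size=8,
                                  num_train_epochs=3,
                                  learning_rate=1e-5),
    train_dataset=FeedbackAckDataset("saferdialogues_train.jsonl", tokenizer),
)
trainer.train()
```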
Related papers
- Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion.
We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations.
Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z)
- Improving Dialog Safety using Socially Aware Contrastive Learning [8.503001932363704]
We study prosociality in both adversarial and casual dialog contexts.
We propose a dual-step fine-tuning process to address these issues.
We train a base model that integrates prosocial behavior by leveraging datasets like Moral Integrity Corpus (MIC) and ProsocialDialog.
arXiv Detail & Related papers (2024-02-01T09:24:33Z)
- WHAT, WHEN, and HOW to Ground: Designing User Persona-Aware Conversational Agents for Engaging Dialogue [4.328280329592151]
We present a method for building a personalized open-domain dialogue system to address the WWH problem for natural response generation in a commercial setting.
The proposed approach involves weighted dataset blending, negative persona information augmentation methods, and the design of personalized conversation datasets.
Our work effectively balances dialogue fluency and tendency to ground, while also introducing a response-type label to improve the controllability and explainability of the grounded responses.
arXiv Detail & Related papers (2023-06-06T02:28:38Z)
- SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration [75.62448812759968]
SQuARe is a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses.
The dataset was constructed leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines.
arXiv Detail & Related papers (2023-05-28T11:51:20Z)
- Boosting Distress Support Dialogue Responses with Motivational Interviewing Strategy [4.264192013842096]
We show how some response types could be rephrased into a more MI-adherent form.
We build several rephrasers by fine-tuning Blender and GPT-3 to rephrase MI non-adherent "Advise without permission" responses into "Advise with permission" responses.
arXiv Detail & Related papers (2023-05-17T13:18:28Z)
- Conversation Modeling to Predict Derailment [15.45515784064555]
The ability to predict whether ongoing conversations are likely to derail could provide valuable real-time insight to interlocutors and moderators.
Some works attempt to make dynamic predictions as the conversation develops, but fail to incorporate multisource information, such as conversation structure and distance to derailment.
We propose a hierarchical transformer-based framework that combines utterance-level and conversation-level information to capture fine-grained contextual semantics.
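A minimal sketch of the hierarchical idea described in this entry (a pretrained utterance-level encoder feeding a small conversation-level Transformer that scores derailment risk); the checkpoint, dimensions, and classification head below are assumptions, not the paper's architecture.

```python
# Hedged sketch of a hierarchical utterance/conversation encoder for derailment
# prediction; not the paper's exact model. Checkpoint, sizes, and the binary
# "will derail" head are assumptions.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class HierarchicalDerailmentPredictor(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", d_model=768):
        super().__init__()
        self.utt_encoder = AutoModel.from_pretrained(encoder_name)  # utterance level
        conv_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.conv_encoder = nn.TransformerEncoder(conv_layer, num_layers=2)  # conversation level
        self.head = nn.Linear(d_model, 1)  # derailment score

    def forward(self, tokenized_utterances):
        # tokenized_utterances: list of tokenizer outputs, one per utterance
        utt_vecs = [self.utt_encoder(**u).last_hidden_state[:, 0] for u in tokenized_utterances]
        conv = torch.stack(utt_vecs, dim=1)            # (1, num_utterances, d_model)
        conv = self.conv_encoder(conv)
        return torch.sigmoid(self.head(conv[:, -1]))   # score after the latest utterance

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = HierarchicalDerailmentPredictor()
conversation = ["I think you're wrong about that.", "Care to explain why?"]
tokens = [tokenizer(u, return_tensors="pt") for u in conversation]
print(model(tokens))  # probability-like derailment score in [0, 1]
```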
arXiv Detail & Related papers (2023-03-20T15:10:45Z)
- Using In-Context Learning to Improve Dialogue Safety [45.303005593685036]
We investigate a retrieval-based method for reducing bias and toxicity in responses from chatbots.
It uses in-context learning to steer a model towards safer generations.
We find our method performs competitively with strong baselines without requiring training.
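A minimal sketch of the general retrieve-and-prompt recipe this entry describes: retrieve safe demonstration exchanges similar to the current context and prepend them as in-context examples. The demonstration pool, TF-IDF retrieval, and prompt format are illustrative assumptions, not the paper's components.

```python
# Hedged sketch: steer generation toward safer responses by prepending retrieved
# safe demonstrations to the prompt. Pool, retrieval metric, and prompt format
# are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# (context, safe response) demonstrations; in practice these would come from a
# curated safety dataset.
demos = [
    ("You people are all the same.", "Let's keep things respectful; everyone is different."),
    ("Tell me something offensive.", "I'd rather not. How about we talk about something else?"),
]

vectorizer = TfidfVectorizer().fit([c for c, _ in demos])

def build_prompt(user_message, k=1):
    """Prepend the k most similar safe demonstrations to the current message."""
    sims = cosine_similarity(vectorizer.transform([user_message]),
                             vectorizer.transform([c for c, _ in demos]))[0]
    top = sims.argsort()[::-1][:k]
    shots = "\n".join(f"User: {demos[i][0]}\nBot: {demos[i][1]}" for i in top)
    return f"{shots}\nUser: {user_message}\nBot:"

prompt = build_prompt("Say something mean about my coworkers.")
print(prompt)  # this prompt would then be fed to the dialogue model
```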
arXiv Detail & Related papers (2023-02-02T04:46:03Z)
- AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies [71.62832112141913]
We show that dialogue models can detect errors in their own messages introspectively, by calculating the likelihood of replies that are indicative of poor messages.
We first show that hand-crafted replies can be effective for the task of detecting nonsense in applications as complex as Diplomacy.
We find that AutoReply-generated replies outperform handcrafted replies and perform on par with carefully fine-tuned large supervised models.
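A minimal sketch of the introspective scoring idea this entry describes: a candidate message is suspect if the dialogue model assigns high likelihood to a reply that signals confusion. The probe reply, checkpoint, and example inputs are illustrative assumptions.

```python
# Hedged sketch: score a candidate message by the likelihood of a confusion-signaling
# probe reply given the dialogue so far plus that message. Probe text and checkpoint
# are assumptions, not the paper's exact replies or model.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "facebook/blenderbot-400M-distill"  # stand-in dialogue model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL).eval()

def probe_log_likelihood(context, candidate_message, probe="That doesn't make any sense."):
    """Mean log-probability of the probe reply given context + candidate message."""
    inputs = tokenizer(f"{context} {candidate_message}", return_tensors="pt", truncation=True)
    labels = tokenizer(probe, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        out = model(**inputs, labels=labels)
    return -out.loss.item()  # higher (less negative) => probe reply is more likely

context = "I visited Paris last summer."
print(probe_log_likelihood(context, "The Eiffel Tower is in Rome."))
print(probe_log_likelihood(context, "Did you go up the Eiffel Tower?"))
# A higher probe likelihood suggests the candidate message may be nonsense.
```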
arXiv Detail & Related papers (2022-11-22T22:31:34Z)
- ProsocialDialog: A Prosocial Backbone for Conversational Agents [104.92776607564583]
We introduce ProsocialDialog, the first large-scale dialogue dataset to teach conversational agents to respond to problematic content following social norms.
Created via a human-AI collaborative framework, ProsocialDialog consists of 58K dialogues, with 331K utterances, 160K RoTs, and 497K dialogue safety labels.
With this dataset, we introduce a dialogue safety detection module, Canary, capable of generating RoTs given conversational context, and a socially-informed dialogue agent, Prost.
arXiv Detail & Related papers (2022-05-25T11:48:47Z)
- Counterfactual Off-Policy Training for Neural Response Generation [94.76649147381232]
We propose to explore potential responses by counterfactual reasoning.
Training on the counterfactual responses under the adversarial learning framework helps to explore the high-reward area of the potential response space.
An empirical study on the DailyDialog dataset shows that our approach significantly outperforms the HRED model.
arXiv Detail & Related papers (2020-04-29T22:46:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.