Improving Dialog Safety using Socially Aware Contrastive Learning
- URL: http://arxiv.org/abs/2402.00446v1
- Date: Thu, 1 Feb 2024 09:24:33 GMT
- Title: Improving Dialog Safety using Socially Aware Contrastive Learning
- Authors: Souvik Das, Rohini K. Srihari
- Abstract summary: We study prosociality in both adversarial and casual dialog contexts.
We propose a dual-step fine-tuning process to address these issues.
We train a base model that integrates prosocial behavior by leveraging datasets like Moral Integrity Corpus (MIC) and ProsocialDialog.
- Score: 8.503001932363704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art conversational AI systems raise concerns due to their
potential risks of generating unsafe, toxic, unethical, or dangerous content.
Previous works have developed datasets to teach conversational agents the
appropriate social paradigms to respond effectively to specifically designed
hazardous content. However, models trained on these adversarial datasets still
struggle to recognize subtle unsafe situations that arise naturally in
conversation, or they introduce inappropriate responses in casual contexts. To
understand the extent of this problem, we study prosociality in both
adversarial and casual dialog contexts and audit the response quality of
general-purpose language models in terms of propensity to produce unsafe
content. We propose a dual-step fine-tuning process to address these issues
using a socially aware n-pair contrastive loss. Subsequently, we train a base
model that integrates prosocial behavior by leveraging datasets like Moral
Integrity Corpus (MIC) and ProsocialDialog. Experimental results on several
dialog datasets demonstrate the effectiveness of our approach in generating
socially appropriate responses.
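The abstract names a socially aware n-pair contrastive loss but gives no implementation here. The sketch below shows a generic n-pair objective under one plausible reading: a prosocial response is pulled toward its dialog context while n unsafe responses are pushed away. The tensor shapes, pooling scheme, and temperature are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def n_pair_contrastive_loss(context_emb, positive_emb, negative_embs, temperature=0.1):
    """Generic n-pair contrastive loss (illustrative, not the paper's exact form).

    context_emb:   (batch, dim)    pooled dialog-context embeddings
    positive_emb:  (batch, dim)    pooled prosocial-response embeddings
    negative_embs: (batch, n, dim) pooled unsafe-response embeddings
    """
    context_emb = F.normalize(context_emb, dim=-1)
    positive_emb = F.normalize(positive_emb, dim=-1)
    negative_embs = F.normalize(negative_embs, dim=-1)

    # Cosine similarity between the context and each candidate response.
    pos_sim = (context_emb * positive_emb).sum(-1, keepdim=True)      # (batch, 1)
    neg_sim = torch.einsum("bd,bnd->bn", context_emb, negative_embs)  # (batch, n)

    # Softmax cross-entropy with the prosocial response at index 0.
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```

In the dual-step setup described above, such a loss would presumably complement the standard language-modeling objective during fine-tuning, though the paper's exact combination is not specified here.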
Related papers
- Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues [66.69453609603875]
Sociocultural norms serve as guiding principles for personal conduct in social interactions.
We propose a scalable approach for constructing a Sociocultural Norm (SCN) Base using Large Language Models (LLMs).
We construct a comprehensive and publicly accessible Chinese Sociocultural NormBase.
arXiv Detail & Related papers (2024-10-04T00:08:46Z)
- Improving the Robustness of Knowledge-Grounded Dialogue via Contrastive Learning [71.8876256714229]
We propose an entity-based contrastive learning framework for improving the robustness of knowledge-grounded dialogue systems (sketched after this entry).
Our method achieves new state-of-the-art performance in terms of automatic evaluation scores.
arXiv Detail & Related papers (2024-01-09T05:16:52Z)
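The entry above does not spell out how contrastive pairs are built. One plausible reading, offered only as an illustration, is to mine hard negatives by swapping the grounded entity in a response; make_entity_negatives and the example strings below are invented for this sketch.

```python
import random

def make_entity_negatives(response: str, gold_entity: str, distractors, k: int = 3):
    """Build hard negatives by swapping the grounded entity for distractors.
    Purely illustrative; the paper's exact pair construction may differ."""
    swaps = random.sample(distractors, min(k, len(distractors)))
    return [response.replace(gold_entity, d) for d in swaps]

# The original response is the positive; the entity-swapped variants are
# negatives for a contrastive objective like the one sketched after the
# main abstract above.
print(make_entity_negatives("The Eiffel Tower is in Paris.", "Paris",
                            ["London", "Rome", "Berlin"]))
```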
- A Benchmark for Understanding Dialogue Safety in Mental Health Support [15.22008156903607]
This paper aims to develop a theoretically and factually grounded taxonomy that prioritizes the positive impact on help-seekers.
We analyze the dataset using popular language models, including BERT-base, RoBERTa-large, and ChatGPT (a classifier sketch follows this entry).
The developed dataset and findings serve as valuable benchmarks for advancing research on dialogue safety in mental health support.
arXiv Detail & Related papers (2023-07-31T07:33:16Z)
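As a hedged illustration of the classifier side of the benchmark analysis above, the snippet below scores one utterance with a generic two-way Transformers sequence classifier. The checkpoint, label count, and safe/unsafe convention are placeholders, not the benchmark's setup.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint and binary labels; the classification head is
# randomly initialized until fine-tuned on the benchmark's safety labels.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

inputs = tokenizer(
    "I feel like nobody would care if I disappeared.",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(-1).item())  # 0/1 under an assumed safe/unsafe convention
```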
- Using In-Context Learning to Improve Dialogue Safety [45.303005593685036]
We investigate a retrieval-based method for reducing bias and toxicity in responses from chatbots (sketched after this entry).
It uses in-context learning to steer a model towards safer generations.
We find our method performs competitively with strong baselines without requiring training.
arXiv Detail & Related papers (2023-02-02T04:46:03Z)
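The method above is retrieval-based in-context learning. The toy sketch below assumes safe demonstrations are retrieved by text similarity and prepended to the prompt; the TF-IDF retriever, the two-example pool, and build_safe_prompt are stand-ins invented here, not the paper's pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny illustrative pool of (provocative message, safe response) pairs;
# the paper retrieves from a real safety dataset instead.
POOL = [
    ("You're so stupid.", "I'd rather keep things respectful. Can we reset?"),
    ("Tell me how to hurt someone.", "I can't help with that. Is something upsetting you?"),
]

def build_safe_prompt(user_msg: str, k: int = 1) -> str:
    """Prepend the k most similar safe demonstrations as in-context examples."""
    messages = [m for m, _ in POOL]
    vec = TfidfVectorizer().fit(messages + [user_msg])
    sims = cosine_similarity(vec.transform([user_msg]), vec.transform(messages))[0]
    demos = "\n".join(f"User: {POOL[i][0]}\nBot: {POOL[i][1]}"
                      for i in sims.argsort()[::-1][:k])
    return f"{demos}\nUser: {user_msg}\nBot:"

print(build_safe_prompt("You're an idiot."))
```

The resulting prompt would then be passed to the chatbot's language model so that generation is steered by the safe demonstrations, with no weight updates required.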
- ProsocialDialog: A Prosocial Backbone for Conversational Agents [104.92776607564583]
We introduce ProsocialDialog, the first large-scale dialogue dataset to teach conversational agents to respond to problematic content following social norms (a loading sketch follows this entry).
Created via a human-AI collaborative framework, ProsocialDialog consists of 58K dialogues, with 331K utterances, 160K RoTs, and 497K dialogue safety labels.
With this dataset, we introduce a dialogue safety detection module, Canary, capable of generating RoTs given conversational context, and a socially-informed dialogue agent, Prost.
arXiv Detail & Related papers (2022-05-25T11:48:47Z)
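For readers who want to inspect ProsocialDialog directly, here is a minimal loading sketch. It assumes the dataset is published on the Hugging Face Hub under allenai/prosocial-dialog; verify the id and field names against the official release.

```python
from datasets import load_dataset

# Assumed Hub id; check the paper's release page if this does not resolve.
ds = load_dataset("allenai/prosocial-dialog", split="train")

example = ds[0]
print(example.keys())  # inspect fields (context, response, RoTs, safety labels, ...)
print(example)
```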
- Seamlessly Integrating Factual Information and Social Content with Persuasive Dialogue [48.75221685739286]
We present a novel modular dialogue system framework that seamlessly integrates factual information and social content into persuasive dialogue.
Our framework is generalizable to any dialogue tasks that have mixed social and task contents.
arXiv Detail & Related papers (2022-03-15T05:38:34Z)
- On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark [42.322782754346406]
We propose a taxonomy for dialogue safety specifically designed to capture unsafe behaviors that are unique in human-bot dialogue setting.
We compile DiaSafety, a dataset of 6 unsafe categories with rich context-sensitive unsafe examples.
Experiments show that existing utterance-level safety guarding tools fail catastrophically on our dataset.
arXiv Detail & Related papers (2021-10-16T04:17:12Z)
- SaFeRDialogues: Taking Feedback Gracefully after Conversational Safety Failures [9.38317687250036]
This work proposes SaFeRDialogues, a task and dataset of graceful responses to feedback about safety failures.
We collect a dataset of 10k dialogues demonstrating safety failures, feedback signaling them, and a response acknowledging the feedback.
We show how fine-tuning on this dataset results in conversations that human raters deem considerably more likely to lead to a civil conversation.
arXiv Detail & Related papers (2021-10-14T16:41:25Z)
- Counterfactual Off-Policy Training for Neural Response Generation [94.76649147381232]
We propose to explore potential responses by counterfactual reasoning.
Training on the counterfactual responses under the adversarial learning framework helps to explore the high-reward area of the potential response space.
An empirical study on the DailyDialog dataset shows that our approach significantly outperforms the HRED model.
arXiv Detail & Related papers (2020-04-29T22:46:28Z)