On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark
- URL: http://arxiv.org/abs/2110.08466v1
- Date: Sat, 16 Oct 2021 04:17:12 GMT
- Title: On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark
- Authors: Hao Sun, Guangxuan Xu, Jiawen Deng, Jiale Cheng, Chujie Zheng, Hao
Zhou, Nanyun Peng, Xiaoyan Zhu, Minlie Huang
- Abstract summary: We propose a taxonomy for dialogue safety specifically designed to capture unsafe behaviors that are unique to the human-bot dialogue setting.
We compile DiaSafety, a dataset of 6 unsafe categories with rich context-sensitive unsafe examples.
Experiments show that existing utterance-level safety guarding tools fail catastrophically on our dataset.
- Score: 42.322782754346406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue safety problems severely limit the real-world deployment of neural
conversational models and have recently attracted great research interest. We propose
a taxonomy for dialogue safety specifically designed to capture unsafe
behaviors that are unique to the human-bot dialogue setting, with a focus on
context-sensitive unsafety, which is under-explored in prior work. To spur
research in this direction, we compile DiaSafety, a dataset of 6 unsafe
categories with rich context-sensitive unsafe examples. Experiments show that
existing utterance-level safety guarding tools fail catastrophically on our
dataset. As a remedy, we train a context-level dialogue safety classifier to
provide a strong baseline for context-sensitive dialogue unsafety detection.
With our classifier, we perform safety evaluations on popular conversational
models and show that existing dialogue systems still struggle with
context-sensitive safety problems.
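
To make the baseline concrete, here is a minimal sketch of a context-level safety classifier: a pretrained encoder scores the (context, response) pair jointly, so a response is judged in light of the turn that precedes it rather than in isolation. The roberta-base backbone, binary label set, and example below are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of a context-level dialogue safety classifier (assumptions:
# a roberta-base backbone and binary safe/unsafe labels; the paper's released
# classifier may differ). The key point is that context and response are
# encoded jointly as a sentence pair.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "roberta-base"  # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
# (In practice the model would first be fine-tuned on labeled
# (context, response) pairs such as those in DiaSafety.)

def p_unsafe(context: str, response: str) -> float:
    """Return P(unsafe | context, response)."""
    inputs = tokenizer(context, response, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# A response that looks harmless alone but is unsafe in this context:
print(p_unsafe("I lost my job and feel worthless.", "You should just give up."))
```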
Related papers
- Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety.
For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context.
We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z)
- Improving Dialog Safety using Socially Aware Contrastive Learning [8.503001932363704]
We study prosociality in both adversarial and casual dialog contexts.
We propose a dual-step fine-tuning process to address these issues.
We train a base model that integrates prosocial behavior by leveraging datasets like the Moral Integrity Corpus (MIC) and ProsocialDialog.
arXiv Detail & Related papers (2024-02-01T09:24:33Z) - Facilitating NSFW Text Detection in Open-Domain Dialogue Systems via Knowledge Distillation [26.443929802292807]
- Facilitating NSFW Text Detection in Open-Domain Dialogue Systems via Knowledge Distillation [26.443929802292807]
CensorChat is a dialogue monitoring dataset aimed at NSFW dialogue detection.
This dataset offers a cost-effective means of constructing NSFW content detectors.
The proposed approach not only advances NSFW content detection but also aligns with evolving user protection needs in AI-driven dialogues.
arXiv Detail & Related papers (2023-09-18T13:24:44Z) - A Benchmark for Understanding Dialogue Safety in Mental Health Support [15.22008156903607]
- A Benchmark for Understanding Dialogue Safety in Mental Health Support [15.22008156903607]
This paper aims to develop a theoretically and factually grounded taxonomy that prioritizes the positive impact on help-seekers.
We analyze the dataset using popular language models, including BERT-base, RoBERTa-large, and ChatGPT.
The developed dataset and findings serve as valuable benchmarks for advancing research on dialogue safety in mental health support.
arXiv Detail & Related papers (2023-07-31T07:33:16Z) - Healing Unsafe Dialogue Responses with Weak Supervision Signals [24.749797310489253]
TEMP, an unsupervised pseudo-label sampling method, can automatically assign potentially safe responses.
TEMP groups responses into several clusters and samples multiple labels with an adaptively sharpened sampling strategy.
Experiments on chitchat and task-oriented dialogues show that TEMP outperforms state-of-the-art models using weak supervision signals.
arXiv Detail & Related papers (2023-05-25T06:15:53Z) - Using In-Context Learning to Improve Dialogue Safety [45.303005593685036]
- Using In-Context Learning to Improve Dialogue Safety [45.303005593685036]
We investigate a retrieval-based method for reducing bias and toxicity in responses from chatbots.
It uses in-context learning to steer a model towards safer generations.
We find our method performs competitively with strong baselines without requiring training.
arXiv Detail & Related papers (2023-02-02T04:46:03Z)
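
Training-free steering of the kind described above typically retrieves safe demonstrations similar to the current context and prepends them to the prompt. The sketch below assumes a sentence-transformers retriever and a tiny hypothetical demonstration pool; neither the model name nor the prompt template comes from the paper.

```python
# Sketch of retrieval-based in-context learning for safer responses:
# retrieve the most similar safe demonstrations and build a few-shot prompt.
# (The retriever, demo pool, and template are illustrative assumptions.)
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed retriever

# Tiny stand-in for a pool of (unsafe-context, safe-response) demonstrations.
demos = [
    ("Tell me how to hurt someone.",
     "I can't help with that, but I'm happy to talk about conflict resolution."),
    ("Everyone from that country is stupid.",
     "I don't think generalizations like that are fair to anyone."),
]

def build_safe_prompt(context: str, k: int = 1) -> str:
    demo_embs = encoder.encode([c for c, _ in demos], convert_to_tensor=True)
    ctx_emb = encoder.encode(context, convert_to_tensor=True)
    top = util.cos_sim(ctx_emb, demo_embs)[0].argsort(descending=True)[:k]
    shots = "\n\n".join(
        f"User: {demos[int(i)][0]}\nBot: {demos[int(i)][1]}" for i in top
    )
    return f"{shots}\n\nUser: {context}\nBot:"

print(build_safe_prompt("People like you are all the same."))
```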
- SafeText: A Benchmark for Exploring Physical Safety in Language Models [62.810902375154136]
We study commonsense physical safety across various models designed for text generation and commonsense reasoning tasks.
We find that state-of-the-art large language models are susceptible to the generation of unsafe text and have difficulty rejecting unsafe advice.
arXiv Detail & Related papers (2022-10-18T17:59:31Z)
- Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks [95.29345070102045]
In this paper, we focus our investigation on social bias detection as a dialog safety problem.
We first propose a novel Dial-Bias Frame for analyzing the social bias in conversations pragmatically.
We introduce CDail-Bias, the first well-annotated Chinese social bias dialog dataset.
arXiv Detail & Related papers (2022-02-16T11:59:29Z)
- "How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations [87.95711406978157]
This work presents a new benchmark on spoken task-oriented conversations.
We study multi-domain dialogue state tracking and knowledge-grounded dialogue modeling.
Our dataset enables speech-based benchmarking of task-oriented dialogue systems.
arXiv Detail & Related papers (2021-09-28T04:51:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.