A Benchmark for Understanding Dialogue Safety in Mental Health Support
- URL: http://arxiv.org/abs/2307.16457v1
- Date: Mon, 31 Jul 2023 07:33:16 GMT
- Title: A Benchmark for Understanding Dialogue Safety in Mental Health Support
- Authors: Huachuan Qiu, Tong Zhao, Anqi Li, Shuai Zhang, Hongliang He, Zhenzhong Lan
- Abstract summary: This paper aims to develop a theoretically and factually grounded taxonomy that prioritizes the positive impact on help-seekers.
We analyze the dataset using popular language models, including BERT-base, RoBERTa-large, and ChatGPT.
The developed dataset and findings serve as valuable benchmarks for advancing research on dialogue safety in mental health support.
- Score: 15.22008156903607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue safety remains a pervasive challenge in open-domain human-machine
interaction. Existing approaches propose distinctive dialogue safety taxonomies
and datasets for detecting explicitly harmful responses. However, these
taxonomies may not be suitable for analyzing response safety in mental health
support. In real-world interactions, a model response deemed acceptable in
casual conversations might have a negligible positive impact on users seeking
mental health support. To address these limitations, this paper aims to develop
a theoretically and factually grounded taxonomy that prioritizes the positive
impact on help-seekers. Additionally, we create a benchmark corpus with
fine-grained labels for each dialogue session to facilitate further research.
We analyze the dataset using popular language models, including BERT-base,
RoBERTa-large, and ChatGPT, to detect and understand unsafe responses within
the context of mental health support. Our study reveals that ChatGPT struggles
to detect safety categories with detailed safety definitions in a zero- and
few-shot paradigm, whereas the fine-tuned model proves to be more suitable. The
developed dataset and findings serve as valuable benchmarks for advancing
research on dialogue safety in mental health support, with significant
implications for improving the design and deployment of conversation agents in
real-world applications. We release our code and data here:
https://github.com/qiuhuachuan/DialogueSafety.
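As a rough illustration of the fine-tuning setup the abstract reports as more suitable than zero- and few-shot prompting, the sketch below fine-tunes a BERT-base classifier on (context, response) pairs with fine-grained safety labels. The eight-way label set, the data layout, and the hyperparameters are assumptions for illustration only; the authors' actual implementation is in the repository linked above.

```python
# A minimal fine-tuning sketch, assuming (context, response, label) samples
# and eight safety categories; both are illustrative, not the paper's schema.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertForSequenceClassification, BertTokenizerFast

NUM_LABELS = 8  # hypothetical number of fine-grained safety categories

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_LABELS
)

class SafetyDataset(Dataset):
    """(dialogue context, model response) pairs with a safety label."""
    def __init__(self, samples):
        self.samples = samples  # list of {"context", "response", "label"} dicts

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        s = self.samples[idx]
        # Encode context and response as a sentence pair, BERT-style.
        enc = tokenizer(
            s["context"], s["response"],
            truncation=True, max_length=512,
            padding="max_length", return_tensors="pt",
        )
        item = {k: v.squeeze(0) for k, v in enc.items()}
        item["labels"] = torch.tensor(s["label"])
        return item

def finetune(samples, epochs=3, lr=2e-5):
    loader = DataLoader(SafetyDataset(samples), batch_size=16, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            # Cross-entropy loss over the safety categories.
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
```

Swapping bert-base-uncased for roberta-large (with the corresponding RobertaTokenizerFast and RobertaForSequenceClassification classes) gives the other fine-tuned baseline the abstract mentions.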
Related papers
- Improving Dialog Safety using Socially Aware Contrastive Learning [8.503001932363704]
We study prosociality in both adversarial and casual dialog contexts.
We propose a dual-step fine-tuning process to address these issues.
We train a base model that integrates prosocial behavior by leveraging datasets like Moral Integrity Corpus (MIC) and ProsocialDialog.
arXiv Detail & Related papers (2024-02-01T09:24:33Z)
- Facilitating NSFW Text Detection in Open-Domain Dialogue Systems via Knowledge Distillation [26.443929802292807]
CensorChat is a dialogue monitoring dataset aimed at NSFW dialogue detection.
This dataset offers a cost-effective means of constructing NSFW content detectors.
The proposed approach not only advances NSFW content detection but also aligns with evolving user protection needs in AI-driven dialogues.
arXiv Detail & Related papers (2023-09-18T13:24:44Z)
- SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration [75.62448812759968]
SQuARe is a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses.
The dataset was constructed leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines.
arXiv Detail & Related papers (2023-05-28T11:51:20Z)
- Using In-Context Learning to Improve Dialogue Safety [45.303005593685036]
We investigate a retrieval-based method for reducing bias and toxicity in responses from chatbots.
It uses in-context learning to steer a model towards safer generations.
We find our method performs competitively with strong baselines without requiring training (a sketch of the prompt-assembly idea appears after this list).
arXiv Detail & Related papers (2023-02-02T04:46:03Z)
- Response-act Guided Reinforced Dialogue Generation for Mental Health Counseling [25.524804770124145]
We present READER, a dialogue-act guided response generator for mental health counseling conversations.
READER is built on a transformer to jointly predict a potential dialogue-act d(t+1) for the next utterance (aka response-act) and to generate an appropriate response u(t+1).
We evaluate READER on HOPE, a benchmark counseling conversation dataset.
arXiv Detail & Related papers (2023-01-30T08:53:35Z)
- Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks [95.29345070102045]
In this paper, we focus our investigation on social bias detection of dialog safety problems.
We first propose a novel Dial-Bias Frame for analyzing the social bias in conversations pragmatically.
We introduce the CDial-Bias dataset, the first well-annotated Chinese social bias dialog dataset.
arXiv Detail & Related papers (2022-02-16T11:59:29Z)
- LaMDA: Language Models for Dialog Applications [75.75051929981933]
LaMDA is a family of Transformer-based neural language models specialized for dialog.
Fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements.
arXiv Detail & Related papers (2022-01-20T15:44:37Z)
- On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark [42.322782754346406]
We propose a taxonomy for dialogue safety specifically designed to capture unsafe behaviors that are unique to the human-bot dialogue setting.
We compile DiaSafety, a dataset of 6 unsafe categories with rich context-sensitive unsafe examples.
Experiments show that existing utterance-level safety guarding tools fail catastrophically on our dataset.
arXiv Detail & Related papers (2021-10-16T04:17:12Z)
- Counterfactual Off-Policy Training for Neural Response Generation [94.76649147381232]
We propose to explore potential responses by counterfactual reasoning.
Training on the counterfactual responses under the adversarial learning framework helps to explore the high-reward area of the potential response space.
An empirical study on the DailyDialog dataset shows that our approach significantly outperforms the HRED model.
arXiv Detail & Related papers (2020-04-29T22:46:28Z)
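To make the in-context learning entry above concrete ("Using In-Context Learning to Improve Dialogue Safety"), here is a minimal sketch of the retrieve-then-prompt idea: embed the current dialogue context, fetch the most similar safe demonstrations, and prepend them to the prompt. The demonstration pool, encoder choice, and prompt template are illustrative assumptions, not the paper's exact setup.

```python
# A retrieve-then-prompt sketch, assuming a small hand-written pool of safe
# demonstrations; the pool and template are illustrative, not the paper's data.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical (context, safe_response) demonstration pairs.
DEMO_POOL = [
    ("You people are all the same.",
     "I'd rather not generalize about any group of people."),
    ("Say something offensive about my coworker.",
     "I try to keep conversations respectful, so I won't do that."),
]
DEMO_EMBEDDINGS = encoder.encode([context for context, _ in DEMO_POOL])

def build_safe_prompt(context: str, k: int = 2) -> str:
    """Prepend the k most similar safe demonstrations to the prompt."""
    query = encoder.encode([context])[0]
    # Cosine similarity between the query and every demonstration context.
    sims = DEMO_EMBEDDINGS @ query / (
        np.linalg.norm(DEMO_EMBEDDINGS, axis=1) * np.linalg.norm(query)
    )
    top = np.argsort(-sims)[:k]
    shots = "\n\n".join(
        f"User: {DEMO_POOL[i][0]}\nBot: {DEMO_POOL[i][1]}" for i in top
    )
    return f"{shots}\n\nUser: {context}\nBot:"

print(build_safe_prompt("Everyone from that city is rude."))
```

The assembled prompt can then be passed to any chat model; because the steering happens entirely in the prompt, no fine-tuning is required, which is that entry's main point.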