ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in
Real-World User-AI Conversation
- URL: http://arxiv.org/abs/2310.17389v1
- Date: Thu, 26 Oct 2023 13:35:41 GMT
- Title: ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in
Real-World User-AI Conversation
- Authors: Zi Lin, Zihan Wang, Yongqi Tong, Yangkun Wang, Yuxin Guo, Yujia Wang,
Jingbo Shang
- Abstract summary: We introduce ToxicChat, a novel benchmark based on real user queries from an open-source chatbots.
Our systematic evaluation of models trained on existing toxicity datasets has shown their shortcomings when applied to this unique domain of ToxicChat.
In the future, ToxicChat can be a valuable resource to drive further advancements toward building a safe and healthy environment for user-AI interactions.
- Score: 43.356758428820626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite remarkable advances that large language models have achieved in
chatbots, maintaining a non-toxic user-AI interactive environment has become
increasingly critical nowadays. However, previous efforts in toxicity detection
have been mostly based on benchmarks derived from social media content, leaving
the unique challenges inherent to real-world user-AI interactions
insufficiently explored. In this work, we introduce ToxicChat, a novel
benchmark based on real user queries from an open-source chatbot. This
benchmark contains the rich, nuanced phenomena that can be tricky for current
toxicity detection models to identify, revealing a significant domain
difference compared to social media content. Our systematic evaluation of
models trained on existing toxicity datasets has shown their shortcomings when
applied to this unique domain of ToxicChat. Our work illuminates the
potentially overlooked challenges of toxicity detection in real-world user-AI
conversations. In the future, ToxicChat can be a valuable resource to drive
further advancements toward building a safe and healthy environment for user-AI
interactions.
Related papers
- Exploring ChatGPT for Toxicity Detection in GitHub [5.003898791753481]
The prevalence of negative discourse, often manifested as toxic comments, poses significant challenges to developer well-being and productivity.
To identify such negativity in project communications, automated toxicity detection models are necessary.
To train these models effectively, we need large software engineering-specific toxicity datasets.
arXiv Detail & Related papers (2023-12-20T15:23:00Z) - Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic
Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z) - Comprehensive Assessment of Toxicity in ChatGPT [49.71090497696024]
We evaluate the toxicity in ChatGPT by utilizing instruction-tuning datasets.
prompts in creative writing tasks can be 2x more likely to elicit toxic responses.
Certain deliberately toxic prompts, designed in earlier studies, no longer yield harmful responses.
arXiv Detail & Related papers (2023-11-03T14:37:53Z) - Revisiting Contextual Toxicity Detection in Conversations [28.465019968374413]
We show that toxicity labelling by humans is in general influenced by the conversational structure, polarity and topic of the context.
We propose to bring these findings into computational detection models by introducing (a) neural architectures for contextual toxicity detection.
We have also demonstrated that such models can benefit from synthetic data, especially in the social media domain.
arXiv Detail & Related papers (2021-11-24T11:50:37Z) - Toxicity Detection can be Sensitive to the Conversational Context [64.28043776806213]
We construct and publicly release a dataset of 10,000 posts with two kinds of toxicity labels.
We introduce a new task, context sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context is also considered.
arXiv Detail & Related papers (2021-11-19T13:57:26Z) - Automatic Evaluation and Moderation of Open-domain Dialogue Systems [59.305712262126264]
A long standing challenge that bothers the researchers is the lack of effective automatic evaluation metrics.
This paper describes the data, baselines and results obtained for the Track 5 at the Dialogue System Technology Challenge 10 (DSTC10)
arXiv Detail & Related papers (2021-11-03T10:08:05Z) - RECAST: Enabling User Recourse and Interpretability of Toxicity
Detection Models with Interactive Visualization [16.35961310670002]
We present our work, RECAST, an interactive, open-sourced web tool for visualizing toxic models' predictions.
We found that RECAST was highly effective at helping users reduce toxicity as detected through the model.
This opens a discussion for how toxicity detection models work and should work, and their effect on the future of online discourse.
arXiv Detail & Related papers (2021-02-08T18:37:50Z) - RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language
Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arXiv Detail & Related papers (2020-09-24T03:17:19Z) - RECAST: Interactive Auditing of Automatic Toxicity Detection Models [39.621867230707814]
We present our ongoing work, RECAST, an interactive tool for examining toxicity detection models by visualizing explanations for predictions and providing alternative wordings for detected toxic speech.
arXiv Detail & Related papers (2020-01-07T00:17:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.