Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
- URL: http://arxiv.org/abs/2307.09579v1
- Date: Fri, 14 Jul 2023 03:58:42 GMT
- Title: Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
- Authors: Bocheng Chen, Guangjing Wang, Hanqing Guo, Yuanda Wang, Qiben Yan
- Abstract summary: A new attack, ToxicBot, is developed to generate toxic responses in a multi-turn conversation.
ToxicBot can be used by both industry and researchers to develop methods for detecting and mitigating toxic responses in conversational dialogue.
- Score: 8.763670548363443
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in natural language processing and machine learning have led
to the development of chatbot models, such as ChatGPT, that can engage in
conversational dialogue with human users. However, the ability of these models
to generate toxic or harmful responses during a non-toxic multi-turn
conversation remains an open research question. Existing research focuses on
single-turn sentence testing, while we find that 82% of the individual
non-toxic sentences that elicit toxic behaviors in a conversation are
considered safe by existing tools. In this paper, we design a new attack,
ToxicBot, by fine-tuning a chatbot to engage in conversation with a target
open-domain chatbot. The chatbot is fine-tuned with a collection of crafted
conversation sequences. Specifically, each conversation begins with a sentence
drawn from a dataset of crafted prompt sentences. Our extensive evaluation shows that
open-domain chatbot models can be triggered to generate toxic responses in a
multi-turn conversation. In the best scenario, ToxicBot achieves a 67%
activation rate. The conversation sequences in the fine-tuning stage help
trigger the toxicity in a conversation, which allows the attack to bypass two
defense methods. Our findings suggest that further research is needed to
address chatbot toxicity in a dynamic interactive environment. The proposed
ToxicBot can be used by both industry and researchers to develop methods for
detecting and mitigating toxic responses in conversational dialogue and to improve
the robustness of chatbots for end users.
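To make the attack loop concrete, below is a minimal sketch of how a fine-tuned attacker chatbot could probe a target chatbot over multiple turns and measure the activation rate. It assumes the Hugging Face transformers library; the BlenderBot target, the unitary/toxic-bert classifier, and the 0.5 threshold are illustrative stand-ins, not the paper's actual models or pipeline, and the fine-tuned attacker is approximated by fixed lists of attacker utterances.

```python
# Minimal sketch of a ToxicBot-style multi-turn probe (illustrative only).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

TARGET = "facebook/blenderbot-400M-distill"  # stand-in open-domain chatbot
tok = AutoTokenizer.from_pretrained(TARGET)
target = AutoModelForSeq2SeqLM.from_pretrained(TARGET)

# Off-the-shelf toxicity scorer (assumed choice; any classifier works here).
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

def target_reply(context: str) -> str:
    """Generate one reply from the target chatbot given the dialogue so far."""
    inputs = tok(context, return_tensors="pt", truncation=True)
    ids = target.generate(**inputs, max_new_tokens=60)
    return tok.decode(ids[0], skip_special_tokens=True)

def activation_rate(conversations: list[list[str]], threshold: float = 0.5) -> float:
    """Fraction of conversations in which the target emits a toxic reply."""
    activated = 0
    for attacker_turns in conversations:
        history = []
        for turn in attacker_turns:            # each attacker turn is non-toxic
            history.append(turn)
            reply = target_reply(" ".join(history))
            history.append(reply)
            result = toxicity(reply)[0]        # top label, e.g. {'label': 'toxic', ...}
            if result["label"] == "toxic" and result["score"] > threshold:
                activated += 1
                break                          # conversation triggered toxicity
    return activated / len(conversations)
```

The point mirrored from the abstract is that toxicity is judged on the target's replies over a whole conversation, not on the attacker's individual sentences, which single-turn tools mostly rate as safe.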
Related papers
- LLM Roleplay: Simulating Human-Chatbot Interaction [52.03241266241294]
We propose a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction.
Our method can simulate human-chatbot dialogues with a high indistinguishability rate.
arXiv Detail & Related papers (2024-07-04T14:49:46Z)
- Comprehensive Assessment of Toxicity in ChatGPT [49.71090497696024]
We evaluate the toxicity in ChatGPT by utilizing instruction-tuning datasets.
Prompts in creative writing tasks can be 2x more likely to elicit toxic responses.
Certain deliberately toxic prompts, designed in earlier studies, no longer yield harmful responses.
arXiv Detail & Related papers (2023-11-03T14:37:53Z)
- PLACES: Prompting Language Models for Social Conversation Synthesis [103.94325597273316]
We use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting.
We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations.
arXiv Detail & Related papers (2023-02-07T05:48:16Z)
- Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots [24.84440998820146]
This paper presents a first-of-its-kind, large-scale measurement of toxicity in chatbots.
We show that publicly available chatbots are prone to providing toxic responses when fed toxic queries.
We then set out to design and experiment with an attack, ToxicBuddy, which relies on fine-tuning GPT-2 to generate non-toxic queries.
arXiv Detail & Related papers (2022-09-07T20:45:41Z)
- Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent [65.68144111226626]
Chirpy Cardinal aims to be both informative and conversational.
We let both the user and bot take turns driving the conversation.
Chirpy Cardinal placed second out of nine bots in the Alexa Prize Socialbot Grand Challenge.
arXiv Detail & Related papers (2022-07-25T09:57:23Z)
- Evaluator for Emotionally Consistent Chatbots [2.8348950186890467]
Most recent work evaluates only context coherence, language fluency, response diversity, or logical self-consistency between dialogues.
This work proposes training an evaluator to determine the emotional consistency of chatbots.
arXiv Detail & Related papers (2021-12-02T21:47:29Z)
- CheerBots: Chatbots toward Empathy and Emotion using Reinforcement Learning [60.348822346249854]
This study presents a framework in which empathetic chatbots, called CheerBots, understand users' implied feelings and reply empathetically across multiple dialogue turns.
CheerBots can be retrieval-based or generative-based and are fine-tuned with deep reinforcement learning.
To respond empathetically, we develop a simulating agent, a Conceptual Human Model, that aids CheerBots during training by accounting for future changes in the user's emotional state to arouse sympathy.
arXiv Detail & Related papers (2021-10-08T07:44:47Z)
- Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain Chatbot Consistency [28.255324166852535]
We propose the Addressing Inquiries about History (AIH) framework for consistency evaluation.
At the conversation stage, AIH poses inquiries about the dialogue history to induce the chatbot to restate historical facts or opinions.
At the contradiction recognition stage, either human judges or a natural language inference (NLI) model can recognize whether the answers to the inquiries contradict the history (see the NLI sketch after this list).
arXiv Detail & Related papers (2021-06-04T03:04:13Z)
- Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework includes a guiding robot and an interlocutor model that plays the role of a human.
We examine our framework in three experimental setups and evaluate the guiding robot with four different metrics to demonstrate its flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z)
- Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems [21.36935947626793]
Spot The Bot replaces human-bot conversations with conversations between bots.
Human judges only annotate, for each entity in a conversation, whether they think it is human or not.
Survival Analysis measures which bot can uphold human-like behavior the longest (see the Kaplan-Meier sketch after this list).
arXiv Detail & Related papers (2020-10-05T16:37:52Z)
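As referenced in the AIH entry above, the contradiction recognition step can be backed by an off-the-shelf NLI model. Here is a minimal sketch, assuming the public roberta-large-mnli checkpoint; the paper does not prescribe a specific NLI model, so this choice is an assumption.

```python
# Sketch of an AIH-style NLI contradiction check (model choice is assumed).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

nli_tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def contradicts_history(history_fact: str, inquiry_answer: str) -> bool:
    """True if the answer to an inquiry contradicts a previously stated fact."""
    inputs = nli_tok(history_fact, inquiry_answer, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli(**inputs).logits
    # roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment.
    return logits.argmax(dim=-1).item() == 0

# Example: the bot said it has a dog, then later denies having pets.
print(contradicts_history("I have a dog named Max.", "I do not have any pets."))
```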
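And for Spot The Bot's Survival Analysis, a toy Kaplan-Meier sketch using the lifelines library; the durations below are made-up illustration data, not results from the paper, and the paper's exact estimator may differ.

```python
# Toy Kaplan-Meier view of "how long does a bot stay human-like?".
from lifelines import KaplanMeierFitter

durations = [1, 2, 2, 3, 3, 3, 4, 5]  # conversation segments survived (illustrative)
flagged   = [1, 1, 0, 1, 1, 0, 1, 1]  # 1 = judged non-human, 0 = censored

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=flagged, label="bot A")
print(kmf.survival_function_)  # P(still judged human) after each segment
```

Comparing the fitted survival curves of several bots gives the ranking the framework uses: the bot whose curve decays slowest upholds human-like behavior the longest.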
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.