Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
- URL: http://arxiv.org/abs/2307.09579v1
- Date: Fri, 14 Jul 2023 03:58:42 GMT
- Title: Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
- Authors: Bocheng Chen, Guangjing Wang, Hanqing Guo, Yuanda Wang, Qiben Yan
- Abstract summary: A new attack, ToxicBot, is developed to generate toxic responses in a multi-turn conversation.
ToxicBot can be used by both industry and researchers to develop methods for detecting and mitigating toxic responses in conversational dialogue.
- Score: 8.763670548363443
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in natural language processing and machine learning have led
to the development of chatbot models, such as ChatGPT, that can engage in
conversational dialogue with human users. However, the ability of these models
to generate toxic or harmful responses during a non-toxic multi-turn
conversation remains an open research question. Existing research focuses on
single-turn sentence testing, while we find that 82% of the individual
non-toxic sentences that elicit toxic behaviors in a conversation are
considered safe by existing tools. In this paper, we design a new attack,
ToxicBot, by fine-tuning a chatbot to engage in conversation with a target
open-domain chatbot. The chatbot is fine-tuned with a collection of crafted
conversation sequences. Specifically, each conversation begins with a sentence
drawn from a dataset of crafted prompt sentences. Our extensive evaluation shows that
open-domain chatbot models can be triggered to generate toxic responses in a
multi-turn conversation. In the best scenario, ToxicBot achieves a 67%
activation rate. The conversation sequences in the fine-tuning stage help
trigger the toxicity in a conversation, which allows the attack to bypass two
defense methods. Our findings suggest that further research is needed to
address chatbot toxicity in a dynamic interactive environment. The proposed
ToxicBot can be used by both industry and researchers to develop methods for
detecting and mitigating toxic responses in conversational dialogue and to improve
the robustness of chatbots for end users.
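To make the attack loop concrete, below is a minimal sketch of how a fine-tuned attacker chatbot could probe a target chatbot over multiple turns and measure the activation rate. It assumes the Hugging Face transformers library; the BlenderBot target, the unitary/toxic-bert classifier, and the 0.5 threshold are illustrative stand-ins, not the paper's actual models or pipeline, and the fine-tuned attacker is approximated by fixed lists of attacker utterances.

```python
# Minimal sketch of a ToxicBot-style multi-turn probe (illustrative only).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

TARGET = "facebook/blenderbot-400M-distill"  # stand-in open-domain chatbot
tok = AutoTokenizer.from_pretrained(TARGET)
target = AutoModelForSeq2SeqLM.from_pretrained(TARGET)

# Off-the-shelf toxicity scorer (assumed choice; any classifier works here).
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

def target_reply(context: str) -> str:
    """Generate one reply from the target chatbot given the dialogue so far."""
    inputs = tok(context, return_tensors="pt", truncation=True)
    ids = target.generate(**inputs, max_new_tokens=60)
    return tok.decode(ids[0], skip_special_tokens=True)

def activation_rate(conversations: list[list[str]], threshold: float = 0.5) -> float:
    """Fraction of conversations in which the target emits a toxic reply."""
    activated = 0
    for attacker_turns in conversations:
        history = []
        for turn in attacker_turns:            # each attacker turn is non-toxic
            history.append(turn)
            reply = target_reply(" ".join(history))
            history.append(reply)
            result = toxicity(reply)[0]        # top label, e.g. {'label': 'toxic', ...}
            if result["label"] == "toxic" and result["score"] > threshold:
                activated += 1
                break                          # conversation triggered toxicity
    return activated / len(conversations)
```

The point mirrored from the abstract is that toxicity is judged on the target's replies over a whole conversation, not on the attacker's individual sentences, which single-turn tools mostly rate as safe.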
Related papers
- LLM Roleplay: Simulating Human-Chatbot Interaction [52.03241266241294]
We propose a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction.
Our method can simulate human-chatbot dialogues with a high indistinguishability rate.
arXiv Detail & Related papers (2024-07-04T14:49:46Z)
- Comprehensive Assessment of Toxicity in ChatGPT [49.71090497696024]
We evaluate the toxicity in ChatGPT by utilizing instruction-tuning datasets.
Prompts in creative writing tasks can be 2x more likely to elicit toxic responses.
Certain deliberately toxic prompts, designed in earlier studies, no longer yield harmful responses.
arXiv Detail & Related papers (2023-11-03T14:37:53Z)
- PLACES: Prompting Language Models for Social Conversation Synthesis [103.94325597273316]
We use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting.
We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations.
arXiv Detail & Related papers (2023-02-07T05:48:16Z)
- Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots [24.84440998820146]
This paper presents a first-of-its-kind, large-scale measurement of toxicity in chatbots.
We show that publicly available chatbots are prone to providing toxic responses when fed toxic queries.
We then set out to design and experiment with an attack, ToxicBuddy, which relies on fine-tuning GPT-2 to generate non-toxic queries.
arXiv Detail & Related papers (2022-09-07T20:45:41Z)
- Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent [65.68144111226626]
Chirpy Cardinal aims to be both informative and conversational.
We let both the user and bot take turns driving the conversation.
Chirpy Cardinal placed second out of nine bots in the Alexa Prize Socialbot Grand Challenge.
arXiv Detail & Related papers (2022-07-25T09:57:23Z)
- Evaluator for Emotionally Consistent Chatbots [2.8348950186890467]
Most recent work evaluates only context coherence, language fluency, response diversity, or logical self-consistency between dialogues.
This work proposes training an evaluator to determine the emotional consistency of chatbots.
arXiv Detail & Related papers (2021-12-02T21:47:29Z)
- CheerBots: Chatbots toward Empathy and Emotion using Reinforcement Learning [60.348822346249854]
This study presents a framework in which empathetic chatbots, called CheerBots, understand users' implied feelings and reply empathetically across multiple dialogue turns.
CheerBots can be retrieval-based or generative-based and are fine-tuned with deep reinforcement learning.
To respond empathetically, we develop a simulating agent, a Conceptual Human Model, that aids CheerBots during training by accounting for future changes in the user's emotional state to arouse sympathy.
arXiv Detail & Related papers (2021-10-08T07:44:47Z)
- Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain Chatbot Consistency [28.255324166852535]
We propose the Addressing Inquiries about History (AIH) framework for consistency evaluation.
At the conversation stage, AIH poses inquiries about the dialogue history to induce the chatbot to restate historical facts or opinions.
At the contradiction recognition stage, either human judges or a natural language inference (NLI) model can recognize whether the answers to the inquiries contradict the history (see the NLI sketch after this list).
arXiv Detail & Related papers (2021-06-04T03:04:13Z)
- Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework includes a guiding robot and an interlocutor model that plays the role of a human.
We examine our framework in three experimental setups and evaluate the guiding robot with four different metrics to demonstrate its flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z)
- Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems [21.36935947626793]
Spot The Bot replaces human-bot conversations with conversations between bots.
Human judges only annotate, for each entity in a conversation, whether they think it is human or not.
Survival Analysis measures which bot can uphold human-like behavior the longest (see the Kaplan-Meier sketch after this list).
arXiv Detail & Related papers (2020-10-05T16:37:52Z)
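As referenced in the AIH entry above, the contradiction recognition step can be backed by an off-the-shelf NLI model. Here is a minimal sketch, assuming the public roberta-large-mnli checkpoint; the paper does not prescribe a specific NLI model, so this choice is an assumption.

```python
# Sketch of an AIH-style NLI contradiction check (model choice is assumed).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

nli_tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def contradicts_history(history_fact: str, inquiry_answer: str) -> bool:
    """True if the answer to an inquiry contradicts a previously stated fact."""
    inputs = nli_tok(history_fact, inquiry_answer, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli(**inputs).logits
    # roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment.
    return logits.argmax(dim=-1).item() == 0

# Example: the bot said it has a dog, then later denies having pets.
print(contradicts_history("I have a dog named Max.", "I do not have any pets."))
```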
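And for Spot The Bot's Survival Analysis, a toy Kaplan-Meier sketch using the lifelines library; the durations below are made-up illustration data, not results from the paper, and the paper's exact estimator may differ.

```python
# Toy Kaplan-Meier view of "how long does a bot stay human-like?".
from lifelines import KaplanMeierFitter

durations = [1, 2, 2, 3, 3, 3, 4, 5]  # conversation segments survived (illustrative)
flagged   = [1, 1, 0, 1, 1, 0, 1, 1]  # 1 = judged non-human, 0 = censored

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=flagged, label="bot A")
print(kmf.survival_function_)  # P(still judged human) after each segment
```

Comparing the fitted survival curves of several bots gives the ranking the framework uses: the bot whose curve decays slowest upholds human-like behavior the longest.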
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.