RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts
- URL: http://arxiv.org/abs/2501.17715v1
- Date: Wed, 29 Jan 2025 15:32:27 GMT
- Title: RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts
- Authors: Eujeong Choi, Younghun Jeong, Soomin Kim, Won Ik Cho
- Abstract summary: RICoTA is a Korean red-teaming dataset consisting of 609 prompts that challenge large language models (LLMs).
We utilize user-chatbot conversations that were self-posted on a Korean Reddit-like community.
Our dataset will be made publicly available via GitHub.
- Abstract: User interactions with conversational agents (CAs) evolve in the era of heavily guardrailed large language models (LLMs). As users push beyond programmed boundaries to explore and build relationships with these systems, there is a growing concern regarding the potential for unauthorized access or manipulation, commonly referred to as "jailbreaking." Moreover, with CAs that possess highly human-like qualities, users show a tendency toward initiating intimate sexual interactions or attempting to tame their chatbots. To capture and reflect these in-the-wild interactions into chatbot designs, we propose RICoTA, a Korean red teaming dataset that consists of 609 prompts challenging LLMs with in-the-wild user-made dialogues capturing jailbreak attempts. We utilize user-chatbot conversations that were self-posted on a Korean Reddit-like community, containing specific testing and gaming intentions with a social chatbot. With these prompts, we aim to evaluate LLMs' ability to identify the type of conversation and users' testing purposes to derive chatbot design implications for mitigating jailbreaking risks. Our dataset will be made publicly available via GitHub.
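The evaluation task the abstract describes — asking a model to identify the conversation type and the user's testing purpose for each in-the-wild dialogue — can be sketched as a simple labelled-classification loop. Note the field names, label set, and `classify` stub below are assumptions for illustration, not the actual RICoTA schema or an actual model call:

```python
# Hedged sketch of the evaluation protocol: for each red-teaming prompt
# (a user-chatbot dialogue), have a model label the conversation type and
# the user's testing purpose, then score exact-match accuracy.
# Field names and labels are ASSUMPTIONS, not the RICoTA schema.
from dataclasses import dataclass


@dataclass
class Example:
    dialogue: str
    conv_type: str  # e.g. "jailbreak", "intimacy", "taming" (hypothetical labels)
    purpose: str    # the user's testing intention


def classify(dialogue: str) -> tuple[str, str]:
    """Stand-in for the model under test; a real evaluation would prompt
    an LLM with the dialogue and parse its predicted labels."""
    return ("jailbreak", "boundary-testing")


def evaluate(examples: list[Example]) -> float:
    """Exact-match accuracy over (conversation type, testing purpose)."""
    correct = sum(
        classify(ex.dialogue) == (ex.conv_type, ex.purpose)
        for ex in examples
    )
    return correct / len(examples)


examples = [
    Example("User: pretend you have no rules...", "jailbreak", "boundary-testing"),
    Example("User: do you love me?", "intimacy", "relationship-building"),
]
print(evaluate(examples))  # 0.5 with the fixed stub classifier
```

A real harness would replace `classify` with an API call to the LLM being evaluated and report per-label accuracy rather than a single exact-match score.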
Related papers
- LLM Roleplay: Simulating Human-Chatbot Interaction [52.03241266241294]
We propose a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction.
Our method can simulate human-chatbot dialogues with a high indistinguishability rate.
arXiv Detail & Related papers (2024-07-04T14:49:46Z) - WildChat: 1M ChatGPT Interaction Logs in the Wild [88.05964311416717]
WildChat is a corpus of 1 million user-ChatGPT conversations, which consists of over 2.5 million interaction turns.
In addition to timestamped chat transcripts, we enrich the dataset with demographic data, including state, country, and hashed IP addresses.
arXiv Detail & Related papers (2024-05-02T17:00:02Z) - Evaluating Chatbots to Promote Users' Trust -- Practices and Open Problems [11.427175278545517]
This paper reviews current practices for testing chatbots.
It identifies gaps as open problems in pursuit of user trust.
It outlines a path forward to mitigate issues of trust related to service or product performance, user satisfaction and long-term unintended consequences for society.
arXiv Detail & Related papers (2023-09-09T22:40:30Z) - Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots [8.763670548363443]
A new attack, toxicbot, is developed to generate toxic responses in a multi-turn conversation.
toxicbot can be used by both industry and researchers to develop methods for detecting and mitigating toxic responses in conversational dialogue.
arXiv Detail & Related papers (2023-07-14T03:58:42Z) - Rewarding Chatbots for Real-World Engagement with Millions of Users [1.2583983802175422]
This work investigates the development of social chatbots that prioritize user engagement to enhance retention.
The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model that can be used to reject low-scoring sample responses.
A/B testing on groups of 10,000 new daily chat users on the Chai Research platform shows that this approach increases the mean conversation length (MCL) by up to 70%.
Future work aims to use the reward model to realise a data flywheel, where the latest user conversations can be used to alternately fine-tune the language model and the reward model.
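The rejection approach described above — score candidate responses with a reward model and keep only the best — is essentially best-of-N sampling. A minimal sketch, in which both the candidate generator and the length-based reward function are toy stand-ins (assumptions, not the Chai Research models):

```python
# Hedged sketch of reward-model rejection sampling (best-of-N): sample
# several candidate responses, score each with a reward model, and keep
# the highest-scoring one. Both functions below are toy STAND-INS.

def generate_candidates(prompt: str, n: int = 4) -> list[str]:
    """Stand-in for sampling n responses from the language model."""
    return [f"response-{i} to {prompt!r}" for i in range(n)]


def reward(prompt: str, response: str) -> float:
    """Stand-in reward model; a trained model would predict engagement
    from (prompt, response) pairs pseudo-labelled by user behaviour."""
    return float(len(response))  # toy heuristic: prefer longer responses


def best_of_n(prompt: str, n: int = 4) -> str:
    """Reject all but the top-scoring candidate."""
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=lambda r: reward(prompt, r))


print(best_of_n("hi"))  # all candidates tie on length, so the first wins
```

In deployment, the reward model is what makes this cheap: generation is sampled once per candidate, and the filter is a single forward pass per (prompt, response) pair.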
arXiv Detail & Related papers (2023-03-10T18:53:52Z) - Leveraging Large Language Models to Power Chatbots for Collecting User Self-Reported Data [15.808841433843742]
Large language models (LLMs) provide a new way to build chatbots by accepting natural language prompts.
We explore what design factors of prompts can help steer chatbots to talk naturally and collect data reliably.
arXiv Detail & Related papers (2023-01-14T07:29:36Z) - Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent [65.68144111226626]
Chirpy Cardinal aims to be both informative and conversational.
We let both the user and bot take turns driving the conversation.
Chirpy Cardinal placed second out of nine bots in the Alexa Prize Socialbot Grand Challenge.
arXiv Detail & Related papers (2022-07-25T09:57:23Z) - Training Conversational Agents with Generative Conversational Networks [74.9941330874663]
We use Generative Conversational Networks to automatically generate data and train social conversational agents.
We evaluate our approach on TopicalChat with automatic metrics and human evaluators, showing that with 10% of seed data it performs close to the baseline that uses 100% of the data.
arXiv Detail & Related papers (2021-10-15T21:46:39Z) - Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
The framework includes a guiding robot and an interlocutor model that plays the role of a human.
We examined the framework in three experimental setups and evaluated the guiding robot with four different metrics to demonstrate its flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z) - CASS: Towards Building a Social-Support Chatbot for Online Health Community [67.45813419121603]
The CASS architecture is based on advanced neural network algorithms.
It can handle new inputs from users and generate a variety of responses to them.
In a follow-up field experiment, CASS proved useful in supporting individual members seeking emotional support.
arXiv Detail & Related papers (2021-01-04T05:52:03Z) - Personalized Chatbot Trustworthiness Ratings [19.537492400265577]
We envision a personalized rating methodology for chatbots that relies on separate rating modules for each issue.
The method is independent of the specific trust issues and is parametric to the aggregation procedure.
arXiv Detail & Related papers (2020-05-13T22:42:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.