SafeChat: A Framework for Building Trustworthy Collaborative Assistants and a Case Study of its Usefulness
- URL: http://arxiv.org/abs/2504.07995v2
- Date: Tue, 15 Apr 2025 14:41:45 GMT
- Title: SafeChat: A Framework for Building Trustworthy Collaborative Assistants and a Case Study of its Usefulness
- Authors: Biplav Srivastava, Kausik Lakkaraju, Nitin Gupta, Vansh Nagpal, Bharath C. Muppasani, Sara E. Jones,
- Abstract summary: We introduce SafeChat, a general architecture for building safe and trustworthy chatbots.<n>Key features of SafeChat include: (a) safety, with a domain-agnostic design where responses are grounded and traceable to approved sources (provenance); (b) usability, with automatic extractive summarization of long responses, traceable to their sources; and (c) fast, scalable development, including a CSV-driven workflow, automated testing, and integration with various devices.
- Score: 4.896226014796392
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Collaborative assistants, or chatbots, are data-driven decision support systems that enable natural interaction for task completion. While they can meet critical needs in modern society, concerns about their reliability and trustworthiness persist. In particular, Large Language Model (LLM)-based chatbots like ChatGPT, Gemini, and DeepSeek are becoming more accessible. However, such chatbots have limitations, including their inability to explain response generation, the risk of generating problematic content, the lack of standardized testing for reliability, and the need for deep AI expertise and extended development times. These issues make chatbots unsuitable for trust-sensitive applications like elections or healthcare. To address these concerns, we introduce SafeChat, a general architecture for building safe and trustworthy chatbots, with a focus on information retrieval use cases. Key features of SafeChat include: (a) safety, with a domain-agnostic design where responses are grounded and traceable to approved sources (provenance), and 'do-not-respond' strategies to prevent harmful answers; (b) usability, with automatic extractive summarization of long responses, traceable to their sources, and automated trust assessments to communicate expected chatbot behavior, such as sentiment; and (c) fast, scalable development, including a CSV-driven workflow, automated testing, and integration with various devices. We implemented SafeChat in an executable framework using the open-source chatbot platform Rasa. A case study demonstrates its application in building ElectionBot-SC, a chatbot designed to safely disseminate official election information. SafeChat is being used in many domains, validating its potential, and is available at: https://github.com/ai4society/trustworthy-chatbot.
Related papers
- RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts [6.0385743836962025]
RICoTA is a Korean red teaming dataset that consists of 609 prompts challenging large language models (LLMs)<n>We utilize user-chatbot conversations that were self-posted on a Korean Reddit-like community.<n>Our dataset will be made publicly available via GitHub.
arXiv Detail & Related papers (2025-01-29T15:32:27Z) - Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards [93.16294577018482]
Arena, the most popular benchmark of this type, ranks models by asking users to select the better response between two randomly selected models.
We show that an attacker can alter the leaderboard (to promote their favorite model or demote competitors) at the cost of roughly a thousand votes.
Our attack consists of two steps: first, we show how an attacker can determine which model was used to generate a given reply with more than $95%$ accuracy; and then, the attacker can use this information to consistently vote against a target model.
arXiv Detail & Related papers (2025-01-13T17:12:38Z) - Seq2Seq Model-Based Chatbot with LSTM and Attention Mechanism for Enhanced User Interaction [1.937324318931008]
This work proposes a Sequence-to-Sequence (Seq2Seq) model with an encoder-decoder architecture that incorporates attention mechanisms and Long Short-Term Memory (LSTM) cells.<n>The proposed Seq2Seq model-based robot is trained, validated, and tested on a dataset specifically for the tourism sector in Draa-Tafilalet, Morocco.
arXiv Detail & Related papers (2024-12-27T23:50:54Z) - WildChat: 1M ChatGPT Interaction Logs in the Wild [88.05964311416717]
WildChat is a corpus of 1 million user-ChatGPT conversations, which consists of over 2.5 million interaction turns.
In addition to timestamped chat transcripts, we enrich the dataset with demographic data, including state, country, and hashed IP addresses.
arXiv Detail & Related papers (2024-05-02T17:00:02Z) - MutaBot: A Mutation Testing Approach for Chatbots [3.811067614153878]
MutaBot addresses mutations at multiple levels, including conversational flows, intents, and contexts.
We assess the tool with three Dialogflow chatbots and test cases generated with Botium, revealing weaknesses in the test suites.
arXiv Detail & Related papers (2024-01-18T20:38:27Z) - Evaluating Chatbots to Promote Users' Trust -- Practices and Open
Problems [11.427175278545517]
This paper reviews current practices for testing chatbots.
It identifies gaps as open problems in pursuit of user trust.
It outlines a path forward to mitigate issues of trust related to service or product performance, user satisfaction and long-term unintended consequences for society.
arXiv Detail & Related papers (2023-09-09T22:40:30Z) - InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT
Beyond Language [82.92236977726655]
InternGPT stands for textbfinteraction, textbfnonverbal, and textbfchatbots.
We present an interactive visual framework named InternGPT, or iGPT for short.
arXiv Detail & Related papers (2023-05-09T17:58:34Z) - A Categorical Archive of ChatGPT Failures [47.64219291655723]
ChatGPT, developed by OpenAI, has been trained using massive amounts of data and simulates human conversation.
It has garnered significant attention due to its ability to effectively answer a broad range of human inquiries.
However, a comprehensive analysis of ChatGPT's failures is lacking, which is the focus of this study.
arXiv Detail & Related papers (2023-02-06T04:21:59Z) - CheerBots: Chatbots toward Empathy and Emotionusing Reinforcement
Learning [60.348822346249854]
This study presents a framework whereby several empathetic chatbots are based on understanding users' implied feelings and replying empathetically for multiple dialogue turns.
We call these chatbots CheerBots. CheerBots can be retrieval-based or generative-based and were finetuned by deep reinforcement learning.
To respond in an empathetic way, we develop a simulating agent, a Conceptual Human Model, as aids for CheerBots in training with considerations on changes in user's emotional states in the future to arouse sympathy.
arXiv Detail & Related papers (2021-10-08T07:44:47Z) - Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn
Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework included a guiding robot and an interlocutor model that plays the role of humans.
We examined our framework using three experimental setups and evaluate the guiding robot with four different metrics to demonstrated flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z) - CASS: Towards Building a Social-Support Chatbot for Online Health
Community [67.45813419121603]
The CASS architecture is based on advanced neural network algorithms.
It can handle new inputs from users and generate a variety of responses to them.
With a follow-up field experiment, CASS is proven useful in supporting individual members who seek emotional support.
arXiv Detail & Related papers (2021-01-04T05:52:03Z) - Personalized Chatbot Trustworthiness Ratings [19.537492400265577]
We envision a personalized rating methodology for chatbots that relies on separate rating modules for each issue.
The method is independent of the specific trust issues and is parametric to the aggregation procedure.
arXiv Detail & Related papers (2020-05-13T22:42:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.