Automated Testing of Task-based Chatbots: How Far Are We?
- URL: http://arxiv.org/abs/2602.13072v1
- Date: Fri, 13 Feb 2026 16:32:50 GMT
- Title: Automated Testing of Task-based Chatbots: How Far Are We?
- Authors: Diego Clerissi, Elena Masserini, Daniela Micucci, Leonardo Mariani,
- Abstract summary: Task-based chatbots are software, typically embedded in real-world applications, that assist users in completing tasks through a conversational interface. In this paper, we evaluate the effectiveness of state-of-the-art testing techniques on a curated selection of task-based chatbots from GitHub.
- Score: 5.64612424709862
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Task-based chatbots are software, typically embedded in real-world applications, that assist users in completing tasks through a conversational interface. As chatbots are gaining popularity, effectively assessing their quality has become crucial. Whereas traditional testing techniques fail to systematically exercise the conversational space of chatbots, several approaches specifically targeting chatbots have emerged from both industry and research. Although these techniques have shown advancements over the years, they still exhibit limitations, such as simplicity of the generated test scenarios and weakness in implemented oracles. In this paper, we conduct a confirmatory study to investigate such limitations by evaluating the effectiveness of state-of-the-art chatbot testing techniques on a curated selection of task-based chatbots from GitHub, developed using the most popular commercial and open-source platforms.
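To make the "weakness in implemented oracles" limitation concrete, here is a minimal sketch. It assumes a hypothetical rule-based pizza-ordering bot (`PizzaBot`) and hypothetical helpers (`SCENARIO`, `run_scenario`, `weak_oracle`, `strong_oracle`). None of these are taken from the paper, Botium, or any tool the study evaluates. A scripted multi-turn scenario is replayed on the bot and checked with two oracles: a shallow one that accepts any non-empty reply, and a task-level one that checks the expected content. A variant with an injected fault (`BuggyPizzaBot`, which always confirms a small pizza) shows why the shallow oracle is not enough.

```python
from dataclasses import dataclass, field


@dataclass
class PizzaBot:
    """Hypothetical task-based chatbot: asks for a pizza size, then confirms the order."""
    state: str = "start"
    slots: dict = field(default_factory=dict)

    def reply(self, user_msg: str) -> str:
        msg = user_msg.lower()
        if self.state == "start" and "pizza" in msg:
            self.state = "ask_size"
            return "What size would you like?"
        if self.state == "ask_size":
            for size in ("small", "medium", "large"):
                if size in msg:
                    self.slots["size"] = size  # fill the task slot
                    self.state = "done"
                    return f"Order confirmed: one {size} pizza."
            return "Sorry, please choose small, medium, or large."
        return "Sorry, I did not understand."


class BuggyPizzaBot(PizzaBot):
    """Same hypothetical bot with an injected fault: it always confirms a small pizza."""

    def reply(self, user_msg: str) -> str:
        answer = super().reply(user_msg)
        if answer.startswith("Order confirmed"):
            self.slots["size"] = "small"
            answer = "Order confirmed: one small pizza."
        return answer


# A scripted scenario: each user turn is paired with text the bot reply should contain.
SCENARIO = [
    ("I want a pizza", "what size"),
    ("A large one, please", "one large pizza"),
]


def weak_oracle(reply: str, expected: str) -> bool:
    """Shallow check: passes as long as the bot says something."""
    return bool(reply.strip())


def strong_oracle(reply: str, expected: str) -> bool:
    """Task-level check: the reply must carry the expected content."""
    return expected.lower() in reply.lower()


def run_scenario(bot_cls, scenario, oracle) -> bool:
    """Replay the scenario turn by turn on a fresh bot; stop at the first rejected reply."""
    bot = bot_cls()
    return all(oracle(bot.reply(user), expected) for user, expected in scenario)


if __name__ == "__main__":
    for bot_cls in (PizzaBot, BuggyPizzaBot):
        for oracle in (weak_oracle, strong_oracle):
            print(bot_cls.__name__, oracle.__name__, run_scenario(bot_cls, SCENARIO, oracle))
```

With this scenario, the weak oracle passes on both bots. The strong oracle passes on `PizzaBot` but fails on `BuggyPizzaBot`, so only the task-level check exposes the injected fault.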
Related papers
- Towards Multi-Platform Mutation Testing of Task-based Chatbots [5.64612424709862]
We present our extension of MUTABOT to multiple platforms (Dialogflow and Rasa).
MUTABOT is a mutation testing approach for injecting faults in conversations.
We show how mutation testing can be used to reveal weaknesses in test suites generated by Botium, the state-of-the-art test generator.
(2025-09-01T11:36:06Z)
- Chatbot Deployment Considerations for Application-Agnostic Human-Machine Dialogues [0.0]
This paper aims to shed light on basic, elemental considerations that technologists must take into account.
By examining this case study, we call for societal values to be treated as a paramount factor.
(2025-08-30T22:46:09Z)
- Test Case Generation for Dialogflow Task-Based Chatbots [3.488620810035772]
Chatbot Test Generator (CTG) is an automated testing technique designed for task-based chatbots.
We conducted an experiment comparing CTG with the state-of-the-art BOTIUM and CHARM tools.
CTG outperformed the competitors in terms of robustness and effectiveness.
(2025-03-07T16:39:27Z)
- Measuring and Controlling Instruction (In)Stability in Language Model Dialogs [72.38330196290119]
System-prompting is a tool for customizing language-model chatbots, enabling them to follow a specific instruction.
We propose a benchmark to test the assumption that chatbots keep following their system prompt throughout a conversation, evaluating instruction stability via self-chats.
We reveal significant instruction drift within eight rounds of conversation.
We propose a lightweight method called split-softmax, which compares favorably against two strong baselines.
(2024-02-13T20:10:29Z)
- Evaluating Chatbots to Promote Users' Trust -- Practices and Open Problems [11.427175278545517]
This paper reviews current practices for testing chatbots.
It identifies gaps as open problems in pursuit of user trust.
It outlines a path forward to mitigate issues of trust related to service or product performance, user satisfaction and long-term unintended consequences for society.
(2023-09-09T22:40:30Z)
- InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language [82.92236977726655]
InternGPT stands for interaction, nonverbal, and chatbots.
We present an interactive visual framework named InternGPT, or iGPT for short.
(2023-05-09T17:58:34Z)
- FaceChat: An Emotion-Aware Face-to-face Dialogue Framework [58.67608580694849]
FaceChat is a web-based dialogue framework that enables emotionally-sensitive and face-to-face conversations.
The system has a wide range of potential applications, including counseling, emotional support, and personalized customer service.
(2023-03-08T20:45:37Z)
- A Literature Survey of Recent Advances in Chatbots [0.0]
We review recent advances in chatbots that use Artificial Intelligence and Natural Language Processing.
We highlight the main challenges and limitations of current work and make recommendations for future research investigation.
(2022-01-17T23:08:58Z)
- Training Conversational Agents with Generative Conversational Networks [74.9941330874663]
We use Generative Conversational Networks to automatically generate data and train social conversational agents.
We evaluate our approach on TopicalChat with automatic metrics and human evaluators, showing that with 10% of seed data it performs close to the baseline that uses 100% of the data.
(2021-10-15T21:46:39Z)
- CheerBots: Chatbots toward Empathy and Emotion using Reinforcement Learning [60.348822346249854]
This study presents a framework in which several empathetic chatbots understand users' implied feelings and reply empathetically over multiple dialogue turns.
We call these chatbots CheerBots. CheerBots can be retrieval-based or generative-based and were fine-tuned with deep reinforcement learning.
To respond empathetically, we develop a simulating agent, the Conceptual Human Model, which aids CheerBots during training by taking into account future changes in the user's emotional state, in order to arouse sympathy.
(2021-10-08T07:44:47Z)
- Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework includes a guiding robot and an interlocutor model that plays the role of humans.
We examined our framework using three experimental setups and evaluated the guiding robot with four different metrics, demonstrating its flexibility and performance advantages.
(2021-03-30T15:24:37Z)
- CASS: Towards Building a Social-Support Chatbot for Online Health Community [67.45813419121603]
The CASS architecture is based on advanced neural network algorithms.
It can handle new inputs from users and generate a variety of responses to them.
A follow-up field experiment shows that CASS is useful in supporting individual members who seek emotional support.
(2021-01-04T05:52:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.