Automated Soap Opera Testing Directed by LLMs and Scenario Knowledge: Feasibility, Challenges, and Road Ahead
- URL: http://arxiv.org/abs/2412.08581v1
- Date: Wed, 11 Dec 2024 17:57:23 GMT
- Title: Automated Soap Opera Testing Directed by LLMs and Scenario Knowledge: Feasibility, Challenges, and Road Ahead
- Authors: Yanqi Su, Zhenchang Xing, Chong Wang, Chunyang Chen, Xiwei Xu, Qinghua Lu, Liming Zhu
- Abstract summary: Exploratory testing (ET) harnesses testers' knowledge, creativity, and experience to create varied tests that uncover unexpected bugs from the end-user's perspective.
We explore the feasibility, challenges, and road ahead of automated scenario-based ET (a.k.a. soap opera testing).
- Score: 43.15092098658384
- Abstract: Exploratory testing (ET) harnesses testers' knowledge, creativity, and experience to create varied tests that uncover unexpected bugs from the end-user's perspective. Although ET has proven effective in system-level testing of interactive systems, the need for manual execution has hindered large-scale adoption. In this work, we explore the feasibility, challenges, and road ahead of automated scenario-based ET (a.k.a. soap opera testing). We conduct a formative study, identifying key insights for effective manual soap opera testing and challenges in automating the process. We then develop a multi-agent system leveraging LLMs and a Scenario Knowledge Graph (SKG) to automate soap opera testing. The system consists of three multi-modal agents, Planner, Player, and Detector, that collaborate to execute tests and identify potential bugs. Experimental results demonstrate the potential of automated soap opera testing, but a significant gap remains compared to manual execution, particularly around under-explored scenario boundaries and incorrectly identified bugs. Based on these observations, we envision the road ahead for automated soap opera testing, focusing on three key aspects: the synergy of neural and symbolic approaches, human-AI co-learning, and the integration of soap opera testing with broader software engineering practices. These insights aim to guide and inspire future research.
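The abstract describes the agent architecture but no implementation, so below is a minimal, hypothetical sketch of how a Planner/Player/Detector loop over a Scenario Knowledge Graph (SKG) might be wired together. Everything beyond the three agent role names and the SKG concept is an assumption, not the authors' code.

```python
# Structural sketch only: class bodies are placeholders for the paper's
# multi-modal LLM agents, which are not specified in the abstract.
from dataclasses import dataclass, field


@dataclass
class ScenarioKnowledgeGraph:
    """Toy stand-in for the SKG: scenario name -> ordered soap-opera steps."""
    scenarios: dict[str, list[str]] = field(default_factory=dict)

    def steps_for(self, scenario: str) -> list[str]:
        return self.scenarios.get(scenario, [])


class Planner:
    """Selects the next high-level test steps from the SKG."""
    def __init__(self, skg: ScenarioKnowledgeGraph):
        self.skg = skg

    def plan(self, scenario: str) -> list[str]:
        return self.skg.steps_for(scenario)


class Player:
    """Would translate a step into concrete GUI actions via a multi-modal LLM."""
    def execute(self, step: str) -> str:
        # A real Player would drive the app under test and return an
        # observation (e.g. a screenshot description); this is a stub.
        return f"observation after '{step}'"


class Detector:
    """Would compare observations against expectations to flag potential bugs."""
    def check(self, step: str, observation: str) -> bool:
        return "error" in observation.lower()  # trivial heuristic placeholder


def run_soap_opera_test(scenario: str, skg: ScenarioKnowledgeGraph) -> list[str]:
    planner, player, detector = Planner(skg), Player(), Detector()
    bugs = []
    for step in planner.plan(scenario):
        observation = player.execute(step)
        if detector.check(step, observation):
            bugs.append(f"potential bug at step: {step}")
    return bugs


if __name__ == "__main__":
    skg = ScenarioKnowledgeGraph({"checkout": ["add item", "apply expired coupon", "pay"]})
    print(run_soap_opera_test("checkout", skg))
```

In the real system, the Player would reason over GUI state and the Detector over multi-modal observations; the stub reduces both to trivial string checks purely to show the collaboration loop.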
Related papers
- VulnBot: Autonomous Penetration Testing for A Multi-Agent Collaborative Framework [4.802551205178858]
Existing large language model (LLM)-assisted or automated penetration testing approaches often suffer from inefficiencies.
VulnBot decomposes complex tasks into three specialized phases: reconnaissance, scanning, and exploitation.
Key design features include role specialization, penetration path planning, inter-agent communication, and generative penetration behavior.
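No code accompanies this summary, so the following is a toy sketch, under the stated three-phase decomposition, of specialized phases passing shared state along a penetration path; every phase body is a placeholder, not VulnBot's logic.

```python
# Illustrative pipeline: reconnaissance -> scanning -> exploitation, with a
# shared state dict standing in for inter-agent communication.
from typing import Callable

Phase = Callable[[dict], dict]


def reconnaissance(state: dict) -> dict:
    state["hosts"] = ["10.0.0.5"]  # placeholder discovery result
    return state


def scanning(state: dict) -> dict:
    state["findings"] = [{"host": h, "port": 80} for h in state["hosts"]]
    return state


def exploitation(state: dict) -> dict:
    state["report"] = [f"attempted exploit on {f['host']}:{f['port']}"
                       for f in state["findings"]]
    return state


def run_pipeline(phases: list[Phase]) -> dict:
    """Chain specialized phases, passing shared state between them."""
    state: dict = {}
    for phase in phases:
        state = phase(state)
    return state


print(run_pipeline([reconnaissance, scanning, exploitation])["report"])
```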
arXiv Detail & Related papers (2025-01-23T06:33:05Z) - AI-Compass: A Comprehensive and Effective Multi-module Testing Tool for AI Systems [26.605694684145313]
In this study, we design and implement a testing tool, AI-Compass, to comprehensively and effectively evaluate AI systems.
The tool extensively assesses adversarial robustness and model interpretability, and performs neuron analysis.
Our research sheds light on a general solution for the AI system testing landscape.
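AI-Compass's internals are not shown here; as one concrete, hedged example of the kind of adversarial-robustness check such a tool might run, the snippet below applies the standard FGSM perturbation in PyTorch to an untrained stand-in model and synthetic data.

```python
# FGSM robustness probe: perturb inputs along the sign of the loss gradient
# and compare clean vs. adversarial accuracy. Model and data are placeholders.
import torch
import torch.nn.functional as F


def fgsm_attack(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """Return x + epsilon * sign(grad of loss w.r.t. x), clamped to [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).clamp(0, 1).detach()


model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.rand(8, 1, 28, 28)            # stand-in batch of images
y = torch.randint(0, 10, (8,))
x_adv = fgsm_attack(model, x, y)
clean_acc = (model(x).argmax(1) == y).float().mean()
adv_acc = (model(x_adv).argmax(1) == y).float().mean()
print(f"clean {clean_acc:.2f} vs adversarial {adv_acc:.2f}")
```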
arXiv Detail & Related papers (2024-11-09T11:15:17Z) - AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an LLM-driven automated penetration testing agent built on the principle of a PSM.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
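The summary does not expand PSM; assuming it denotes a pentesting state machine that constrains the agent's moves, this toy sketch shows how an LLM's proposals could be filtered to legal transitions. The states and the selection rule are invented for illustration.

```python
# A finite state machine gating an agent's actions: only transitions listed
# in TRANSITIONS are accepted, however the (stubbed) LLM proposes moves.
TRANSITIONS = {
    "recon": ["scan"],
    "scan": ["exploit", "recon"],
    "exploit": ["report"],
    "report": [],
}


def propose(state: str) -> str:
    """Stand-in for an LLM proposing the next move (could be illegal)."""
    return {"recon": "scan", "scan": "exploit", "exploit": "report"}.get(state, "report")


state, path = "recon", ["recon"]
while state != "report":
    proposal = propose(state)
    if proposal not in TRANSITIONS[state]:  # the PSM rejects illegal moves
        break
    state = proposal
    path.append(state)
print(" -> ".join(path))  # recon -> scan -> exploit -> report
```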
arXiv Detail & Related papers (2024-11-02T13:24:30Z) - Test Oracle Automation in the era of LLMs [52.69509240442899]
Large Language Models (LLMs) have demonstrated remarkable proficiency in tackling diverse software testing tasks.
This paper aims to enable discussions on the potential of using LLMs for test oracle automation, along with the challenges that may emerge during the generation of various types of oracles.
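As a hedged illustration of the paper's topic rather than its method, the snippet below shows one common shape of LLM-based oracle generation: prompting for an executable assertion. The `complete` function is a stand-in for any chat-completion API.

```python
# Hypothetical assertion-oracle generation; `complete` stubs out the LLM call.
def complete(prompt: str) -> str:
    """Placeholder for any chat-completion call (OpenAI, local model, ...)."""
    return "assert add(2, 3) == 5"


def generate_assertion_oracle(signature: str, docstring: str) -> str:
    prompt = (
        "Write a single Python assert statement to serve as a test oracle for:\n"
        f"{signature}\n\"\"\"{docstring}\"\"\""
    )
    return complete(prompt)


oracle = generate_assertion_oracle("def add(a: int, b: int) -> int", "Return a + b.")
print(oracle)  # e.g. assert add(2, 3) == 5
```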
arXiv Detail & Related papers (2024-05-21T13:19:10Z) - WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? [83.19032025950986]
We study the use of large language model-based agents for interacting with software via web browsers.
WorkArena is a benchmark of 33 tasks based on the widely-used ServiceNow platform.
BrowserGym is an environment for the design and evaluation of such agents.
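BrowserGym builds on the standard Gymnasium interface; since no task ids appear in this summary, the loop below uses the built-in CartPole-v1 as a runnable stand-in to show the reset/step contract a web agent would drive (a BrowserGym/WorkArena task id would replace it).

```python
# Generic Gymnasium agent-environment loop; in BrowserGym, observations would
# be web-page state and actions browser commands chosen by an LLM agent.
import gymnasium as gym

env = gym.make("CartPole-v1")           # a BrowserGym task id would go here
obs, info = env.reset(seed=0)
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # an LLM agent would pick the action
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += float(reward)
    done = terminated or truncated
print(f"episode return: {total_reward}")
```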
arXiv Detail & Related papers (2024-03-12T14:58:45Z) - A Preliminary Study on Using Large Language Models in Software Pentesting [2.0551676463612636]
Large language models (LLMs) are perceived to offer promising potential for automating security tasks.
We investigate the use of LLMs in software pentesting, where the main task is to automatically identify software security vulnerabilities in source code.
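A minimal sketch of the task as described, with a stubbed `complete` standing in for any LLM call; the code snippet and the model's response are invented for illustration only.

```python
# Hypothetical LLM-based vulnerability triage on a source snippet.
SNIPPET = "query = \"SELECT * FROM users WHERE name = '\" + user_input + \"'\""


def complete(prompt: str) -> str:
    """Stub for an LLM call; a real run would return the model's analysis."""
    return "Likely SQL injection (CWE-89): user_input is concatenated into the query."


print(complete("Identify any security vulnerability in this code:\n" + SNIPPET))
```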
arXiv Detail & Related papers (2024-01-30T21:42:59Z) - Can a Chatbot Support Exploratory Software Testing? Preliminary Results [0.9249657468385781]
Exploratory testing is the de facto approach in agile teams.
This paper presents BotExpTest, a chatbot designed to support testers while performing exploratory tests of software applications.
We implemented BotExpTest on top of the instant messaging social platform Discord.
Preliminary analyses indicate that BotExpTest may be as effective as similar approaches and help testers to uncover different bugs.
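BotExpTest's source is not included here; as a hedged sketch of a Discord-hosted testing assistant of this kind, the skeleton below uses discord.py with invented `!note`/`!report` commands (the real bot's commands and behavior may differ).

```python
# Minimal discord.py bot that records exploratory-testing session notes.
import discord

intents = discord.Intents.default()
intents.message_content = True
client = discord.Client(intents=intents)
session_notes: list[str] = []


@client.event
async def on_message(message: discord.Message):
    if message.author == client.user:
        return
    if message.content.startswith("!note"):
        session_notes.append(message.content[len("!note"):].strip())
        await message.channel.send(f"Recorded ({len(session_notes)} notes).")
    elif message.content == "!report":
        await message.channel.send("\n".join(session_notes) or "No notes yet.")


client.run("YOUR_BOT_TOKEN")  # placeholder token
```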
arXiv Detail & Related papers (2023-07-11T21:11:21Z) - Artificial Intelligence in Software Testing: Impact, Problems, Challenges and Prospect [0.0]
The study aims to identify and explain some of the biggest challenges software testers face when applying AI to testing.
The paper also discusses key contributions AI may make to the domain of software testing in the future.
arXiv Detail & Related papers (2022-01-14T10:21:51Z) - The MineRL BASALT Competition on Learning from Human Feedback [58.17897225617566]
The MineRL BASALT competition aims to spur forward research on learning from human feedback, an important class of techniques for tasks where reward functions are hard to specify.
We design a suite of four tasks in Minecraft for which we expect it will be hard to write down hardcoded reward functions.
We provide a dataset of human demonstrations on each of the four tasks, as well as an imitation learning baseline.
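The competition's actual MineRL baseline is not reproduced here; this generic behavioral-cloning loop in PyTorch, trained on synthetic (observation, action) pairs, merely illustrates what an imitation learning baseline looks like in this setting.

```python
# Behavioral cloning: fit a policy to predict demonstrated actions from
# observations via supervised learning. Shapes and data are synthetic.
import torch
import torch.nn as nn

obs_dim, n_actions = 64, 8
policy = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Synthetic "human demonstrations": (observation, action) pairs.
demos_obs = torch.randn(256, obs_dim)
demos_act = torch.randint(0, n_actions, (256,))

for epoch in range(5):
    loss = loss_fn(policy(demos_obs), demos_act)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final imitation loss: {loss.item():.3f}")
```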
arXiv Detail & Related papers (2021-07-05T12:18:17Z) - Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
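The paper's framework is not reproduced here; as a toy contrast with fixed-split A/B testing, this epsilon-greedy bandit treats treatment assignment as a sequential decision problem (the conversion rates are invented).

```python
# Epsilon-greedy assignment between variants A and B over 10,000 visitors.
import random

random.seed(0)
rates = {"A": 0.10, "B": 0.12}   # invented true conversion rates
counts = {"A": 0, "B": 0}
wins = {"A": 0, "B": 0}
epsilon = 0.1

for _ in range(10_000):
    if random.random() < epsilon or min(counts.values()) == 0:
        arm = random.choice(["A", "B"])                        # explore
    else:
        arm = max(counts, key=lambda a: wins[a] / counts[a])   # exploit
    counts[arm] += 1
    wins[arm] += random.random() < rates[arm]

for arm in ("A", "B"):
    print(arm, counts[arm], round(wins[arm] / max(counts[arm], 1), 3))
```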
arXiv Detail & Related papers (2020-02-05T10:25:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.