Breaking Single-Tester Limits: Multi-Agent LLMs for Multi-User Feature Testing
- URL: http://arxiv.org/abs/2506.17539v2
- Date: Tue, 24 Jun 2025 00:54:08 GMT
- Title: Breaking Single-Tester Limits: Multi-Agent LLMs for Multi-User Feature Testing
- Authors: Sidong Feng, Changhao Du, Huaxiao Liu, Qingnan Wang, Zhengwei Lv, Mengfei Wang, Chunyang Chen
- Abstract summary: We propose MAdroid, a novel multi-agent approach powered by Large Language Models (LLMs) to automate multi-user interactive tasks for app feature testing. Specifically, MAdroid employs two functional types of multi-agents: user agents (Operator) and supervisor agents (Coordinator and Observer). Our evaluation, which included 41 multi-user interactive tasks, demonstrates the effectiveness of our approach, completing 82.9% of the tasks with 96.8% action similarity.
- Score: 22.10696272175415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing dependence on mobile phones and their apps has made multi-user interactive features, like chat calls, live streaming, and video conferencing, indispensable for bridging the gaps in social connectivity caused by physical and situational barriers. However, automating these interactive features for testing is fraught with challenges, owing to their inherent need for timely, dynamic, and collaborative user interactions, which current automated testing methods inadequately address. Inspired by the concept of agents designed to autonomously and collaboratively tackle problems, we propose MAdroid, a novel multi-agent approach powered by Large Language Models (LLMs) to automate multi-user interactive tasks for app feature testing. Specifically, MAdroid employs two functional types of multi-agents: user agents (Operator) and supervisor agents (Coordinator and Observer). Each agent takes a specific role: the Coordinator directs the interactive task; the Operator mimics user interactions on the device; and the Observer monitors and reviews the task automation process. Our evaluation, which included 41 multi-user interactive tasks, demonstrates the effectiveness of our approach, completing 82.9% of the tasks with 96.8% action similarity and outperforming ablation variants and state-of-the-art baselines. Additionally, a preliminary investigation underscores MAdroid's practicality by helping identify 11 multi-user interactive bugs during regression app testing, confirming its potential value in real-world software development contexts.
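To make the Coordinator/Operator/Observer role split concrete, the sketch below shows one plausible way such agents could be wired together. The class names, prompts, and the `query_llm` stub are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a Coordinator / Operator / Observer role split for
# multi-user interactive testing, in the spirit of MAdroid's agent design.
# All names, prompts, and the query_llm stub are assumptions for illustration.
from dataclasses import dataclass, field
from typing import List


def query_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM backend (e.g., an HTTP API)."""
    return f"[LLM response to: {prompt[:40]}...]"


@dataclass
class Coordinator:
    """Decomposes a multi-user task into per-user instructions."""
    task: str

    def plan(self, num_users: int) -> List[str]:
        return [
            query_llm(f"Task: {self.task}. Describe the steps for user {i + 1}.")
            for i in range(num_users)
        ]


@dataclass
class Operator:
    """Mimics one user's interactions on one device by choosing UI actions."""
    device_id: str
    trace: List[str] = field(default_factory=list)

    def execute(self, instruction: str) -> None:
        action = query_llm(f"On device {self.device_id}, given the UI state, "
                           f"choose the next action for: {instruction}")
        self.trace.append(action)


@dataclass
class Observer:
    """Reviews the action traces from all devices and flags incomplete runs."""
    def review(self, traces: List[List[str]]) -> str:
        joined = "\n".join(str(t) for t in traces)
        return query_llm(f"Do these action traces complete the task?\n{joined}")


if __name__ == "__main__":
    coordinator = Coordinator(task="Start a video call between two users")
    operators = [Operator(device_id=f"emulator-{i}") for i in range(2)]
    for op, step in zip(operators, coordinator.plan(num_users=2)):
        op.execute(step)
    print(Observer().review([op.trace for op in operators]))
```

In practice the Operator would act on a real or emulated device and the Observer would compare observed screens against the expected task outcome; the stubbed LLM call here only keeps the sketch self-contained.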
Related papers
- UserBench: An Interactive Gym Environment for User-Centric Agents [110.77212949007958]
Large Language Model (LLM)-based agents have made impressive progress in reasoning and tool use, but their ability to proactively collaborate with users remains underexplored. We introduce UserBench, a user-centric benchmark designed to evaluate agents in multi-turn, preference-driven interactions.
arXiv Detail & Related papers (2025-07-29T17:34:12Z)
- Agent for User: Testing Multi-User Interactive Features in TikTok [25.10099707365039]
We introduce a novel multi-agent approach, powered by Large Language Models (LLMs), to automate the testing of multi-user interactive app features. We build a virtual device farm that allocates the necessary number of devices for a given multi-user interactive task. For each device, we deploy an LLM-based agent that simulates a user, thereby mimicking user interactions.
arXiv Detail & Related papers (2025-04-21T22:50:31Z)
- YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks [16.443149180969776]
Augmented Reality (AR) head-worn devices can uniquely improve the user experience of solving procedural day-to-day tasks. Such AR capabilities can help AI agents see and hear the actions users take, drawing on the multimodal capabilities of human users. Proactivity of AI agents, in turn, can help the human user detect and correct any mistakes in agent-observed tasks.
arXiv Detail & Related papers (2025-01-16T08:06:02Z)
- Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues [54.81155589931697]
Collaborative Instance Object Navigation (CoIN) is a new task setting where the agent actively resolves uncertainties about the target instance. We propose a novel training-free method, Agent-user Interaction with UncerTainty Awareness (AIUTA). First, upon object detection, a Self-Questioner model initiates a self-dialogue within the agent to obtain a complete and accurate observation description. An Interaction Trigger module then determines whether to ask the human a question, continue navigation, or halt.
arXiv Detail & Related papers (2024-12-02T08:16:38Z)
- SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation [89.24729958546168]
Smartphone agents are increasingly important for helping users control devices efficiently. We present SPA-Bench, a comprehensive SmartPhone Agent Benchmark designed to evaluate (M)LLM-based agents.
arXiv Detail & Related papers (2024-10-19T17:28:48Z)
- A Survey on Complex Tasks for Goal-Directed Interactive Agents [60.53915548970061]
This survey compiles relevant tasks and environments for evaluating goal-directed interactive agents.
An up-to-date compilation of relevant resources can be found on our project website.
arXiv Detail & Related papers (2024-09-27T08:17:53Z)
- Benchmarking Mobile Device Control Agents across Diverse Configurations [19.01954948183538]
B-MoCA is a benchmark for evaluating and developing mobile device control agents. We benchmark diverse agents, including agents employing large language models (LLMs) or multi-modal LLMs. While these agents demonstrate proficiency in executing straightforward tasks, their poor performance on complex tasks highlights significant opportunities for future research to improve effectiveness.
arXiv Detail & Related papers (2024-04-25T14:56:32Z)
- AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems [112.76941157194544]
We propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering.
We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimizes both kinds of agents together.
Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions.
arXiv Detail & Related papers (2023-10-13T16:37:14Z)
- AutoAgents: A Framework for Automatic Agent Generation [27.74332323317923]
AutoAgents is an innovative framework that adaptively generates and coordinates multiple specialized agents to build an AI team according to different tasks.
Our experiments on various benchmarks demonstrate that AutoAgents generates more coherent and accurate solutions than the existing multi-agent methods.
arXiv Detail & Related papers (2023-09-29T14:46:30Z)
- MUG: Interactive Multimodal Grounding on User Interfaces [12.035123646959669]
We present MUG, a novel interactive task for multimodal grounding where a user and an agent work collaboratively on an interface screen.
Prior works modeled multimodal UI grounding in one round: the user gives a command and the agent responds to it. MUG allows multiple rounds of interaction, so that upon seeing the agent's responses, the user can give further commands for the agent to refine or even correct its actions.
arXiv Detail & Related papers (2022-09-29T21:08:18Z)
- SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving [96.50297622371457]
Multi-agent interaction is a fundamental aspect of autonomous driving in the real world.
Despite more than a decade of research and development, the problem of how to interact with diverse road users in diverse scenarios remains largely unsolved.
We develop a dedicated simulation platform called SMARTS that generates diverse and competent driving interactions.
arXiv Detail & Related papers (2020-10-19T18:26:10Z)