TaskAudit: Detecting Functiona11ity Errors in Mobile Apps via Agentic Task Execution
- URL: http://arxiv.org/abs/2510.12972v1
- Date: Tue, 14 Oct 2025 20:28:49 GMT
- Title: TaskAudit: Detecting Functiona11ity Errors in Mobile Apps via Agentic Task Execution
- Authors: Mingyuan Zhong, Xia Chen, Davin Win Kyi, Chen Li, James Fogarty, Jacob O. Wobbrock
- Abstract summary: TaskAudit is an accessibility evaluation system that focuses on detecting functiona11ity errors through simulated interactions. Evaluation on real-world apps shows that our strategy detects 48 functiona11ity errors from 54 app screens, compared to between 4 and 20 with existing checkers.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accessibility checkers are tools in support of accessible app development and their use is encouraged by accessibility best practices. However, most current checkers evaluate static or mechanically-generated contexts, failing to capture common accessibility errors impacting mobile app functionality. We present TaskAudit, an accessibility evaluation system that focuses on detecting functiona11ity errors through simulated interactions. TaskAudit comprises three components: a Task Generator that constructs interactive tasks from app screens, a Task Executor that uses agents with a screen reader proxy to perform these tasks, and an Accessibility Analyzer that detects and reports accessibility errors by examining interaction traces. Evaluation on real-world apps shows that our strategy detects 48 functiona11ity errors from 54 app screens, compared to between 4 and 20 with existing checkers. Our analysis demonstrates common error patterns that TaskAudit can detect in addition to prior work, including label-functionality mismatch, cluttered navigation, and inappropriate feedback.
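The three-component flow described in the abstract (generate tasks, execute them through a screen reader proxy, analyze the interaction traces) can be illustrated with a toy sketch. This is not TaskAudit's implementation: the `Element`, `Trace`, `generate_tasks`, `execute_task`, `analyze`, and `audit` names, the substring-based mismatch check, and the `max_swipes` threshold are all hypothetical simplifications chosen only to show the overall shape of the idea. The real system uses agents rather than the linear focus scan assumed here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Element:
    """Hypothetical model of one UI element on a screen."""
    label: str              # what a screen reader would announce
    effect: str             # what activating the element actually does
    focusable: bool = True  # whether screen reader focus can land on it


@dataclass
class Trace:
    """Hypothetical interaction trace for a single task."""
    goal: str
    swipes: int               # focus moves made before stopping
    announced: Optional[str]  # label announced on activation, None if never reached


def generate_tasks(screen: List[Element]) -> List[str]:
    # Toy "Task Generator": one task per element, named by its intended effect.
    return [el.effect for el in screen]


def execute_task(screen: List[Element], goal: str) -> Trace:
    # Toy "Task Executor": move focus linearly, as a screen reader swipe would,
    # skipping elements that cannot receive focus.
    swipes = 0
    for el in screen:
        if not el.focusable:
            continue
        swipes += 1
        if el.effect == goal:
            return Trace(goal, swipes, el.label)
    return Trace(goal, swipes, None)


def analyze(trace: Trace, max_swipes: int = 3) -> List[str]:
    # Toy "Accessibility Analyzer": assumed heuristics, not the paper's rules.
    if trace.announced is None:
        return ["unreachable by screen reader"]
    errors = []
    if trace.goal.lower() not in trace.announced.lower():
        errors.append("label-functionality mismatch")
    if trace.swipes > max_swipes:
        errors.append("cluttered navigation")
    return errors


def audit(screen: List[Element], max_swipes: int = 3) -> Dict[str, List[str]]:
    # Run the full generate -> execute -> analyze pipeline for one screen.
    return {
        goal: analyze(execute_task(screen, goal), max_swipes)
        for goal in generate_tasks(screen)
    }


# Hypothetical screen: one well-labeled element, one generically labeled
# element, and one element that screen reader focus cannot reach.
demo_screen = [
    Element("Search", "search"),
    Element("Button", "checkout"),
    Element("Settings", "settings", focusable=False),
]
```

Calling `audit(demo_screen)` returns a mapping from each task goal to the list of issues flagged for it. The point of the sketch is that errors are derived from what happened during a simulated interaction, not from a static inspection of the element tree.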
Related papers
- FuncDroid: Towards Inter-Functional Flows for Comprehensive Mobile App GUI Testing [6.346121677855558]
We introduce an inter-functional-flow-oriented GUI testing approach with the dual goals of precise model construction and deep bug detection. By combining two complementary test-generation views, it can adaptively refine functional boundaries and systematically explore inter-functional flows. We implement our approach in a tool called FuncDroid, and evaluate it on two benchmarks: (1) a widely-used open-source benchmark with 50 reproducible crash bugs and (2) a diverse set of 52 popular commercial apps.
arXiv Detail & Related papers (2026-02-13T11:40:02Z)
- Breaking Single-Tester Limits: Multi-Agent LLMs for Multi-User Feature Testing [15.383375235673954]
We propose MAdroid, a novel multi-agent approach powered by Large Language Models (LLMs) to automate multi-user interactive tasks for app feature testing. Specifically, MAdroid employs two functional types of agents: user agents (Operator) and supervisor agents (Coordinator and Observer). Our evaluation, which included 41 multi-user interactive tasks, demonstrates the effectiveness of our approach, completing 82.9% of the tasks with 96.8% action similarity.
arXiv Detail & Related papers (2025-06-21T01:38:53Z)
- Advancing Mobile UI Testing by Learning Screen Usage Semantics [0.42303492200814446]
This research seeks to enhance automated UI testing techniques by learning the screen usage semantics of mobile apps. It also improves the usability of a mobile app's interface by identifying and mitigating UI design issues.
arXiv Detail & Related papers (2025-05-15T01:40:43Z)
- Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks [85.48034185086169]
Mobile-Agent-E is a hierarchical multi-agent framework capable of self-evolution through past experience. Mobile-Agent-E achieves a 22% absolute improvement over previous state-of-the-art approaches.
arXiv Detail & Related papers (2025-01-20T20:35:46Z)
- AppAgent v2: Advanced Agent for Flexible Mobile Interactions [57.98933460388985]
This work introduces a novel LLM-based multimodal agent framework for mobile devices. Our agent constructs a flexible action space that enhances adaptability across various applications. Our results demonstrate the framework's superior performance, confirming its effectiveness in real-world scenarios.
arXiv Detail & Related papers (2024-08-05T06:31:39Z)
- Seeing is Believing: Vision-driven Non-crash Functional Bug Detection for Mobile Apps [26.96558418166514]
This paper proposes a novel vision-driven, multi-agent collaborative automated GUI testing approach for detecting non-crash functional bugs. We evaluate Trident on 590 non-crash bugs and compare it with 12 baselines; it achieves boosts of more than 14%-112% and 108%-147% in average recall and precision.
arXiv Detail & Related papers (2024-07-03T11:58:09Z)
- GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices [47.98821056800437]
We present GUIOdyssey, a dataset for cross-app mobile GUI navigation. GUIOdyssey comprises 8,334 episodes with an average of 15.3 steps per episode, covering 6 mobile devices, 212 distinct apps, and 1,357 app combinations. We develop OdysseyAgent, an exploratory multimodal agent for long-step cross-app navigation equipped with a history resampler module.
arXiv Detail & Related papers (2024-06-12T17:44:26Z)
- I2EDL: Interactive Instruction Error Detection and Localization [65.25839671641218]
We propose a novel task of Interactive VLN in Continuous Environments (IVLN-CE).
It allows the agent to interact with the user during VLN-CE navigation to resolve any doubts about instruction errors.
We leverage a pre-trained module to detect instruction errors and pinpoint them in the instruction by cross-referencing the textual input and past observations.
arXiv Detail & Related papers (2024-06-07T16:52:57Z)
- Task-Agnostic Detector for Insertion-Based Backdoor Attacks [53.77294614671166]
We introduce TABDet (Task-Agnostic Backdoor Detector), a pioneering task-agnostic method for backdoor detection.
TABDet leverages final layer logits combined with an efficient pooling technique, enabling unified logit representation across three prominent NLP tasks.
TABDet can jointly learn from diverse task-specific models, demonstrating superior detection efficacy over traditional task-specific methods.
arXiv Detail & Related papers (2024-03-25T20:12:02Z)
- Towards Automated Accessibility Report Generation for Mobile Apps [14.908672785900832]
We propose a system to generate whole app accessibility reports.
It combines varied data collection methods (e.g., app crawling, manual recording) with an existing accessibility scanner.
arXiv Detail & Related papers (2023-09-29T19:05:11Z)
- Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism [120.1998866178014]
We present a flexible framework for continual object detection via pRotOtypical taSk corrElaTion guided gaTing mechAnism (ROSETTA).
Concretely, a unified framework is shared by all tasks while task-aware gates are introduced to automatically select sub-models for specific tasks.
Experiments on COCO-VOC, KITTI-Kitchen, class-incremental detection on VOC and sequential learning of four tasks show that ROSETTA yields state-of-the-art performance.
arXiv Detail & Related papers (2022-05-06T07:31:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.