FuncDroid: Towards Inter-Functional Flows for Comprehensive Mobile App GUI Testing
- URL: http://arxiv.org/abs/2602.12834v1
- Date: Fri, 13 Feb 2026 11:40:02 GMT
- Title: FuncDroid: Towards Inter-Functional Flows for Comprehensive Mobile App GUI Testing
- Authors: Jinlong He, Changwei Xia, Binru Huang, Jiwei Yan, Jun Yan, Jian Zhang,
- Abstract summary: We introduce an inter-functional-flow-oriented GUI testing approach with the dual goals of precise model construction and deep bug detection. By combining two complementary test-generation views, it can adaptively refine functional boundaries and systematically explore inter-functional flows. We implement our approach in a tool called FuncDroid and evaluate it on two benchmarks: (1) a widely used open-source benchmark with 50 reproducible crash bugs and (2) a diverse set of 52 popular commercial apps.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As mobile application (app) functionalities grow increasingly complex and their iterations accelerate, ensuring high reliability presents significant challenges. While functionality-oriented GUI testing has attracted growing research attention, existing approaches largely overlook interactions across functionalities, making them ineffective at uncovering deep bugs hidden in inter-functional behaviors. To fill this gap, we first design a Functional Flow Graph (FFG), a behavioral model that explicitly captures an app's functional units and their inter-functional interactions. Based on the FFG, we further introduce an inter-functional-flow-oriented GUI testing approach with the dual goals of precise model construction and deep bug detection. This approach is realized through a long-short-term-view-guided testing process. By combining two complementary test-generation views, it can adaptively refine functional boundaries and systematically explore inter-functional flows under diverse triggering conditions. We implement our approach in a tool called FuncDroid and evaluate it on two benchmarks: (1) a widely used open-source benchmark with 50 reproducible crash bugs and (2) a diverse set of 52 popular commercial apps. Experimental results demonstrate that FuncDroid significantly outperforms state-of-the-art baselines in both coverage (+28%) and number of detected bugs (+107%). Moreover, FuncDroid uncovers 18 previously unknown non-crash functional bugs in commercial apps, confirming its practical effectiveness.
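The abstract describes the FFG only at a high level. The sketch below is one possible reading of it, built on assumptions of our own. Functional units are nodes, and each node records the GUI screens it spans. Inter-functional transitions are edges labelled with a triggering condition. Multi-unit flows are enumerated with a bounded depth-first search and serve as candidate test targets. All names (`FunctionalUnit`, `FunctionalFlowGraph`, `add_transition`, `flows`, and the example screens and triggers) are hypothetical. This is not FuncDroid's implementation, and it does not model the long-short-term-view guidance or boundary refinement.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FunctionalUnit:
    # Hypothetical node type: a named functionality and the GUI screens it spans.
    name: str
    screens: set[str] = field(default_factory=set)


class FunctionalFlowGraph:
    """Toy FFG-style model (an assumption, not FuncDroid's data structure):
    nodes are functional units, edges are inter-functional transitions
    labelled with the condition that triggers them."""

    def __init__(self) -> None:
        self.units: dict[str, FunctionalUnit] = {}
        # edges[src] -> list of (dst, trigger); parallel edges with different
        # triggers model the same transition under diverse conditions.
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_unit(self, name: str, screens: set[str]) -> None:
        self.units[name] = FunctionalUnit(name, set(screens))

    def add_transition(self, src: str, dst: str, trigger: str) -> None:
        self.edges[src].append((dst, trigger))

    def flows(self, start: str, max_len: int) -> list[list[tuple[str, str]]]:
        """Enumerate flows leaving `start` as lists of (trigger, unit) steps,
        visiting at most `max_len` distinct units (start included, no revisits)."""
        results: list[list[tuple[str, str]]] = []

        def dfs(unit: str, path: list[tuple[str, str]], seen: set[str]) -> None:
            if path:
                results.append(list(path))  # every non-empty prefix is a flow
            if len(seen) == max_len:
                return
            for dst, trigger in self.edges[unit]:
                if dst not in seen:
                    path.append((trigger, dst))
                    seen.add(dst)
                    dfs(dst, path, seen)
                    path.pop()  # backtrack
                    seen.remove(dst)

        dfs(start, [], {start})
        return results


if __name__ == "__main__":
    ffg = FunctionalFlowGraph()
    ffg.add_unit("login", {"LoginActivity"})
    ffg.add_unit("cart", {"CartActivity"})
    ffg.add_unit("checkout", {"CheckoutActivity", "PaymentDialog"})
    ffg.add_transition("login", "cart", "tap 'Shop'")
    ffg.add_transition("cart", "checkout", "tap 'Pay' with non-empty cart")
    ffg.add_transition("cart", "checkout", "tap 'Pay' with empty cart")
    for flow in ffg.flows("login", max_len=3):
        print(" -> ".join(f"[{t}] {u}" for t, u in flow))
```

Keeping parallel edges with distinct triggers, rather than collapsing them, lets the same cart-to-checkout hop be tested once per triggering condition. This loosely mirrors the abstract's emphasis on exploring flows "under diverse triggering conditions".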
Related papers
- Context-Aware Functional Test Generation via Business Logic Extraction and Adaptation
We propose LogiDroid, a two-stage approach that generates functional test cases by extracting business logic and adapting it to target applications. We assess the effectiveness of LogiDroid using two widely used datasets that cover 28 real-world applications and 190 functional requirements.
arXiv (2026-02-27T15:47:37Z)
- ColorBench: Benchmarking Mobile Agents with Graph-Structured Framework for Complex Long-Horizon Tasks
Real-world tasks are often complex and allow for multiple valid solutions. Offline benchmarks can only validate a single predefined "golden path", while online dynamic testing is constrained by the complexity and non-reproducibility of real devices. This paper introduces a novel graph-structured benchmarking framework.
arXiv (2025-10-16T12:30:05Z)
- TaskAudit: Detecting Functiona11ity Errors in Mobile Apps via Agentic Task Execution
TaskAudit is an accessibility evaluation system that focuses on detecting functiona11ity errors through simulated interactions. Evaluation on real-world apps shows that our strategy detects 48 functiona11ity errors from 54 app screens, compared to between 4 and 20 with existing checkers.
arXiv (2025-10-14T20:28:49Z)
- FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression
Functional Attention with a Mixture-of-Experts (FAME) is an end-to-end, fully data-driven framework for function-on-function regression. FAME forms continuous attention by coupling a neural controlled differential equation with MoE-driven vector fields to capture intra-functional continuity. Experiments on synthetic and real-world functional-regression benchmarks show that FAME achieves state-of-the-art accuracy and strong robustness to arbitrarily sampled discrete observations.
arXiv (2025-10-01T07:53:55Z)
- Breaking Single-Tester Limits: Multi-Agent LLMs for Multi-User Feature Testing
We propose MAdroid, a novel multi-agent approach powered by Large Language Models (LLMs) to automate multi-user interactive tasks for app feature testing. Specifically, MAdroid employs two functional types of agents: user agents (Operator) and supervisor agents (Coordinator and Observer). Our evaluation on 41 multi-user interactive tasks demonstrates the effectiveness of our approach, which completes 82.9% of the tasks with 96.8% action similarity.
arXiv (2025-06-21T01:38:53Z)
- Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation
We systematically investigate reference-free hallucination detection in open-domain long-form responses. Our findings reveal that internal states are insufficient for reliably distinguishing between factual and hallucinated content. We introduce a new paradigm, named RATE-FT, that augments fine-tuning with an auxiliary task for the model to jointly learn with the main task of hallucination detection.
arXiv (2025-05-18T07:10:03Z)
- Magnet: Multi-turn Tool-use Data Synthesis and Distillation via Graph Translation
We propose a principled framework for synthesizing high-quality training trajectories for large language model agents. The framework is based on automatic and iterative translations from a function signature path to a sequence of queries and executable function calls. Experiments show that, after supervised fine-tuning on positive trajectories and preference optimization against negative trajectories, our 14B model, Magnet-14B-mDPO, obtains 68.01 on BFCL-v3 and 73.30 on ToolQuery.
arXiv (2025-03-10T20:13:07Z)
- AppAgent v2: Advanced Agent for Flexible Mobile Interactions
This work introduces a novel LLM-based multimodal agent framework for mobile devices. Our agent constructs a flexible action space that enhances adaptability across various applications. Our results demonstrate the framework's superior performance, confirming its effectiveness in real-world scenarios.
arXiv (2024-08-05T06:31:39Z)
- Seeing is Believing: Vision-driven Non-crash Functional Bug Detection for Mobile Apps
This paper proposes a novel vision-driven, multi-agent collaborative automated GUI testing approach for detecting non-crash functional bugs. Evaluated on 590 non-crash bugs against 12 baselines, Trident achieves boosts of more than 14%-112% and 108%-147% in average recall and precision, respectively.
arXiv (2024-07-03T11:58:09Z)
- Glance and Gaze: Inferring Action-aware Points for One-Stage Human-Object Interaction Detection
We propose a novel one-stage method, namely the Glance and Gaze Network (GGNet).
GGNet adaptively models a set of action-aware points (ActPoints) via glance and gaze steps.
We design an action-aware approach that effectively matches each detected interaction with its associated human-object pair.
arXiv (2021-04-12T08:01:04Z)
- FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging
Influence functions approximate the 'influences' of training data-points for test predictions.
We present FastIF, a set of simple modifications to influence functions that significantly improves their run-time.
Our experiments demonstrate the potential of influence functions in model interpretation and correcting model errors.
arXiv (2020-12-31T18:02:34Z)
- FIVES: Feature Interaction Via Edge Search for Large-Scale Tabular Data
We propose a novel method named Feature Interaction Via Edge Search (FIVES).
FIVES formulates the task of interactive feature generation as searching for edges on the defined feature graph.
In this paper, we present theoretical evidence that motivates searching for useful interactive features in order of increasing interaction order.
arXiv (2020-07-29T03:33:18Z)