Context-Aware Functional Test Generation via Business Logic Extraction and Adaptation
- URL: http://arxiv.org/abs/2602.24108v1
- Date: Fri, 27 Feb 2026 15:47:37 GMT
- Title: Context-Aware Functional Test Generation via Business Logic Extraction and Adaptation
- Authors: Yakun Zhang, Zihan Wang, Xinzhi Peng, Zihao Xie, Xiaodong Wang, Xutao Li, Dan Hao, Lu Zhang, Yunming Ye
- Abstract summary: We propose LogiDroid, a two-stage approach that generates functional test cases by extracting business logic and adapting it to target applications. We assess the effectiveness of LogiDroid using two widely-used datasets that cover 28 real-world applications and 190 functional requirements.
- Score: 32.20036552577251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Functional testing is essential for verifying that the business logic of mobile applications aligns with user requirements, serving as the primary methodology for quality assurance in software development. Despite its importance, functional testing remains heavily dependent on manual effort due to two core challenges. First, acquiring and reusing complex business logic from unstructured requirements remains difficult, which hinders the understanding of specific functionalities. Second, a significant semantic gap exists when adapting business logic to the diverse GUI environments, which hinders the generation of test cases for specific mobile applications. To address the preceding challenges, we propose LogiDroid, a two-stage approach that generates individual functional test cases by extracting business logic and adapting it to target applications. First, in the Knowledge Retrieval and Fusion stage, we construct a dataset to retrieve relevant cases and extract business logic for the target functionality. Second, in the Context-Aware Test Generation stage, LogiDroid jointly analyzes the extracted business logic and the real-time GUI environment to generate functional test cases. This design allows LogiDroid to accurately understand application semantics and use domain expertise to generate complete test cases with verification assertions. We assess the effectiveness of LogiDroid using two widely-used datasets that cover 28 real-world applications and 190 functional requirements. Experimental results show that LogiDroid successfully tested 40% of functional requirements on the FrUITeR dataset (an improvement of over 48% compared to the state-of-the-art approaches) and 65% on the Lin dataset (an improvement of over 55% compared to the state-of-the-art approaches). These results demonstrate the significant effectiveness of LogiDroid in functional test generation.
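The two-stage flow described in the abstract (retrieve relevant cases and extract business logic, then adapt that logic to the GUI of the target app) can be illustrated with a toy sketch. This is not LogiDroid's implementation: the `Case` type, the Jaccard token-overlap similarity, the helper names, and the sample requirements and widget labels are all hypothetical stand-ins, chosen only to show the shape of a retrieve-then-adapt pipeline. The abstract does not say how retrieval or GUI matching is actually performed.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical stored example: a requirement plus abstract logic steps."""
    requirement: str
    steps: list[str]


def tokens(text: str) -> set[str]:
    # Lowercased whitespace tokens; a real system would use richer semantics.
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    # Token-overlap similarity, used here as a stand-in for semantic matching.
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve_logic(requirement: str, cases: list[Case]) -> list[str]:
    """Stage 1 (toy): return the steps of the most similar stored case."""
    best = max(cases, key=lambda c: jaccard(requirement, c.requirement))
    return best.steps


def generate_test(steps: list[str], widgets: list[str]) -> list[tuple[str, str]]:
    """Stage 2 (toy): bind each abstract step to the closest widget label."""
    return [(step, max(widgets, key=lambda w: jaccard(step, w))) for step in steps]


# Made-up knowledge base and GUI snapshot, for illustration only.
cases = [
    Case("sign in with email and password",
         ["enter email", "enter password", "tap sign in", "assert home screen"]),
    Case("add item to shopping cart",
         ["open item", "tap add to cart", "assert cart count"]),
]
widgets = ["email field", "password field", "sign in button", "home screen title"]

logic = retrieve_logic("user can sign in using email", cases)
test_case = generate_test(logic, widgets)
for step, widget in test_case:
    print(f"{step} -> {widget}")
```

In this sketch the final step of the retrieved logic is an assertion, mirroring the abstract's point that generated test cases include verification assertions and not only GUI events.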
Related papers
- BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing? [61.247730037229815]
We introduce BeyondSWE, a comprehensive benchmark that broadens existing evaluations along two axes - resolution scope and knowledge scope. To investigate the role of external knowledge, we develop SearchSWE, a framework that integrates deep search with coding abilities. This work offers both a realistic, challenging evaluation benchmark and a flexible framework to advance research toward more capable code agents.
arXiv Detail & Related papers (2026-03-03T17:52:01Z)
- FuncDroid: Towards Inter-Functional Flows for Comprehensive Mobile App GUI Testing [6.346121677855558]
We introduce an inter-functional-flow-oriented GUI testing approach with the dual goals of precise model construction and deep bug detection. By combining two complementary test-generation views, it can adaptively refine functional boundaries and systematically explore inter-functional flows. We implement our approach in a tool called FuncDroid, and evaluate it on two benchmarks: (1) a widely-used open-source benchmark with 50 reproducible crash bugs and (2) a diverse set of 52 popular commercial apps.
arXiv Detail & Related papers (2026-02-13T11:40:02Z)
- SAINT: Service-level Integration Test Generation with Program Analysis and LLM-based Agents [43.3273990835497]
SAINT is a novel white-box testing approach for service-level testing of enterprise Java applications. SAINT combines static analysis, large language models (LLMs), and LLM-based agents to automatically generate endpoint and scenario-based tests.
arXiv Detail & Related papers (2025-11-17T12:29:42Z)
- Agent4FaceForgery: Multi-Agent LLM Framework for Realistic Face Forgery Detection [108.5042835056188]
This work introduces Agent4FaceForgery to address two fundamental problems: how to capture the diverse intents and iterative processes of human forgery creation, and how to model the complex, often adversarial, text-image interactions that accompany forgeries in social media.
arXiv Detail & Related papers (2025-09-16T01:05:01Z)
- Automatic High-Level Test Case Generation using Large Language Models [1.8136446064778242]
The primary challenge is not writing test scripts but aligning testing efforts with business requirements. We constructed a use-case dataset to train/fine-tune models for generating high-level test cases. Our proactive approach strengthens requirement-testing alignment and facilitates early test case generation.
arXiv Detail & Related papers (2025-03-23T09:14:41Z)
- Pointwise Mutual Information as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that the pointwise mutual information between a context and a question is an effective gauge for language model performance. We propose two methods that use the pointwise mutual information between a document and a question as a gauge for selecting and constructing prompts that lead to better performance.
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
- GUI Test Migration via Abstraction and Concretization [26.503512328876198]
GUI test migration aims to produce test cases with events and assertions to test specific functionalities of a target app. We propose a new migration paradigm (i.e., the abstraction-concretization paradigm) that first abstracts the test logic for the target functionality. We introduce MACdroid, the first approach that migrates GUI test cases based on this paradigm.
arXiv Detail & Related papers (2024-09-08T08:46:05Z)
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- Generating Test Scenarios from NL Requirements using Retrieval-Augmented LLMs: An Industrial Study [5.179738379203527]
This paper presents an automated approach (RAGTAG) for test scenario generation using Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs).
We evaluate RAGTAG on two industrial projects from Austrian Post with bilingual requirements in German and English.
arXiv Detail & Related papers (2024-04-19T10:27:40Z)
- AR-LSAT: Investigating Analytical Reasoning of Text [57.1542673852013]
We study the challenge of analytical reasoning of text and introduce a new dataset consisting of questions from the Law School Admission Test from 1991 to 2016.
We analyze what knowledge understanding and reasoning abilities are required to do well on this task.
arXiv Detail & Related papers (2021-04-14T02:53:32Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.