Related papers: SAINT: Service-level Integration Test Generation with Program Analysis and LLM-based Agents

SAINT: Service-level Integration Test Generation with Program Analysis and LLM-based Agents

URL: http://arxiv.org/abs/2511.13305v1
Date: Mon, 17 Nov 2025 12:29:42 GMT
Title: SAINT: Service-level Integration Test Generation with Program Analysis and LLM-based Agents
Authors: Rangeet Pan, Raju Pavuluri, Ruikai Huang, Rahul Krishna, Tyler Stennett, Alessandro Orso, Saurabh SInha,
Abstract summary: SAINT is a novel white-box testing approach for service-level testing of enterprise Java applications.<n> SAINT combines static analysis, large language models (LLMs), and LLM-based agents to automatically generate endpoint and scenario-based tests.
Score: 43.3273990835497
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Enterprise applications are typically tested at multiple levels, with service-level testing playing an important role in validating application functionality. Existing service-level testing tools, especially for RESTful APIs, often employ fuzzing and/or depend on OpenAPI specifications which are not readily available in real-world enterprise codebases. Moreover, these tools are limited in their ability to generate functional tests that effectively exercise meaningful scenarios. In this work, we present SAINT, a novel white-box testing approach for service-level testing of enterprise Java applications. SAINT combines static analysis, large language models (LLMs), and LLM-based agents to automatically generate endpoint and scenario-based tests. The approach builds two key models: an endpoint model, capturing syntactic and semantic information about service endpoints, and an operation dependency graph, capturing inter-endpoint ordering constraints. SAINT then employs LLM-based agents to generate tests. Endpoint-focused tests aim to maximize code and database interaction coverage. Scenario-based tests are synthesized by extracting application use cases from code and refining them into executable tests via planning, action, and reflection phases of the agentic loop. We evaluated SAINT on eight Java applications, including a proprietary enterprise application. Our results illustrate the effectiveness of SAINT in coverage, fault detection, and scenario generation. Moreover, a developer survey provides strong endorsement of the scenario-based tests generated by SAINT. Overall, our work shows that combining static analysis with agentic LLM workflows enables more effective, functional, and developer-aligned service-level test generation.

Related papers

Enhancing LLM-Based Test Generation by Eliminating Covered Code [2.2566909388480743]
Large Language Models (LLMs) have shown promise in improving test generation.<n>We propose a scalable LLM-based unit test generation method.<n>Our approach outperforms state-of-the-art LLM-based and search-based methods.
arXiv Detail & Related papers (2026-02-25T15:16:43Z)
LLMs for Automated Unit Test Generation and Assessment in Java: The AgoneTest Framework [2.501198441875755]
AgoneTest is an evaluation framework for Large Language Model-generated unit tests in Java.<n>For the subset of tests that compile, LLM-generated tests can match or exceed human-written tests in terms of coverage and defect detection.
arXiv Detail & Related papers (2025-11-25T15:33:00Z)
Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol [83.83217247686402]
Large Language Models (LLMs) have evolved from simple text generators into complex software systems that integrate retrieval augmentation, tool invocation, and multi-turn interactions.<n>Their inherent non-determinism, dynamism, and context dependence pose fundamental challenges for quality assurance.<n>This paper decomposes LLM applications into a three-layer architecture: textbftextitSystem Shell Layer, textbftextitPrompt Orchestration Layer, and textbftextitLLM Inference Core.
arXiv Detail & Related papers (2025-08-28T13:00:28Z)
Evaluating LLMs on Sequential API Call Through Automated Test Generation [10.621357661774244]
StateGen is an automated framework designed to generate diverse coding tasks involving sequential API interactions.<n>We construct StateEval, a benchmark encompassing 120 verified test cases spanning across three representative scenarios.<n> Experimental results confirm that StateGen can effectively generate challenging and realistic API-oriented tasks.
arXiv Detail & Related papers (2025-07-13T03:52:51Z)
AutoRestTest: A Tool for Automated REST API Testing Using LLMs and MARL [46.65963514391019]
AutoRestTest is a novel tool that integrates the Semantic Property Dependency Graph (SPDG) with Multi-Agent Reinforcement Learning (MARL) and large language models (LLMs) for effective REST API testing.
arXiv Detail & Related papers (2025-01-15T05:54:33Z)
Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch.<n>Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests.<n> Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z)
A Multi-Agent Approach for REST API Testing with Semantic Graphs and LLM-Driven Inputs [46.65963514391019]
We present AutoRestTest, the first black-box tool to adopt a dependency-embedded multi-agent approach for REST API testing.<n>Our approach treats REST API testing as a separable problem, where four agents collaborate to optimize API exploration.<n>Our evaluation of AutoRestTest on 12 real-world REST services shows that it outperforms the four leading black-box REST API testing tools.
arXiv Detail & Related papers (2024-11-11T16:20:27Z)
AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs. Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
LLM-based Unit Test Generation via Property Retrieval [26.906316611858518]
Property-Based Retrieval Augmentation extends LLM-based Retrieval-Augmented Generation beyond basic vector, text similarity, and graph-based methods. Our approach considers task-specific context and introduces a tailored property retrieval mechanism. We implement this approach in a tool called APT, which sequentially performs preprocessing, property retrieval, and unit test generation.
arXiv Detail & Related papers (2024-10-17T13:33:12Z)
ASTER: Natural and Multi-language Unit Test Generation with LLMs [6.259245181881262]
We describe a generic pipeline that incorporates static analysis to guide LLMs in generating compilable and high-coverage test cases.<n>We conduct an empirical study to assess the quality of the generated tests in terms of code coverage and test naturalness.
arXiv Detail & Related papers (2024-09-04T21:46:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.