SATORI: Static Test Oracle Generation for REST APIs
- URL: http://arxiv.org/abs/2508.16318v2
- Date: Mon, 01 Sep 2025 08:35:27 GMT
- Title: SATORI: Static Test Oracle Generation for REST APIs
- Authors: Juan C. Alonso, Alberto Martin-Lopez, Sergio Segura, Gabriele Bavota, Antonio Ruiz-Cortés
- Abstract summary: This paper introduces SATORI (Static API Test ORacle Inference), a black-box approach for generating test oracles for REST APIs. SATORI uses large language models to infer the expected behavior of an API by analyzing its OpenAPI Specification. We show that SATORI can automatically generate up to hundreds of valid test oracles per operation.
- Score: 9.848517409976965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: REST API test case generation tools are evolving rapidly, with growing capabilities for the automated generation of complex tests. However, despite their strengths in test data generation, these tools are constrained by the types of test oracles they support, often limited to crashes, regressions, and noncompliance with API specifications or design standards. This paper introduces SATORI (Static API Test ORacle Inference), a black-box approach for generating test oracles for REST APIs by analyzing their OpenAPI Specification. SATORI uses large language models to infer the expected behavior of an API by analyzing the properties of the response fields of its operations, such as their names and descriptions. To foster its adoption, we extended the PostmanAssertify tool to automatically convert the test oracles reported by SATORI into executable assertions. Evaluation results on 17 operations from 12 industrial APIs show that SATORI can automatically generate up to hundreds of valid test oracles per operation. SATORI achieved an F1-score of 74.3%, outperforming the state-of-the-art dynamic approach AGORA+ (69.3%), which requires executing the API, when generating comparable oracle types. Moreover, our findings show that static and dynamic oracle inference methods are complementary: together, SATORI and AGORA+ found 90% of the oracles in our annotated ground-truth dataset. Notably, SATORI uncovered 18 bugs in popular APIs (Amadeus Hotel, Deutschebahn, FDIC, GitLab, Marvel, OMDb and Vimeo), leading to documentation updates by the API maintainers.
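To make the notion of a test oracle derived from response-field properties more concrete, here is a minimal Python sketch. It is not SATORI's implementation or output format: the field names (`checkInDate`, `checkOutDate`, `currency`), the oracle wording, and the helper `violated_oracles` are all hypothetical, and the oracles are hand-written stand-ins for what might be inferred from field names and descriptions in a specification.

```python
import re
from typing import Any, Callable, Dict, List

# An oracle is a predicate over a decoded JSON response body.
Oracle = Callable[[Dict[str, Any]], bool]

# Hypothetical oracles, hand-written for illustration only. Each key is a
# human-readable statement of expected behavior; each value checks it.
ORACLES: Dict[str, Oracle] = {
    "checkInDate matches YYYY-MM-DD": lambda r: re.fullmatch(
        r"\d{4}-\d{2}-\d{2}", r["checkInDate"]
    ) is not None,
    # ISO 8601 dates compare correctly as plain strings.
    "checkOutDate > checkInDate": lambda r: r["checkOutDate"] > r["checkInDate"],
    "currency has length 3": lambda r: len(r["currency"]) == 3,
}


def violated_oracles(response: Dict[str, Any]) -> List[str]:
    """Return the names of all oracles that do not hold for this response."""
    return [name for name, holds in ORACLES.items() if not holds(response)]


if __name__ == "__main__":
    good = {"checkInDate": "2025-09-01", "checkOutDate": "2025-09-03", "currency": "EUR"}
    bad = {"checkInDate": "2025-09-05", "checkOutDate": "2025-09-03", "currency": "euro"}
    print(violated_oracles(good))
    print(violated_oracles(bad))
```

A violated oracle in a sketch like this does not say which side is wrong: the mismatch may indicate a bug in the API or an inaccuracy in its documentation.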
Related papers
- Combining Static and Dynamic Approaches for Mining and Testing Constraints for RESTful API Testing [8.972346309150199]
We propose to combine a novel static analysis approach (in which the constraints for API response bodies are mined from API specifications) with the dynamic approach.
We leverage large language models (LLMs) to comprehend the API specifications, mine constraints for response bodies, and generate test cases.
We also use its generated test cases to detect 21 mismatches between the API specification and actual response data for 8 real-world APIs.
arXiv Detail & Related papers (2025-04-24T06:28:18Z)
- Test Amplification for REST APIs via Single and Multi-Agent LLM Systems [1.6499388997661122]
We investigate the use of large language model (LLM) systems, both single-agent and multi-agent setups, for amplifying existing REST API test suites.
We present a comparative evaluation of the two approaches across several dimensions, including test coverage, bug detection effectiveness, and practical considerations such as computational cost and energy usage.
arXiv Detail & Related papers (2025-04-10T20:19:50Z)
- Utilizing API Response for Test Refinement [2.8002188463519944]
This paper proposes a dynamic test refinement approach that leverages the response message.
Using an intelligent agent, the approach adds constraints to the API specification that are further used to generate a test scenario.
The proposed approach led to a decrease in the number of 4xx responses, taking a step closer to generating more realistic test cases.
arXiv Detail & Related papers (2025-01-30T05:26:32Z)
- LlamaRestTest: Effective REST API Testing with Small Language Models [50.058600784556816]
We present LlamaRestTest, a novel approach that employs two custom Large Language Models (LLMs) to generate realistic test inputs.
We evaluate it against several state-of-the-art REST API testing tools, including RESTGPT, a GPT-powered specification-enhancement tool.
Our study shows that small language models can perform as well as, or better than, large language models in REST API testing.
arXiv Detail & Related papers (2025-01-15T05:51:20Z)
- A Multi-Agent Approach for REST API Testing with Semantic Graphs and LLM-Driven Inputs [46.65963514391019]
We present AutoRestTest, the first black-box tool to adopt a dependency-embedded multi-agent approach for REST API testing.
Our approach treats REST API testing as a separable problem, where four agents collaborate to optimize API exploration.
Our evaluation of AutoRestTest on 12 real-world REST services shows that it outperforms the four leading black-box REST API testing tools.
arXiv Detail & Related papers (2024-11-11T16:20:27Z)
- Model Equality Testing: Which Model Is This API Serving? [59.005869726179455]
API providers may quantize, watermark, or finetune the underlying model, changing the output distribution.
We formalize detecting such distortions by Model Equality Testing, a two-sample testing problem.
A test built on a simple string kernel achieves a median of 77.4% power against a range of distortions.
arXiv Detail & Related papers (2024-10-26T18:34:53Z)
- DeepREST: Automated Test Case Generation for REST APIs Exploiting Deep Reinforcement Learning [5.756036843502232]
This paper introduces DeepREST, a novel black-box approach for automatically testing REST APIs.
It leverages deep reinforcement learning to uncover implicit API constraints, that is, constraints hidden from API documentation.
Our empirical validation suggests that the proposed approach is very effective in achieving high test coverage and fault detection.
arXiv Detail & Related papers (2024-08-16T08:03:55Z)
- FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking [57.53742155914176]
API call generation is the cornerstone of large language models' tool-using ability.
Existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user's request.
We propose an output-side optimization approach called FANTASE to address these limitations.
arXiv Detail & Related papers (2024-07-18T23:44:02Z)
- KAT: Dependency-aware Automated API Testing with Large Language Models [1.7264233311359707]
KAT (Katalon API Testing) is a novel AI-driven approach that autonomously generates test cases to validate APIs.
Our evaluation of KAT using 12 real-world services shows that it can improve validation coverage, detect more undocumented status codes, and reduce false positives in these services.
arXiv Detail & Related papers (2024-07-14T14:48:18Z)
- Leveraging Large Language Models to Improve REST API Testing [51.284096009803406]
RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification.
Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation.
arXiv Detail & Related papers (2023-12-01T19:53:23Z)
- Exploring Behaviours of RESTful APIs in an Industrial Setting [0.43012765978447565]
We propose a set of behavioural properties, common to REST APIs, which are used to generate examples of behaviours that these APIs exhibit.
These examples can be used both (i) to further the understanding of the API and (ii) as a source of automatic test cases.
Our approach can generate examples that practitioners deem relevant both for understanding the system and as a source of test generation.
arXiv Detail & Related papers (2023-10-26T11:33:11Z)
- Adaptive REST API Testing with Reinforcement Learning [54.68542517176757]
Current testing tools lack efficient exploration mechanisms, treating all operations and parameters equally.
Current tools struggle when response schemas are absent in the specification or exhibit variants.
We present an adaptive REST API testing technique that incorporates reinforcement learning to prioritize operations during exploration.
arXiv Detail & Related papers (2023-09-08T20:27:05Z)