Related papers: BOSQTGEN: Breaking the Sound Barrier in Test Generation

BOSQTGEN: Breaking the Sound Barrier in Test Generation

URL: http://arxiv.org/abs/2510.19777v1
Date: Wed, 22 Oct 2025 17:11:30 GMT
Title: BOSQTGEN: Breaking the Sound Barrier in Test Generation
Authors: S M Sadrul Islam Asif, James Chen, Earl T. Barr, Mark Marron,
Abstract summary: We introduce BOSQTGEN, a novel black-box and tool for API test generation.<n> BOSQTGEN utilizes a novel approach for decomposing API specifications into primitives, using LLMs to suggest coherent interactions for them, and employing testing to efficiently sample over these values.<n>The resulting BOSQTGEN system achieves an average of 82% of critical code coverage on benchmarks, often a 20% or more increase over prior state-of-the-art systems.
Score: 3.052470294814771
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern software is increasingly built by composing APIs, elevating the API contract to a critical role. Inadequate contracts, however, lead to mismatched expectations and failures, creating a pressing need for robust conformance testing. Current test generation techniques are hindered by key challenges: polyglot systems, source code inaccessibility, a cost-reliability trade-off, and, most critically, the difficulty of generating structured inputs. We introduce BOSQTGEN, a novel black-box methodology and tool for API test generation. BOSQTGEN utilizes a novel approach for decomposing API specifications into primitives, using LLMs to suggest coherent strata for them, and employing combinatorial testing to efficiently sample over these values. This approach ensures coverage of critical interactions while avoiding the redundancy of random sampling. The resulting BOSQTGEN system achieves an average of 82% code coverage on RESTful benchmarks, often a 20% or more increase over prior state-of-the-art systems and nearing parity with hand-written test suites. Providing a fully API-driven approach to test generation, enables developers to automatically create high-quality test cases for validation or test-driven development.

Related papers

Enhancing LLM-Based Test Generation by Eliminating Covered Code [2.2566909388480743]
Large Language Models (LLMs) have shown promise in improving test generation.<n>We propose a scalable LLM-based unit test generation method.<n>Our approach outperforms state-of-the-art LLM-based and search-based methods.
arXiv Detail & Related papers (2026-02-25T15:16:43Z)
Scaling Agentic Verifier for Competitive Coding [66.11758166379092]
Large language models (LLMs) have demonstrated strong coding capabilities but still struggle to solve competitive programming problems correctly in a single attempt.<n>Execution-based re-ranking offers a promising test-time scaling strategy, yet existing methods are constrained by either difficult test case generation or inefficient random input sampling.<n>We propose Agentic Verifier, an execution-based agent that actively reasons about program behaviors and searches for highly discriminative test inputs.
arXiv Detail & Related papers (2026-02-04T06:30:40Z)
ATGen: Adversarial Reinforcement Learning for Test Case Generation [78.48498301767079]
Large Language Models (LLMs) excel at code generation, yet their outputs often contain subtle bugs.<n>Existing test generation methods rely on static datasets.<n>We introduce ATGen, a framework that trains a test case generator via adversarial reinforcement learning.
arXiv Detail & Related papers (2025-10-16T12:49:25Z)
Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking [54.43083499412643]
Test-time algorithms that combine the generative power of language models with process verifiers offer a promising lever for eliciting new reasoning capabilities.<n>We introduce a new process-guided test-time sampling algorithm, VGB, which uses theoretically grounded backtracking to achieve provably better robustness to verifier errors.
arXiv Detail & Related papers (2025-10-03T16:21:14Z)
Combining TSL and LLM to Automate REST API Testing: A Comparative Study [3.8615905456206256]
RestTSLLM is an approach that uses Test Specification Language (TSL) in conjunction with Large Language Models (LLMs) to automate the generation of test cases for REST APIs.<n>The proposed solution integrates prompt engineering techniques with an automated pipeline to evaluate various LLMs on their ability to generate tests from OpenAPI specifications.<n>The results indicate that the best-performing LLMs consistently produced robust and contextually coherent REST API tests.
arXiv Detail & Related papers (2025-09-05T23:32:35Z)
Alignment with Fill-In-the-Middle for Enhancing Code Generation [56.791415642365415]
We propose a novel approach that splits code snippets into smaller, granular blocks, creating more diverse DPO pairs from the same test cases.<n>Our approach demonstrates significant improvements in code generation tasks, as validated by experiments on benchmark datasets such as HumanEval (+), MBPP (+), APPS, LiveCodeBench, and BigCodeBench.
arXiv Detail & Related papers (2025-08-27T03:15:53Z)
PALM: Synergizing Program Analysis and LLMs to Enhance Rust Unit Test Coverage [14.702182387149547]
This paper presents PALM, an approach that leverages large language models (LLMs) to enhance the generation of high-coverage unit tests.<n> PALM performs program analysis to identify branching conditions within functions, which are then combined into path constraints.<n>We implement the approach and evaluate it in 15 open-source Rust crates.
arXiv Detail & Related papers (2025-06-10T17:21:21Z)
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [71.7892165868749]
Commercial Large Language Model (LLM) APIs create a fundamental trust problem.<n>Users pay for specific models but have no guarantee that providers deliver them faithfully.<n>We formalize this model substitution problem and evaluate detection methods under realistic adversarial conditions.<n>We propose and evaluate the use of Trusted Execution Environments (TEEs) as one practical and robust solution.
arXiv Detail & Related papers (2025-04-07T03:57:41Z)
TestForge: Feedback-Driven, Agentic Test Suite Generation [7.288137795439405]
TestForge is an agentic unit testing framework designed to cost-effectively generate high-quality test suites for real-world code.<n>TestForge produces more natural and understandable tests compared to state-of-the-art search-based techniques.
arXiv Detail & Related papers (2025-03-18T20:21:44Z)
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding [49.56049319037421]
KodCode is a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data.<n>It comprises question-solution-test triplets that are systematically validated via a self-verification procedure.<n>This pipeline yields a large-scale, robust and diverse coding dataset.
arXiv Detail & Related papers (2025-03-04T19:17:36Z)
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification is a key element of machine learning applications.<n>We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines.<n>We conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM [32.44432906540792]
We present SymPrompt, a code-aware prompting strategy for large language models in test generation. SymPrompt enhances correct test generations by a factor of 5 and bolsters relative coverage by 26% for CodeGen2. Notably, when applied to GPT-4, SymPrompt improves coverage by over 2x compared to baseline prompting strategies.
arXiv Detail & Related papers (2024-01-31T18:21:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.