Intention-Driven Generation of Project-Specific Test Cases
- URL: http://arxiv.org/abs/2507.20619v1
- Date: Mon, 28 Jul 2025 08:35:04 GMT
- Title: Intention-Driven Generation of Project-Specific Test Cases
- Authors: Binhang Qi, Yun Lin, Xinyi Weng, Yuhuan Huang, Chenyan Liu, Hailong Sun, Jin Song Dong,
- Abstract summary: We propose IntentionTest, which generates project-specific tests with validation intention as a structured description.<n>We evaluate IntentionTest against state-of-the-art baselines on 4,146 test cases from 13 open-source projects.
- Score: 14.297390481640068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test cases are valuable assets for maintaining software quality. While numerous automated techniques have been proposed for generating tests (either by maximizing code coverage or by translating focal code into test code), practical tests are seldom driven by coverage alone. In real projects, each test reflects a developer's validation intention for a specific behaviour and embodies rich, project-specific knowledge: which specific APIs to call and what assertions truly matter. Without considering such knowledge, tests can hardly pass code review and be integrated into the software product. In this work, we propose IntentionTest, which generates project-specific tests with validation intention as a structured description. Our design is motivated by two insights: (1) a description of validation intention, compared to coverage and focal code, carries more crucial information about what to test; and (2) practical tests exhibit high code duplication, indicating that domain knowledge is highly reusable for writing new tests. Given a focal code and a description of validation intention (in the form of either an informal comment or a formal test plan), IntentionTest retrieves a referable test in the project to guide test generation. Moreover, IntentionTest reduces the test generation problem into an editing problem on the test code regarding the validation intention. It generates a test including both test prefix and oracle, which aims to be executable and semantically correct. We evaluate IntentionTest against state-of-the-art baselines on 4,146 test cases from 13 open-source projects. Specifically, compared to ChatTester, IntentionTest can (1) generate significantly more semantically correct tests, improving common mutation scores by 39.03% and coverage overlap with ground-truth tests by 40.14%; (2) generate 21.30% more successful passing tests.
Related papers
- Automated Test Generation from Program Documentation Encoded in Code Comments [4.696083734269232]
This paper introduces a novel test generation technique that exploits the code-comment documentation constructively.<n>We deliver test cases with names and oracles properly contextualized on the target behaviors.
arXiv Detail & Related papers (2025-04-29T20:23:56Z) - Studying the Impact of Early Test Termination Due to Assertion Failure on Code Coverage and Spectrum-based Fault Localization [48.22524837906857]
This study is the first empirical study on early test termination due to assertion failure.<n>We investigated 207 versions of 6 open-source projects.<n>Our findings indicate that early test termination harms both code coverage and the effectiveness of spectrum-based fault localization.
arXiv Detail & Related papers (2025-04-06T17:14:09Z) - CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases.<n>The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
arXiv Detail & Related papers (2025-02-12T21:42:56Z) - ProjectTest: A Project-level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms [48.43237545197775]
Unit test generation has become a promising and important use case of LLMs.<n>ProjectTest is a project-level benchmark for unit test generation covering Python, Java, and JavaScript.
arXiv Detail & Related papers (2025-02-10T15:24:30Z) - TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark [24.14654309612826]
TestGenEval comprises 68,647 tests from 1,210 code and test file pairs across 11 well-maintained Python repositories.<n>It covers initial tests authoring, test suite completion, and code coverage improvements.<n>We evaluate several popular models, with sizes ranging from 7B to 405B parameters.
arXiv Detail & Related papers (2024-10-01T14:47:05Z) - Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests [4.574205608859157]
We introduce UTGen, which combines search-based software testing and large language models to enhance the understandability of automatically generated test cases.
We observe that participants working on assignments with UTGen test cases fix up to 33% more bugs and use up to 20% less time when compared to baseline test cases.
arXiv Detail & Related papers (2024-08-21T15:35:34Z) - Observation-based unit test generation at Meta [52.4716552057909]
TestGen automatically generates unit tests, carved from serialized observations of complex objects, observed during app execution.
TestGen has landed 518 tests into production, which have been executed 9,617,349 times in continuous integration, finding 5,702 faults.
Our evaluation reveals that, when carving its observations from 4,361 reliable end-to-end tests, TestGen was able to generate tests for at least 86% of the classes covered by end-to-end tests.
arXiv Detail & Related papers (2024-02-09T00:34:39Z) - Automatic Generation of Test Cases based on Bug Reports: a Feasibility
Study with Large Language Models [4.318319522015101]
Existing approaches produce test cases that either can be qualified as simple (e.g. unit tests) or that require precise specifications.
Most testing procedures still rely on test cases written by humans to form test suites.
We investigate the feasibility of performing this generation by leveraging large language models (LLMs) and using bug reports as inputs.
arXiv Detail & Related papers (2023-10-10T05:30:12Z) - Learning Deep Semantics for Test Completion [46.842174440120196]
We formalize the novel task of test completion to automatically complete the next statement in a test method based on the context of prior statements and the code under test.
We develop TeCo -- a deep learning model using code semantics for test completion.
arXiv Detail & Related papers (2023-02-20T18:53:56Z) - CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2022-07-21T10:18:37Z) - Unit Test Case Generation with Transformers and Focal Context [10.220204860586582]
AthenaTest aims to generate unit test cases by learning from real-world focal methods and developer-written test cases.
We introduce Methods2Test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java.
We evaluate AthenaTest on five defects4j projects, generating 25K passing test cases covering 43.7% of the focal methods with only 30 attempts.
arXiv Detail & Related papers (2020-09-11T18:57:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.