TDD Without Tears: Towards Test Case Generation from Requirements
through Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2401.07576v1
- Date: Mon, 15 Jan 2024 10:21:58 GMT
- Title: TDD Without Tears: Towards Test Case Generation from Requirements
through Deep Reinforcement Learning
- Authors: Wannita Takerngsaksiri, Rujikorn Charakorn, Chakkrit Tantithamthavorn,
Yuan-Fang Li
- Abstract summary: Test-driven development (TDD) mandates writing test cases based on requirements before writing the actual code.
While writing test cases is the centerpiece of TDD, it is time-consuming, expensive, and often shunned by developers.
We introduce PyTester, a Text-to-Testcase generation approach that can automatically generate correct, executable, complete, and effective test cases.
- Score: 22.331330777536046
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Test-driven development (TDD) is a widely-employed software development
practice that mandates writing test cases based on requirements before writing
the actual code. While writing test cases is the centerpiece of TDD, it is
time-consuming, expensive, and often shunned by developers. To address these
issues associated with TDD, automated test case generation approaches have
recently been investigated. Such approaches take source code as input, but not
the requirements. Therefore, existing work does not fully support true TDD, as
actual code is required to generate test cases. In addition, current deep
learning-based test case generation approaches are trained with one learning
objective, i.e., to generate test cases that exactly match the ground-truth test
cases. However, such an objective may limit the model's ability to generate
different yet correct test cases. In this paper, we introduce
PyTester, a Text-to-Testcase generation approach that can automatically
generate syntactically correct, executable, complete, and effective test cases
while being aligned with a given natural language requirement. We evaluate
PyTester on the public APPS benchmark dataset, and the results show that our
Deep RL approach enables PyTester, a small language model, to outperform much
larger language models like GPT3.5, StarCoder, and InCoder. Our findings
suggest that future research could focus on improving small LMs rather than large
LMs for better resource efficiency by integrating SE domain knowledge into the
design of the reinforcement learning architecture.
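As a rough illustration of the software-engineering-aware feedback the abstract mentions (syntactic correctness, executability, and completeness of the generated tests), the sketch below shows one way such checks could be combined into a scalar reward for reinforcement-learning fine-tuning. The function name, component checks, and weights are illustrative assumptions, not PyTester's published reward design.

```python
# Illustrative sketch only: the reward components and weights below are
# assumptions for exposition, not PyTester's published reward function.
import ast
import subprocess
import sys
import tempfile


def test_suite_reward(test_code: str, timeout: float = 5.0) -> float:
    """Score a generated Python test suite on syntax, completeness, and executability."""
    # 1. Syntactic correctness: unparsable output gets the lowest reward.
    try:
        tree = ast.parse(test_code)
    except SyntaxError:
        return -1.0

    # 2. Completeness: count assert statements so empty test suites score lower.
    n_asserts = sum(isinstance(node, ast.Assert) for node in ast.walk(tree))
    completeness = min(n_asserts, 5) / 5.0

    # 3. Executability: run the tests in a fresh interpreter with a timeout.
    #    (Assumes the code under test is importable from the working directory.)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(test_code)
        path = handle.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        executable = 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        executable = 0.0

    # Weighted combination of the signals; the weights are placeholders.
    return 0.3 + 0.4 * executable + 0.3 * completeness
```

In an RL setup, a reward like this would drive a policy-gradient update (e.g., PPO) on the language model that generates the test cases; the exact reward composition and training procedure used by PyTester are described in the paper itself.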
Related papers
- CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases.
The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
arXiv Detail & Related papers (2025-02-12T21:42:56Z)
- LlamaRestTest: Effective REST API Testing with Small Language Models [50.058600784556816]
We present LlamaRestTest, a novel approach that employs two custom LLMs to generate realistic test inputs.
LlamaRestTest surpasses state-of-the-art tools in code coverage and error detection, even with RESTGPT-enhanced specifications.
arXiv Detail & Related papers (2025-01-15T05:51:20Z)
- TDD-Bench Verified: Can LLMs Generate Tests for Issues Before They Get Resolved? [11.762669773233474]
Test-driven development (TDD) is the practice of writing tests first and coding later.
This paper introduces TDD-Bench Verified, a high-quality benchmark suite of 449 issues mined from real-world GitHub code repositories.
arXiv Detail & Related papers (2024-12-03T22:38:05Z)
- ASTER: Natural and Multi-language Unit Test Generation with LLMs [6.259245181881262]
We describe a generic pipeline that incorporates static analysis to guide LLMs in generating compilable and high-coverage test cases.
We conduct an empirical study to assess the quality of the generated tests in terms of code coverage and test naturalness.
arXiv Detail & Related papers (2024-09-04T21:46:18Z)
- KAT: Dependency-aware Automated API Testing with Large Language Models [1.7264233311359707]
KAT (Katalon API Testing) is a novel AI-driven approach that autonomously generates test cases to validate APIs.
Our evaluation of KAT using 12 real-world services shows that it can improve validation coverage, detect more undocumented status codes, and reduce false positives in these services.
arXiv Detail & Related papers (2024-07-14T14:48:18Z)
- Test-Driven Development for Code Generation [0.850206009406913]
Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements.
This paper investigates if and how Test-Driven Development (TDD) can be incorporated into AI-assisted code-generation processes.
arXiv Detail & Related papers (2024-02-21T04:10:12Z)
- CAT-LM: Training Language Models on Aligned Code And Tests [19.526181671936243]
Testing is an integral part of the software development process. Yet, writing tests is time-consuming and therefore often neglected.
We propose the Aligned Code And Tests Language Model (CAT-LM), a GPT-style language model with 2.7 Billion parameters, trained on a corpus of Python and Java projects.
arXiv Detail & Related papers (2023-10-02T19:52:22Z)
- AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation [64.9230895853942]
Domain generalization can be arbitrarily hard without exploiting target domain information.
Test-time adaptation (TTA) methods have been proposed to address this issue.
In this work, we adopt a Non-Parametric Classifier to perform test-time Adaptation (AdaNPC).
arXiv Detail & Related papers (2023-04-25T04:23:13Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- Learning Deep Semantics for Test Completion [46.842174440120196]
We formalize the novel task of test completion to automatically complete the next statement in a test method based on the context of prior statements and the code under test.
We develop TeCo -- a deep learning model using code semantics for test completion.
arXiv Detail & Related papers (2023-02-20T18:53:56Z)
- CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks; a simplified sketch of the selection step appears after this list.
arXiv Detail & Related papers (2022-07-21T10:18:37Z)
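For the CodeT entry above, the selection step can be pictured as executing each candidate solution against the generated tests and keeping the candidate that fares best. The sketch below is a simplified, assumption-laden illustration that scores candidates by raw pass count; CodeT's actual dual execution agreement scoring is more involved, and all names here are illustrative.

```python
# Simplified, assumption-based sketch of choosing among candidate solutions by
# executing them against generated tests; not CodeT's exact scoring scheme.
from typing import Callable, Dict, List


def run_test(namespace: Dict, test: Callable[[Dict], None]) -> bool:
    """Return True if a single generated test passes against a candidate."""
    try:
        test(namespace)
        return True
    except Exception:
        return False


def select_best_solution(candidates: List[str],
                         tests: List[Callable[[Dict], None]]) -> str:
    """Pick the candidate source string that passes the most generated tests."""
    best_code, best_score = candidates[0], -1
    for code in candidates:
        namespace: Dict = {}
        try:
            exec(code, namespace)  # materialise the candidate's definitions
        except Exception:
            continue  # candidates that fail to run score nothing
        score = sum(run_test(namespace, t) for t in tests)
        if score > best_score:
            best_code, best_score = code, score
    return best_code


# Toy usage: two candidates for `add` and one generated test.
def generated_test(ns: Dict) -> None:
    assert ns["add"](2, 3) == 5

candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",
]
best = select_best_solution(candidates, [generated_test])  # picks the first one
```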
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.