TDD Without Tears: Towards Test Case Generation from Requirements
  through Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2401.07576v1
- Date: Mon, 15 Jan 2024 10:21:58 GMT
- Title: TDD Without Tears: Towards Test Case Generation from Requirements
  through Deep Reinforcement Learning
- Authors: Wannita Takerngsaksiri, Rujikorn Charakorn, Chakkrit Tantithamthavorn,
  Yuan-Fang Li
- Abstract summary: Test-driven development (TDD) mandates writing test cases based on requirements before writing the actual code.
While writing test cases is the centerpiece of TDD, it is time-consuming, expensive, and often shunned by developers.
We introduce PyTester, a Text-to-Testcase generation approach that can automatically generate correct, executable, complete, and effective test cases.
- Score: 22.331330777536046
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract:   Test-driven development (TDD) is a widely-employed software development
practice that mandates writing test cases based on requirements before writing
the actual code. While writing test cases is the centerpiece of TDD, it is
time-consuming, expensive, and often shunned by developers. To address these
issues associated with TDD, automated test case generation approaches have
recently been investigated. Such approaches take source code as input, but not
the requirements. Therefore, existing work does not fully support true TDD, as
actual code is required to generate test cases. In addition, current deep
learning-based test case generation approaches are trained with one learning
objective, i.e., to generate test cases that are exactly matched with the
ground-truth test cases. However, such approaches may limit the model's ability
to generate different yet correct test cases. In this paper, we introduce
PyTester, a Text-to-Testcase generation approach that can automatically
generate syntactically correct, executable, complete, and effective test cases
while being aligned with a given natural language requirement. We evaluate
PyTester on the public APPS benchmark dataset, and the results show that our
Deep RL approach enables PyTester, a small language model, to outperform much
larger language models like GPT3.5, StarCoder, and InCoder. Our findings
suggest that future research could consider improving small over large LMs for
better resource efficiency by integrating the SE domain knowledge into the
design of reinforcement learning architecture.
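
The contrast the abstract draws between exact-match training and rewarding "different yet correct" test cases can be illustrated with a small sketch. This is not PyTester's actual reward design. The function names (`syntax_ok`, `executes_and_passes`, `toy_reward`, `exact_match`), the reward values, and the assumption that a reference solution is available during training are all hypothetical and chosen only for illustration.

```python
import ast


def syntax_ok(test_case: str) -> bool:
    """Return True if the generated test case parses as valid Python."""
    try:
        ast.parse(test_case)
        return True
    except SyntaxError:
        return False


def executes_and_passes(test_case: str, solution_code: str) -> bool:
    """Run the test case against a reference solution (assumed available
    at training time); return True if nothing raises."""
    namespace: dict = {}
    try:
        exec(solution_code, namespace)  # define the function under test
        exec(test_case, namespace)      # run the generated assert
        return True
    except Exception:
        return False


def toy_reward(test_case: str, solution_code: str) -> float:
    """Hypothetical tiered reward: the values are illustrative only."""
    if not syntax_ok(test_case):
        return -1.0   # does not even parse
    if not executes_and_passes(test_case, solution_code):
        return -0.5   # parses, but fails or crashes when executed
    return 1.0        # parses and passes against the reference solution


def exact_match(test_case: str, ground_truth: str) -> bool:
    """The single objective the abstract criticises: string equality."""
    return test_case.strip() == ground_truth.strip()


# Hypothetical toy task: "add two numbers".
solution = "def add(a, b):\n    return a + b\n"
ground_truth = "assert add(1, 2) == 3"
candidate = "assert add(2, 5) == 7"      # different from ground truth, still correct
wrong = "assert add(1, 2) == 4"          # valid syntax, wrong expectation
broken = "assert add(1, 2) =="           # syntax error

if __name__ == "__main__":
    for name, tc in [("candidate", candidate), ("wrong", wrong), ("broken", broken)]:
        print(name, exact_match(tc, ground_truth), toy_reward(tc, solution))
```

Under exact match, the `candidate` test is scored as a miss even though it is a valid test for the requirement. An execution-based signal of this kind can credit it, which is the intuition behind moving from a single matching objective to feedback on syntax and executability.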
 
      
Related papers
- Sample, Don't Search: Rethinking Test-Time Alignment for Language Models [55.2480439325792]
 We introduce QAlign, a new test-time alignment approach.
As we scale test-time compute, QAlign converges to sampling from the optimal aligned distribution for each individual prompt.
By adopting recent advances in Markov chain Monte Carlo for text generation, our method enables better-aligned outputs without modifying the underlying model or even requiring logit access.
 arXiv  Detail & Related papers  (2025-04-04T00:41:40Z)
- LLM-based Unit Test Generation for Dynamically-Typed Programs [16.38145000434927]
 TypeTest is a novel framework that enhances type correctness in test generation through a vector-based Retrieval-Augmented Generation system.
In an evaluation on 125 real-world Python modules, TypeTest achieved an average statement coverage of 86.6% and branch coverage of 76.8%, outperforming state-of-the-art tools by 5.4% and 9.3%, respectively.
 arXiv  Detail & Related papers  (2025-03-18T08:07:17Z)
- CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
 This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases.
The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
 arXiv  Detail & Related papers  (2025-02-12T21:42:56Z)
- LlamaRestTest: Effective REST API Testing with Small Language Models [50.058600784556816]
 We present LlamaRestTest, a novel approach that employs two custom Large Language Models (LLMs) to generate realistic test inputs.
We evaluate it against several state-of-the-art REST API testing tools, including RESTGPT, a GPT-powered specification-enhancement tool.
Our study shows that small language models can perform as well as, or better than, large language models in REST API testing.
 arXiv  Detail & Related papers  (2025-01-15T05:51:20Z)
- TDD-Bench Verified: Can LLMs Generate Tests for Issues Before They Get Resolved? [11.762669773233474]
 Test-driven development (TDD) is the practice of writing tests first and coding later.
This paper introduces TDD-Bench Verified, a high-quality benchmark suite of 449 issues mined from real-world GitHub code repositories.
 arXiv  Detail & Related papers  (2024-12-03T22:38:05Z)
- Multi-language Unit Test Generation using LLMs [6.259245181881262]
 We describe a generic pipeline that incorporates static analysis to guide LLMs in generating compilable and high-coverage test cases.
We show how the pipeline can be applied to different programming languages, specifically Java and Python, and to complex software requiring environment mocking.
Our results demonstrate that LLM-based test generation, when guided by static analysis, can be competitive with, and even outperform, state-of-the-art test-generation techniques in coverage achieved.
 arXiv  Detail & Related papers  (2024-09-04T21:46:18Z)
- KAT: Dependency-aware Automated API Testing with Large Language Models [1.7264233311359707]
 KAT (Katalon API Testing) is a novel AI-driven approach that autonomously generates test cases to validate APIs.
Our evaluation of KAT using 12 real-world services shows that it can improve validation coverage, detect more undocumented status codes, and reduce false positives in these services.
 arXiv  Detail & Related papers  (2024-07-14T14:48:18Z)
- Test-Driven Development for Code Generation [0.850206009406913]
 Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements.
This paper investigates if and how Test-Driven Development (TDD) can be incorporated into AI-assisted code-generation processes.
 arXiv  Detail & Related papers  (2024-02-21T04:10:12Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
 We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
 Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
 arXiv  Detail & Related papers  (2023-10-09T07:27:15Z)
- CAT-LM: Training Language Models on Aligned Code And Tests [19.526181671936243]
 Testing is an integral part of the software development process. Yet, writing tests is time-consuming and therefore often neglected.
We propose the Aligned Code And Tests Language Model (CAT-LM), a GPT-style language model with 2.7 Billion parameters, trained on a corpus of Python and Java projects.
 arXiv  Detail & Related papers  (2023-10-02T19:52:22Z)
- AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation [64.9230895853942]
 Domain generalization can be arbitrarily hard without exploiting target domain information.
Test-time adaptive (TTA) methods are proposed to address this issue.
In this work, we adopt a Non-Parametric Classifier to perform test-time Adaptation (AdaNPC).
 arXiv  Detail & Related papers  (2023-04-25T04:23:13Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
 Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
 arXiv  Detail & Related papers  (2023-04-11T10:43:43Z)
- Learning Deep Semantics for Test Completion [46.842174440120196]
 We formalize the novel task of test completion to automatically complete the next statement in a test method based on the context of prior statements and the code under test.
We develop TeCo -- a deep learning model using code semantics for test completion.
 arXiv  Detail & Related papers  (2023-02-20T18:53:56Z)
- TeST: Test-time Self-Training under Distribution Shift [99.68465267994783]
 Test-Time Self-Training (TeST) is a technique that takes as input a model trained on some source data and a novel data distribution at test time.
We find that models adapted using TeST significantly improve over baseline test-time adaptation algorithms.
 arXiv  Detail & Related papers  (2022-09-23T07:47:33Z)
- CodeT: Code Generation with Generated Tests [49.622590050797236]
 We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
 arXiv  Detail & Related papers  (2022-07-21T10:18:37Z)
- Generating Accurate Assert Statements for Unit Test Cases using
  Pretrained Transformers [10.846226514357866]
 Unit testing represents the foundational basis of the software testing pyramid.
We present an approach to support developers in writing unit test cases by generating accurate and useful assert statements.
 arXiv  Detail & Related papers  (2020-09-11T19:35:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     