Enhancing Large Language Models for Text-to-Testcase Generation
- URL: http://arxiv.org/abs/2402.11910v1
- Date: Mon, 19 Feb 2024 07:50:54 GMT
- Title: Enhancing Large Language Models for Text-to-Testcase Generation
- Authors: Saranya Alagarsamy, Chakkrit Tantithamthavorn, Chetan Arora, Aldeida Aleti
- Abstract summary: We introduce a text-to-testcase generation approach based on a large language model (GPT-3.5). We evaluate the effectiveness of our approach on five large-scale open-source software projects.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context: Test-driven development (TDD) is a widely employed software development practice in which test cases are developed from the requirements before the code is written. Although various methods for automated test case generation have been proposed, they are not specifically tailored for TDD, where requirements rather than code serve as input. Objective: In this paper, we introduce a text-to-testcase generation approach based on a large language model (GPT-3.5) that is fine-tuned on our curated dataset and combined with an effective prompt design. Method: Our approach enhances the basic GPT-3.5 model for the text-to-testcase generation task by fine-tuning it on our curated dataset and applying an effective prompt design. We evaluated the effectiveness of our approach on five large-scale open-source software projects. Results: Our approach generated 7k test cases for the open-source projects, achieving 78.5% syntactic correctness, 67.09% requirement alignment, and 61.7% code coverage, substantially outperforming all other LLMs (basic GPT-3.5, Bloom, and CodeT5). In addition, our ablation study demonstrates the substantial performance improvement contributed by the fine-tuning and prompting components of the GPT-3.5 model. Conclusions: These findings lead us to conclude that fine-tuning and prompting should be considered in the future when building a language model for the text-to-testcase generation task.
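The pipeline described in the abstract takes a natural-language requirement as input and prompts a fine-tuned model to produce a test case. A minimal sketch of the prompt-construction step is shown below; the paper's actual prompt design, dataset, and fine-tuned model are not reproduced here, so the function name and instruction wording are illustrative assumptions only.

```python
# Hypothetical sketch of the prompt-construction step for text-to-testcase
# generation. The instruction wording below is an assumption, not the
# paper's actual prompt design.

def build_testcase_prompt(requirement: str, method_signature: str) -> str:
    """Pair a natural-language requirement with the target method
    signature to form an instruction-style prompt for a fine-tuned
    code LLM."""
    return (
        "You are given a software requirement and a Java method signature.\n"
        "Write a JUnit test case that verifies the requirement.\n\n"
        f"Requirement: {requirement}\n"
        f"Method signature: {method_signature}\n"
        "Test case:"
    )

prompt = build_testcase_prompt(
    "Return the maximum of two integers.",
    "public static int max(int a, int b)",
)
print(prompt)
```

In the TDD setting the paper targets, the model's completion for such a prompt would then be assessed for syntactic correctness, requirement alignment, and code coverage, matching the evaluation criteria reported in the results above.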
Related papers
- The Surprising Effectiveness of Test-Time Training for Abstract Reasoning [64.36534512742736]
We investigate the effectiveness of test-time training (TTT) as a mechanism for improving models' reasoning capabilities.
TTT significantly improves performance on ARC tasks, achieving up to 6x improvement in accuracy compared to base fine-tuned models.
Our findings suggest that explicit symbolic search is not the only path to improved abstract reasoning in neural language models.
arXiv Detail & Related papers (2024-11-11T18:59:45Z)
- TestBench: Evaluating Class-Level Test Case Generation Capability of Large Language Models [8.22619177301814]
We introduce TestBench, a benchmark for class-level LLM-based test case generation.
We construct a dataset of 108 Java programs from 9 real-world, large-scale projects on GitHub.
We propose a fine-grained evaluation framework that considers five aspects of test cases: syntactic correctness, compilation correctness, test correctness, code coverage rate, and defect detection rate.
arXiv Detail & Related papers (2024-09-26T06:18:06Z)
- Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies the complementary direct preference learning approach to further improve model performance.
Existing direct preference learning algorithms are originally designed for the single-turn chat task.
We introduce a multi-turn direct preference learning framework, tailored for this context.
arXiv Detail & Related papers (2024-09-04T02:41:04Z)
- Advancing LLM Reasoning Generalists with Preference Trees [119.57169648859707]
We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning.
Eurus models achieve state-of-the-art results among open-source models on a diverse set of benchmarks.
arXiv Detail & Related papers (2024-04-02T16:25:30Z)
- Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
- Test-Driven Development for Code Generation [0.850206009406913]
Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements.
This paper investigates if and how Test-Driven Development (TDD) can be incorporated into AI-assisted code-generation processes.
arXiv Detail & Related papers (2024-02-21T04:10:12Z)
- PyTester: Deep Reinforcement Learning for Text-to-Testcase Generation [20.441921569948562]
Test-driven development (TDD) mandates writing test cases based on requirements before writing the actual code.
While writing test cases is the centerpiece of TDD, it is time-consuming, expensive, and often shunned by developers.
We introduce PyTester, a Text-to-Testcase generation approach that can automatically generate correct, executable, complete, and effective test cases.
arXiv Detail & Related papers (2024-01-15T10:21:58Z)
- Large Language Model-Aware In-Context Learning for Code Generation [75.68709482932903]
Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation.
We propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation.
arXiv Detail & Related papers (2023-10-15T06:12:58Z)
- Domain Adaptation for Code Model-based Unit Test Case Generation [7.147408628963976]
We leverage Transformer-based code models to generate unit tests with the help of Domain Adaptation (DA) at a project level.
We show that tests generated using DA can increase the line coverage by 18.62%, 19.88%, and 18.02% and mutation score by 16.45%, 16.01%, and 12.99%.
arXiv Detail & Related papers (2023-08-15T20:48:50Z)
- Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z)
- Unit Test Case Generation with Transformers and Focal Context [10.220204860586582]
AthenaTest aims to generate unit test cases by learning from real-world focal methods and developer-written test cases.
We introduce Methods2Test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java.
We evaluate AthenaTest on five defects4j projects, generating 25K passing test cases covering 43.7% of the focal methods with only 30 attempts.
arXiv Detail & Related papers (2020-09-11T18:57:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.