Enhancing Large Language Models for Text-to-Testcase Generation
- URL: http://arxiv.org/abs/2402.11910v1
- Date: Mon, 19 Feb 2024 07:50:54 GMT
- Title: Enhancing Large Language Models for Text-to-Testcase Generation
- Authors: Saranya Alagarsamy, Chakkrit Tantithamthavorn, Chetan Arora, Aldeida Aleti
- Abstract summary: We introduce a text-to-testcase generation approach based on a large language model (GPT-3.5).
We evaluate the effectiveness of our approach on five large-scale open-source software projects.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context: Test-driven development (TDD) is a widely employed software
development practice in which test cases are developed from requirements before
the code is written. Although various methods for automated test case
generation have been proposed, they are not specifically tailored for TDD,
where requirements rather than code serve as input. Objective: In this paper, we
introduce a text-to-testcase generation approach based on a large language
model (GPT-3.5) that is fine-tuned on our curated dataset with an effective
prompt design. Method: Our approach enhances basic GPT-3.5 for the
text-to-testcase generation task by fine-tuning it on our curated dataset and
applying an effective prompt design. We evaluated the effectiveness of our
approach on five large-scale open-source software projects. Results: Our
approach generated 7k test cases for the open-source projects, achieving 78.5%
syntactic correctness, 67.09% requirement alignment, and 61.7% code coverage,
substantially outperforming all other LLMs (basic GPT-3.5, Bloom, and CodeT5).
In addition, our ablation study demonstrates the substantial performance
improvement contributed by the fine-tuning and prompting components of the
GPT-3.5 model. Conclusions: These findings lead us to conclude that fine-tuning
and prompting should be considered in the future when building a language model
for the text-to-testcase generation task.
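
To make the setup concrete, the sketch below illustrates the text-to-testcase flow the abstract describes: a natural-language requirement, rather than source code, is the input, and a unit test is the output. This is a minimal illustration, not the authors' fine-tuned model or their actual prompt; the prompt wording, the `generate_testcase` helper, and the model name are assumptions, and the call uses the standard OpenAI chat-completions client.

```python
# Minimal sketch of text-to-testcase generation (assumed prompt and helper,
# not the paper's fine-tuned GPT-3.5 or its exact prompt design).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_testcase(requirement: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask the model to turn a natural-language requirement into a JUnit test."""
    prompt = (
        "You are writing tests before any implementation exists (TDD).\n"
        "Given the requirement below, write a compilable JUnit 5 test class "
        "that checks the expected behavior. Output only Java code.\n\n"
        f"Requirement: {requirement}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,  # low temperature favors syntactically stable output
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(generate_testcase(
        'majorVersion(s) returns the major component of a semantic version '
        'string, e.g. majorVersion("2.1.3") == 2.'
    ))
```

In an evaluation like the one reported above, each generated test would then be scored for syntactic correctness (does it compile), alignment with the requirement, and the code coverage it achieves.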
Related papers
- Advancing LLM Reasoning Generalists with Preference Trees [119.57169648859707]
We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning.
Eurus models achieve state-of-the-art results among open-source models on a diverse set of benchmarks.
arXiv Detail & Related papers (2024-04-02T16:25:30Z)
- Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
- Test-Driven Development for Code Generation [0.850206009406913]
Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements.
This paper investigates if and how Test-Driven Development (TDD) can be incorporated into AI-assisted code-generation processes.
arXiv Detail & Related papers (2024-02-21T04:10:12Z)
- TDD Without Tears: Towards Test Case Generation from Requirements through Deep Reinforcement Learning [22.331330777536046]
Test-driven development (TDD) mandates writing test cases based on requirements before writing the actual code.
While writing test cases is the centerpiece of TDD, it is time-consuming, expensive, and often shunned by developers.
We introduce PyTester, a Text-to-Testcase generation approach that can automatically generate correct, executable, complete, and effective test cases.
arXiv Detail & Related papers (2024-01-15T10:21:58Z)
- Establishing Performance Baselines in Fine-Tuning, Retrieval-Augmented Generation and Soft-Prompting for Non-Specialist LLM Users [0.0]
In this paper we tested an unmodified version of GPT-3.5, a fine-tuned version, and the same unmodified model when given access to a vectorised RAG database.
In each case we tested the model's ability to answer a set of 100 questions relating primarily to events that occurred after September 2021.
We found that, when commercial platforms are used with default settings and no iteration to establish a baseline set of outputs, a fine-tuned model outperforms GPT-3.5 Turbo.
arXiv Detail & Related papers (2023-11-10T07:13:06Z)
- Large Language Model-Aware In-Context Learning for Code Generation [75.68709482932903]
Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation.
We propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation.
arXiv Detail & Related papers (2023-10-15T06:12:58Z)
- Domain Adaptation for Deep Unit Test Case Generation [7.80803046080817]
We leverage Transformer-based code models to generate unit tests with the help of Domain Adaptation (DA) at a project level.
We compare our approach with (a) CodeT5 fine-tuned on the test generation task without DA, (b) the A3Test tool, and (c) GPT-4, on 5 projects from the Defects4j dataset.
The results show that using DA can increase the line coverage of the generated tests by an average of 18.62%, 19.88%, and 18.02% over these three baselines, respectively.
arXiv Detail & Related papers (2023-08-15T20:48:50Z)
- Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z)
- Learning Deep Semantics for Test Completion [46.842174440120196]
We formalize the novel task of test completion to automatically complete the next statement in a test method based on the context of prior statements and the code under test.
We develop TeCo -- a deep learning model using code semantics for test completion.
arXiv Detail & Related papers (2023-02-20T18:53:56Z)
- Scaling Instruction-Finetuned Language Models [126.4789306516927]
Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance.
We find that instruction finetuning dramatically improves performance on a variety of model classes.
arXiv Detail & Related papers (2022-10-20T16:58:32Z)
- Unit Test Case Generation with Transformers and Focal Context [10.220204860586582]
AthenaTest aims to generate unit test cases by learning from real-world focal methods and developer-written test cases.
We introduce Methods2Test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java.
We evaluate AthenaTest on five defects4j projects, generating 25K passing test cases covering 43.7% of the focal methods with only 30 attempts.
arXiv Detail & Related papers (2020-09-11T18:57:36Z)