Domain Adaptation for Deep Unit Test Case Generation
- URL: http://arxiv.org/abs/2308.08033v2
- Date: Fri, 19 Jan 2024 15:58:34 GMT
- Title: Domain Adaptation for Deep Unit Test Case Generation
- Authors: Jiho Shin, Sepehr Hashtroudi, Hadi Hemmati, Song Wang
- Abstract summary: We leverage Transformer-based code models to generate unit tests with the help of Domain Adaptation (DA) at a project level.
We compare our approach with (a) CodeT5 fine-tuned on the test generation task without DA, (b) the A3Test tool, and (c) GPT-4, on 5 projects from the Defects4j dataset.
The results show that using DA can increase the line coverage of the generated tests by 18.62%, 19.88%, and 18.02% on average over these baselines, respectively.
- Score: 7.80803046080817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, deep learning-based test case generation approaches have been
proposed to automate the generation of unit test cases. In this study, we
leverage Transformer-based code models to generate unit tests with the help of
Domain Adaptation (DA) at a project level. Specifically, we use CodeT5, which
is a relatively small language model trained on source code data, and fine-tune
it on the test generation task; then again further fine-tune it on each target
project data to learn the project-specific knowledge (project-level DA). We use
the Methods2test dataset to fine-tune CodeT5 for the test generation task and
the Defects4j dataset for project-level domain adaptation and evaluation. We
compare our approach with (a) CodeT5 fine-tuned on the test generation task
without DA, (b) the A3Test tool, and (c) GPT-4, on 5 projects from the Defects4j
dataset. The results show that using DA can increase the line coverage of the
generated tests by an average of 18.62%, 19.88%, and 18.02% compared to the
(a), (b), and (c) baselines, respectively. The results also consistently show
improvements using other metrics such as BLEU and CodeBLEU. In addition, we
show that our approach can be seen as a complementary solution alongside
existing search-based test generation tools such as EvoSuite, to increase the
overall coverage and mutation scores by an average of 34.42% and 6.8% for
line coverage and mutation score, respectively.
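The two-stage fine-tuning schedule described in the abstract (task fine-tuning on Methods2test, then project-level DA on each target project) can be sketched as follows. This is a minimal illustration, not the authors' code: `train` is a hypothetical stand-in that only records which corpus the model has seen, in place of a real CodeT5 seq2seq fine-tuning pass.

```python
# Illustrative sketch of the paper's two-stage fine-tuning pipeline.
# The real work fine-tunes CodeT5 with a seq2seq objective; `train` here
# is a hypothetical stub that records the corpora seen, to make the
# staging order explicit.

def train(model_state, corpus):
    """Hypothetical stand-in for one fine-tuning pass over a corpus."""
    return model_state + [corpus]

def project_adapted_model(target_project):
    model = ["codet5-pretrained"]         # start from the pretrained code model
    model = train(model, "methods2test")  # stage 1: test-generation task fine-tuning
    model = train(model, target_project)  # stage 2: project-level domain adaptation
    return model

# One adapted checkpoint is produced per target project, e.g. a Defects4j project:
print(project_adapted_model("defects4j/Lang"))
```

The point of the sketch is the ordering: project-specific data is applied last, so project-level knowledge is layered on top of general test-generation ability rather than mixed into it.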
Related papers
- Constrained C-Test Generation via Mixed-Integer Programming [55.28927994487036]
This work proposes a novel method to generate C-Tests: a form of cloze test (a gap-filling exercise) where only the last part of a word is turned into a gap.
In contrast to previous works that only consider varying the gap size or gap placement to achieve locally optimal solutions, we propose a mixed-integer programming (MIP) approach.
We publish our code, model, and collected data consisting of 32 English C-Tests with 20 gaps each (totaling 3,200 individual gap responses) under an open source license.
arXiv Detail & Related papers (2024-04-12T21:35:21Z) - Enhancing Large Language Models for Text-to-Testcase Generation [12.864685900686158]
We introduce a text-to-testcase generation approach based on a large language model (GPT-3.5).
We evaluate the effectiveness of our approach using a span of five large-scale open-source software projects.
arXiv Detail & Related papers (2024-02-19T07:50:54Z) - Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM [32.44432906540792]
We present SymPrompt, a code-aware prompting strategy for large language models in test generation.
SymPrompt enhances correct test generations by a factor of 5 and bolsters relative coverage by 26% for CodeGen2.
Notably, when applied to GPT-4, SymPrompt improves coverage by over 2x compared to baseline prompting strategies.
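The code-aware idea behind SymPrompt can be illustrated with a small, hypothetical sketch: rather than one generic "write a test" prompt, build one prompt per (approximated) execution path of the focal method. SymPrompt derives the paths from the method's control flow; in this sketch the path conditions are supplied by hand, and the function and prompt wording are illustrative only.

```python
# Hypothetical sketch in the spirit of SymPrompt's code-aware prompting:
# one test-generation prompt per execution-path condition of the focal
# method, instead of a single generic prompt.

def path_prompts(method_name, source, path_conditions):
    """Return one test-generation prompt per execution-path condition."""
    return [
        f"Write a unit test for `{method_name}` that exercises the path "
        f"where {cond}.\n\n{source}"
        for cond in path_conditions
    ]

SRC = (
    "def clamp(x, lo, hi):\n"
    "    if x < lo: return lo\n"
    "    if x > hi: return hi\n"
    "    return x"
)
prompts = path_prompts("clamp", SRC, ["x < lo", "x > hi", "lo <= x <= hi"])
print(len(prompts))  # one prompt per path
```

Splitting generation by path is what lets coverage-guided prompting target branches a single generic prompt tends to miss.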
arXiv Detail & Related papers (2024-01-31T18:21:49Z) - TDD Without Tears: Towards Test Case Generation from Requirements through Deep Reinforcement Learning [22.331330777536046]
Test-driven development (TDD) mandates writing test cases based on requirements before writing the actual code.
While writing test cases is the centerpiece of TDD, it is time-consuming, expensive, and often shunned by developers.
We introduce PyTester, a Text-to-Testcase generation approach that can automatically generate correct, executable, complete, and effective test cases.
arXiv Detail & Related papers (2024-01-15T10:21:58Z) - Better Practices for Domain Adaptation [62.70267990659201]
Domain adaptation (DA) aims to provide frameworks for adapting models to deployment data without using labels.
Unclear validation protocols for DA have led to bad practices in the literature.
We show challenges across all three branches of domain adaptation methodology.
arXiv Detail & Related papers (2023-09-07T17:44:18Z) - AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation [64.9230895853942]
Domain generalization can be arbitrarily hard without exploiting target domain information.
Test-time adaptive (TTA) methods are proposed to address this issue.
In this work, we adopt a non-parametric classifier to perform test-time adaptation (AdaNPC).
arXiv Detail & Related papers (2023-04-25T04:23:13Z) - An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation [3.9762912548964864]
This paper presents a large-scale empirical evaluation on the effectiveness of Large Language Models for automated unit test generation.
We implement our approach in TestPilot, a test generation tool for JavaScript that automatically generates unit tests for all API functions in an npm package.
We find that 92.8% of TestPilot's generated tests have no more than 50% similarity with existing tests.
arXiv Detail & Related papers (2023-02-13T17:13:41Z) - TeST: Test-time Self-Training under Distribution Shift [99.68465267994783]
Test-Time Self-Training (TeST) is a technique that takes as input a model trained on some source data and a novel data distribution at test time.
We find that models adapted using TeST significantly improve over baseline test-time adaptation algorithms.
arXiv Detail & Related papers (2022-09-23T07:47:33Z) - CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
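One simplified reading of CodeT's selection step (execute candidate solutions against model-generated tests, then choose the best) can be sketched as below. This is illustrative, not the authors' code: generated tests are modeled as predicate functions over a candidate, and "best" is reduced to "passes the most tests".

```python
# Illustrative sketch of CodeT-style selection: run every candidate
# solution against the generated test cases and keep the candidate that
# passes the most of them.

def select_best(candidates, generated_tests):
    """Return the candidate passing the largest number of generated tests."""
    def score(candidate):
        passed = 0
        for test in generated_tests:
            try:
                if test(candidate):
                    passed += 1
            except Exception:
                pass  # a crashing test counts as a failure
        return passed
    return max(candidates, key=score)

# Toy example: two candidate implementations of abs().
candidates = [lambda x: x, lambda x: -x if x < 0 else x]
tests = [lambda f: f(-3) == 3, lambda f: f(2) == 2]
best = select_best(candidates, tests)
print(best(-5))  # the correct abs() candidate wins
```

Because both the solutions and the tests are model-generated, agreement between them acts as a cheap, execution-based filter with no human-written oracle needed.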
arXiv Detail & Related papers (2022-07-21T10:18:37Z) - CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address this challenge by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z) - Unit Test Case Generation with Transformers and Focal Context [10.220204860586582]
AthenaTest aims to generate unit test cases by learning from real-world focal methods and developer-written test cases.
We introduce Methods2Test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java.
We evaluate AthenaTest on five defects4j projects, generating 25K passing test cases covering 43.7% of the focal methods with only 30 attempts.
arXiv Detail & Related papers (2020-09-11T18:57:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.