Unit Test Case Generation with Transformers and Focal Context
- URL: http://arxiv.org/abs/2009.05617v2
- Date: Thu, 20 May 2021 23:11:01 GMT
- Title: Unit Test Case Generation with Transformers and Focal Context
- Authors: Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, Shao Kun Deng, Neel
Sundaresan
- Abstract summary: AthenaTest aims to generate unit test cases by learning from real-world focal methods and developer-written test cases.
We introduce Methods2Test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java.
We evaluate AthenaTest on five Defects4J projects, generating 25K passing test cases covering 43.7% of the focal methods with only 30 attempts.
- Score: 10.220204860586582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated unit test case generation tools facilitate test-driven development
and support developers by suggesting tests intended to identify flaws in their
code. Existing approaches are usually guided by test coverage criteria,
generating synthetic test cases that are often difficult for developers to read
or understand. In this paper we propose AthenaTest, an approach that aims to
generate unit test cases by learning from real-world focal methods and
developer-written test cases. We formulate unit test case generation as a
sequence-to-sequence learning task, adopting a two-step training procedure
consisting of denoising pretraining on a large unsupervised Java corpus, and
supervised finetuning for a downstream translation task of generating unit
tests. We investigate the impact of natural language and source code
pretraining, as well as the focal context information surrounding the focal
method. Both techniques provide improvements in terms of validation loss, with
pretraining yielding a 25% relative improvement and focal context providing an
additional 11.1% improvement. We also introduce Methods2Test, the largest
publicly available supervised parallel corpus of unit test case methods and
corresponding focal methods in Java, which comprises 780K test cases mined from
91K open-source repositories from GitHub. We evaluate AthenaTest on five
Defects4J projects, generating 25K passing test cases covering 43.7% of the
focal methods with only 30 attempts. We execute the test cases, collect test
coverage information, and compare them with test cases generated by EvoSuite
and GPT-3, finding that our approach outperforms GPT-3 and achieves coverage
comparable to that of EvoSuite. Finally, we survey professional developers on their
preference in terms of readability, understandability, and testing
effectiveness of the generated tests, showing an overwhelming preference for
AthenaTest.
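To make the two-step training procedure concrete, below is a minimal fine-tuning sketch using the Hugging Face transformers library: a pretrained denoising encoder-decoder (BART-style, in the spirit of the paper) is fine-tuned to translate a focal method plus its focal context into a unit test. The checkpoint name, toy data pair, and hyperparameters are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch: fine-tune a pretrained denoising encoder-decoder to map a
# focal method (plus focal context) to a unit test, framed as seq2seq
# translation. Checkpoint, toy pair, and hyperparameters are assumptions.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "facebook/bart-base"  # stand-in for a BART-style pretrained model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# One (focal method + focal context) -> (test case) pair, Methods2Test-style.
source = ("public int add(int a, int b) { return a + b; } "
          "class Calculator { int add(int a, int b); }")  # focal class context
target = ("@Test public void testAdd() { Calculator c = new Calculator(); "
          "assertEquals(5, c.add(2, 3)); }")

inputs = tokenizer(source, return_tensors="pt", truncation=True)
labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):  # a few gradient steps on the single toy pair
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Generate a candidate test; the paper samples multiple attempts per method.
model.eval()
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

At inference time the model is sampled repeatedly per focal method (the abstract reports up to 30 attempts on Defects4J), and the generated tests are compiled and executed so that only passing ones are kept.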
Related papers
- TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark [24.14654309612826]
TestGenEval comprises 68,647 tests from 1,210 code and test file pairs across 11 well-maintained Python repositories.
It covers initial test authoring, test suite completion, and code coverage improvements.
We evaluate several popular models, with sizes ranging from 7B to 405B parameters.
arXiv Detail & Related papers (2024-10-01T14:47:05Z)
- Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests [4.574205608859157]
We introduce UTGen, which combines search-based software testing and large language models to enhance the understandability of automatically generated test cases.
We observe that participants working on assignments with UTGen test cases fix up to 33% more bugs and use up to 20% less time when compared to baseline test cases.
arXiv Detail & Related papers (2024-08-21T15:35:34Z)
- Improving LLM-based Unit test generation via Template-based Repair [8.22619177301814]
Unit testing is crucial for detecting bugs in individual program units but consumes time and effort.
Large language models (LLMs) have demonstrated remarkable reasoning and generation capabilities.
In this paper, we propose TestART, a novel unit test generation method.
arXiv Detail & Related papers (2024-08-06T10:52:41Z)
- CasModaTest: A Cascaded and Model-agnostic Self-directed Framework for Unit Test Generation [5.450831103980871]
CasModaTest is a cascaded, model-agnostic, and end-to-end unit test generation framework.
It generates test prefixes and test oracles and compiles or executes them to check their effectiveness.
arXiv Detail & Related papers (2024-06-22T05:52:39Z)
- Constrained C-Test Generation via Mixed-Integer Programming [55.28927994487036]
This work proposes a novel method to generate C-Tests: a form of cloze test (a gap-filling exercise) in which only the last part of a word is turned into a gap.
In contrast to previous works that only consider varying the gap size or gap placement to achieve locally optimal solutions, we propose a mixed-integer programming (MIP) approach.
We publish our code, model, and collected data consisting of 32 English C-Tests with 20 gaps each (totaling 3,200 individual gap responses) under an open source license.
arXiv Detail & Related papers (2024-04-12T21:35:21Z)
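To make the C-Test format described above concrete, the toy sketch below turns the last half of selected words into gaps using plain string manipulation. The every-other-word selection rule is a placeholder assumption; choosing gap placement and size optimally is precisely what the paper's MIP formulation contributes, and this sketch does not attempt it.

```python
# Toy illustration of the C-Test format: the last half of selected words is
# turned into a gap. Gap *selection* in the paper is solved with MIP; here we
# simply gap every second sufficiently long word as a naive stand-in.
def make_c_test(sentence: str) -> str:
    words = sentence.split()
    gapped = []
    for i, word in enumerate(words):
        if i % 2 == 1 and len(word) > 3:  # naive selection rule (assumption)
            keep = len(word) // 2  # keep the first half of the word
            gapped.append(word[:keep] + "_" * (len(word) - keep))
        else:
            gapped.append(word)
    return " ".join(gapped)

print(make_c_test("Cloze tests ask learners to restore missing word endings"))
# -> "Cloze te___ ask lear____ to res____ missing wo__ endings"
```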
- Observation-based unit test generation at Meta [52.4716552057909]
TestGen automatically generates unit tests, carved from serialized observations of complex objects observed during app execution.
TestGen has landed 518 tests into production, which have been executed 9,617,349 times in continuous integration, finding 5,702 faults.
Our evaluation reveals that, when carving its observations from 4,361 reliable end-to-end tests, TestGen was able to generate tests for at least 86% of the classes covered by end-to-end tests.
arXiv Detail & Related papers (2024-02-09T00:34:39Z)
- Towards Automatic Generation of Amplified Regression Test Oracles [44.45138073080198]
We propose a test oracle derivation approach to amplify regression test oracles.
The approach monitors the object state during test execution and compares it to the previous version to detect any changes in relation to the SUT's intended behaviour.
arXiv Detail & Related papers (2023-07-28T12:38:44Z)
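The state-comparison idea above lends itself to a short sketch: serialize the fields of the object under test after the test action and diff that snapshot against one recorded on the previous version of the SUT. The Account class, snapshot format, and field names below are made-up illustrations, not the paper's actual instrumentation.

```python
# Minimal sketch of state-based oracle amplification: snapshot the observable
# state of the object under test after execution and diff it against a
# snapshot recorded on the previous version of the SUT.
# The Account class and snapshot format are hypothetical examples.
import json

class Account:  # hypothetical system under test
    def __init__(self):
        self.balance = 0
        self.history = []

    def deposit(self, amount):
        self.balance += amount
        self.history.append(("deposit", amount))

def snapshot(obj) -> str:
    """Serialize an object's fields into a canonical, comparable form."""
    return json.dumps(vars(obj), sort_keys=True, default=str)

# State recorded while running the same test against the previous version.
old_state = '{"balance": 100, "history": [["deposit", 100]]}'

acct = Account()
acct.deposit(100)

if snapshot(acct) == old_state:
    print("state matches previous version: oracle can assert this state")
else:
    print("state diverged: flag the change for review")
```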
- Learning Deep Semantics for Test Completion [46.842174440120196]
We formalize the novel task of test completion to automatically complete the next statement in a test method based on the context of prior statements and the code under test.
We develop TeCo -- a deep learning model using code semantics for test completion.
arXiv Detail & Related papers (2023-02-20T18:53:56Z)
- CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2022-07-21T10:18:37Z)
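CodeT's selection step can be sketched in a few lines: execute every candidate solution against the generated test cases and keep the candidate that passes the most. This is a simplification of the paper's ranking, which scores solutions by their agreement with generated tests and with each other; the candidates and tests below are made-up illustrations.

```python
# Simplified sketch of CodeT-style selection: execute each candidate solution
# against generated test cases and keep the candidate passing the most tests.
# (Candidates and tests are made-up; the paper's ranking is more involved.)
candidates = [
    "def add(a, b): return a + b",  # correct implementation
    "def add(a, b): return a - b",  # buggy implementation
]
generated_tests = [
    "assert add(2, 3) == 5",
    "assert add(0, 0) == 0",
    "assert add(-1, 1) == 0",
]

def passed_count(solution: str, tests: list[str]) -> int:
    score = 0
    for test in tests:
        env: dict = {}
        try:
            exec(solution, env)  # define the candidate function
            exec(test, env)      # run one generated test against it
            score += 1
        except Exception:
            pass  # failing or crashing test: no credit
    return score

best = max(candidates, key=lambda s: passed_count(s, generated_tests))
print("selected:", best)  # -> the correct implementation
```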
- Segment-level Metric Learning for Few-shot Bioacoustic Event Detection [56.59107110017436]
We propose a segment-level few-shot learning framework that utilizes both the positive and negative events during model optimization.
Our system achieves an F-measure of 62.73 on the DCASE 2022 challenge task 5 (DCASE2022-T5) validation set, outperforming the baseline prototypical network (F-measure 34.02) by a large margin.
arXiv Detail & Related papers (2022-07-15T22:41:30Z)
- Generating Accurate Assert Statements for Unit Test Cases using Pretrained Transformers [10.846226514357866]
Unit testing forms the foundation of the software testing pyramid.
We present an approach to support developers in writing unit test cases by generating accurate and useful assert statements.
arXiv Detail & Related papers (2020-09-11T19:35:09Z)