PROZE: Generating Parameterized Unit Tests Informed by Runtime Data
- URL: http://arxiv.org/abs/2407.00768v2
- Date: Tue, 3 Sep 2024 12:24:32 GMT
- Title: PROZE: Generating Parameterized Unit Tests Informed by Runtime Data
- Authors: Deepika Tiwari, Yogya Gamage, Martin Monperrus, Benoit Baudry,
- Abstract summary: A parameterized unit test (PUT) receives a set of inputs as arguments and contains assertions that are expected to hold true for all these inputs.
In this paper, we address the problem of finding oracles for PUTs that hold over multiple inputs.
We design a system called PROZE, that generates PUTs by identifying developer-written assertions that are valid for more than one test input.
- Score: 10.405775369526006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Typically, a conventional unit test (CUT) verifies the expected behavior of the unit under test through one specific input / output pair. In contrast, a parameterized unit test (PUT) receives a set of inputs as arguments, and contains assertions that are expected to hold true for all these inputs. PUTs increase test quality, as they assess correctness on a broad scope of inputs and behaviors. However, defining assertions over a set of inputs is a hard task for developers, which limits the adoption of PUTs in practice. In this paper, we address the problem of finding oracles for PUTs that hold over multiple inputs. We design a system called PROZE, that generates PUTs by identifying developer-written assertions that are valid for more than one test input. We implement our approach as a two-step methodology: first, at runtime, we collect inputs for a target method that is invoked within a CUT; next, we isolate the valid assertions of the CUT to be used within a PUT. We evaluate our approach against 5 real-world Java modules, and collect valid inputs for 128 target methods from test and field executions. We generate 2,287 PUTs, which invoke the target methods with a significantly larger number of test inputs than the original CUTs. We execute the PUTs and find 217 that provably demonstrate that their oracles hold for a larger range of inputs than envisioned by the developers. From a testing theory perspective, our results show that developers express assertions within CUTs that are general enough to hold beyond one particular input.
Related papers
- Learning to Generate Unit Tests for Automated Debugging [52.63217175637201]
Unit tests (UTs) play an instrumental role in assessing code correctness as well as providing feedback to a large language model (LLM)
We propose UTGen, which teaches LLMs to generate unit test inputs that reveal errors along with their correct expected outputs.
We show that UTGen outperforms UT generation baselines by 7.59% based on a metric measuring the presence of both error-revealing UT inputs and correct UT outputs.
arXiv Detail & Related papers (2025-02-03T18:51:43Z) - Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch.
Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests.
Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z) - LLM-Powered Test Case Generation for Detecting Tricky Bugs [30.82169191775785]
AID generates test inputs and oracles targeting plausibly correct programs.
We evaluate AID on two large-scale datasets with tricky bugs: TrickyBugs and EvalPlus.
The evaluation results show that the recall, precision, and F1 score of AID outperform the state-of-the-art by up to 1.80x, 2.65x, and 1.66x, respectively.
arXiv Detail & Related papers (2024-04-16T06:20:06Z) - Large Language Models to Generate System-Level Test Programs Targeting Non-functional Properties [3.3305233186101226]
This paper proposes Large Language Models (LLMs) to generate test programs.
We take a first glance at how pre-trained LLMs perform in test program generation to optimize non-functional properties of the DUT.
arXiv Detail & Related papers (2024-03-15T08:01:02Z) - Revisiting and Improving Retrieval-Augmented Deep Assertion Generation [13.373681113601982]
Unit testing has become an essential activity in software development process.
Yu et al. proposed an integrated approach (integration for short) to generate assertions for a unit test.
Despite promising, there is still a knowledge gap as to why or where integration works or does not work.
arXiv Detail & Related papers (2023-09-19T02:39:02Z) - Exploring Demonstration Ensembling for In-context Learning [75.35436025709049]
In-context learning (ICL) operates by showing language models (LMs) examples of input-output pairs for a given task.
The standard approach for ICL is to prompt the LMd demonstrations followed by the test input.
In this work, we explore Demonstration Ensembling (DENSE) as an alternative to simple concatenation.
arXiv Detail & Related papers (2023-08-17T04:45:19Z) - Semi-DETR: Semi-Supervised Object Detection with Detection Transformers [105.45018934087076]
We analyze the DETR-based framework on semi-supervised object detection (SSOD)
We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector.
Our method outperforms all state-of-the-art methods by clear margins.
arXiv Detail & Related papers (2023-07-16T16:32:14Z) - Pre-trained Embeddings for Entity Resolution: An Experimental Analysis
[Experiment, Analysis & Benchmark] [65.11858854040544]
We perform a thorough experimental analysis of 12 popular language models over 17 established benchmark datasets.
First, we assess their vectorization overhead for converting all input entities into dense embeddings vectors.
Second, we investigate their blocking performance, performing a detailed scalability analysis, and comparing them with the state-of-the-art deep learning-based blocking method.
Third, we conclude with their relative performance for both supervised and unsupervised matching.
arXiv Detail & Related papers (2023-04-24T08:53:54Z) - Auditing AI models for Verified Deployment under Semantic Specifications [65.12401653917838]
AuditAI bridges the gap between interpretable formal verification and scalability.
We show how AuditAI allows us to obtain controlled variations for verification and certified training while addressing the limitations of verifying using only pixel-space perturbations.
arXiv Detail & Related papers (2021-09-25T22:53:24Z) - Generating Accurate Assert Statements for Unit Test Cases using
Pretrained Transformers [10.846226514357866]
Unit testing represents the foundational basis of the software testing pyramid.
We present an approach to support developers in writing unit test cases by generating accurate and useful assert statements.
arXiv Detail & Related papers (2020-09-11T19:35:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.