Related papers: PROZE: Generating Parameterized Unit Tests Informed by Runtime Data

URL: http://arxiv.org/abs/2407.00768v2
Date: Tue, 3 Sep 2024 12:24:32 GMT
Title: PROZE: Generating Parameterized Unit Tests Informed by Runtime Data
Authors: Deepika Tiwari, Yogya Gamage, Martin Monperrus, Benoit Baudry
Abstract summary: A parameterized unit test (PUT) receives a set of inputs as arguments and contains assertions that are expected to hold true for all these inputs. In this paper, we address the problem of finding oracles for PUTs that hold over multiple inputs. We design a system called PROZE, that generates PUTs by identifying developer-written assertions that are valid for more than one test input.
Score: 10.405775369526006
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Typically, a conventional unit test (CUT) verifies the expected behavior of the unit under test through one specific input / output pair. In contrast, a parameterized unit test (PUT) receives a set of inputs as arguments, and contains assertions that are expected to hold true for all these inputs. PUTs increase test quality, as they assess correctness on a broad scope of inputs and behaviors. However, defining assertions over a set of inputs is a hard task for developers, which limits the adoption of PUTs in practice. In this paper, we address the problem of finding oracles for PUTs that hold over multiple inputs. We design a system called PROZE, that generates PUTs by identifying developer-written assertions that are valid for more than one test input. We implement our approach as a two-step methodology: first, at runtime, we collect inputs for a target method that is invoked within a CUT; next, we isolate the valid assertions of the CUT to be used within a PUT. We evaluate our approach against 5 real-world Java modules, and collect valid inputs for 128 target methods from test and field executions. We generate 2,287 PUTs, which invoke the target methods with a significantly larger number of test inputs than the original CUTs. We execute the PUTs and find 217 that provably demonstrate that their oracles hold for a larger range of inputs than envisioned by the developers. From a testing theory perspective, our results show that developers express assertions within CUTs that are general enough to hold beyond one particular input.
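
Illustration: the CUT/PUT distinction in the abstract can be sketched as follows. This is a minimal sketch and not PROZE itself: the paper targets Java, while this sketch uses plain Python, and the target function `normalize`, the input list `CAPTURED_INPUTS`, and the helper `run_put` are all hypothetical names introduced for illustration. The inputs are hard-coded stand-ins for values that would be collected at runtime.

```python
# Hypothetical unit under test (not taken from the paper).
def normalize(text: str) -> str:
    """Trim the string and collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


# Conventional unit test (CUT): one specific input / output pair.
def test_normalize_cut() -> None:
    result = normalize("  hello   world ")
    assert result == "hello world"    # tied to this one input
    assert result == result.strip()   # general: plausibly valid for any input


# Parameterized unit test (PUT): the input becomes an argument, and only
# the assertion that is valid across inputs is kept as the oracle.
def check_normalize_put(text: str) -> None:
    result = normalize(text)
    assert result == result.strip()


# Assumption: stand-ins for inputs captured from test and field executions.
CAPTURED_INPUTS = ["  hello   world ", "a\tb", "", "x"]


def run_put(inputs: list[str]) -> int:
    """Run the PUT on every input and return how many inputs passed."""
    passed = 0
    for text in inputs:
        check_normalize_put(text)  # raises AssertionError if the oracle fails
        passed += 1
    return passed


if __name__ == "__main__":
    test_normalize_cut()
    print(run_put(CAPTURED_INPUTS))
```

The input-specific assertion stays in the CUT, while the PUT keeps only the assertion that can be checked against every captured input, which mirrors the isolation step the abstract describes.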

Related papers

Understanding and Characterizing Mock Assertions in Unit Tests [12.96550571237691]
Despite their significance, mock assertions are rarely considered by automated test generation techniques. Our analysis of 4,652 test cases from 11 popular Java projects reveals that mock assertions are mostly applied to validating specific kinds of method calls. We find that mock assertions complement traditional test assertions by ensuring the desired side effects have been produced.
arXiv Detail & Related papers (2025-03-25T02:35:05Z)
Learning to Generate Unit Tests for Automated Debugging [52.63217175637201]
Unit tests (UTs) play an instrumental role in assessing code correctness as well as providing feedback to large language models (LLMs). We propose UTGen, which teaches LLMs to generate unit test inputs that reveal errors along with their correct expected outputs. We show that UTGen outperforms other LLM-based baselines by 7.59% based on a metric measuring the presence of both error-revealing UT inputs and correct UT outputs.
arXiv Detail & Related papers (2025-02-03T18:51:43Z)
Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch. Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests. Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z)
LLM-Powered Test Case Generation for Detecting Tricky Bugs [30.82169191775785]
AID generates test inputs and oracles targeting plausibly correct programs. We evaluate AID on two large-scale datasets with tricky bugs: TrickyBugs and EvalPlus. The evaluation results show that the recall, precision, and F1 score of AID outperform the state-of-the-art by up to 1.80x, 2.65x, and 1.66x, respectively.
arXiv Detail & Related papers (2024-04-16T06:20:06Z)
Large Language Models to Generate System-Level Test Programs Targeting Non-functional Properties [3.3305233186101226]
This paper proposes Large Language Models (LLMs) to generate test programs. We take a first glance at how pre-trained LLMs perform in test program generation to optimize non-functional properties of the DUT.
arXiv Detail & Related papers (2024-03-15T08:01:02Z)
Generative Input: Towards Next-Generation Input Methods Paradigm [49.98958865125018]
We propose a novel Generative Input paradigm named GeneInput. It uses prompts to handle all input scenarios and other intelligent auxiliary input functions, optimizing the model with user feedback to deliver personalized results. The results demonstrate that we have achieved state-of-the-art performance for the first time in the Full-mode Key-sequence to Characters (FK2C) task.
arXiv Detail & Related papers (2023-11-02T12:01:29Z)
Revisiting and Improving Retrieval-Augmented Deep Assertion Generation [13.373681113601982]
Unit testing has become an essential activity in the software development process. Yu et al. proposed an integrated approach (integration for short) to generate assertions for a unit test. Although the approach is promising, there is still a knowledge gap as to why or where integration works or does not work.
arXiv Detail & Related papers (2023-09-19T02:39:02Z)
Exploring Demonstration Ensembling for In-context Learning [75.35436025709049]
In-context learning (ICL) operates by showing language models (LMs) examples of input-output pairs for a given task. The standard approach for ICL is to prompt the LM with demonstrations followed by the test input. In this work, we explore Demonstration Ensembling (DENSE) as an alternative to simple concatenation.
arXiv Detail & Related papers (2023-08-17T04:45:19Z)
Semi-DETR: Semi-Supervised Object Detection with Detection Transformers [105.45018934087076]
We analyze the DETR-based framework on semi-supervised object detection (SSOD). We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector. Our method outperforms all state-of-the-art methods by clear margins.
arXiv Detail & Related papers (2023-07-16T16:32:14Z)
AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation [64.9230895853942]
Domain generalization can be arbitrarily hard without exploiting target domain information. Test-time adaptive (TTA) methods are proposed to address this issue. In this work, we adopt a Non-Parametric Classifier to perform test-time adaptation (AdaNPC).
arXiv Detail & Related papers (2023-04-25T04:23:13Z)
Pre-trained Embeddings for Entity Resolution: An Experimental Analysis [Experiment, Analysis & Benchmark] [65.11858854040544]
We perform a thorough experimental analysis of 12 popular language models over 17 established benchmark datasets. First, we assess their vectorization overhead for converting all input entities into dense embedding vectors. Second, we investigate their blocking performance, performing a detailed scalability analysis, and comparing them with the state-of-the-art deep learning-based blocking method. Third, we conclude with their relative performance for both supervised and unsupervised matching.
arXiv Detail & Related papers (2023-04-24T08:53:54Z)
Auditing AI models for Verified Deployment under Semantic Specifications [65.12401653917838]
AuditAI bridges the gap between interpretable formal verification and scalability. We show how AuditAI allows us to obtain controlled variations for verification and certified training while addressing the limitations of verifying using only pixel-space perturbations.
arXiv Detail & Related papers (2021-09-25T22:53:24Z)
Generating Accurate Assert Statements for Unit Test Cases using Pretrained Transformers [10.846226514357866]
Unit testing represents the foundational basis of the software testing pyramid. We present an approach to support developers in writing unit test cases by generating accurate and useful assert statements.
arXiv Detail & Related papers (2020-09-11T19:35:09Z)
Unit Test Case Generation with Transformers and Focal Context [10.220204860586582]
AthenaTest aims to generate unit test cases by learning from real-world focal methods and developer-written test cases. We introduce Methods2Test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java. We evaluate AthenaTest on five defects4j projects, generating 25K passing test cases covering 43.7% of the focal methods with only 30 attempts.
arXiv Detail & Related papers (2020-09-11T18:57:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.