ProbTest: Unit Testing for Probabilistic Programs (Extended Version)
- URL: http://arxiv.org/abs/2509.02012v1
- Date: Tue, 02 Sep 2025 06:59:32 GMT
- Title: ProbTest: Unit Testing for Probabilistic Programs (Extended Version)
- Authors: Katrine Christensen, Mahsa Varshosaz, Raúl Pardo,
- Abstract summary: This work proposes a novel black-box unit testing method, ProbTest, for testing the outcomes of probabilistic programs. We implement a plug-in for PyTest, a well-known unit testing tool for Python programs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Testing probabilistic programs is non-trivial due to their stochastic nature. Given an input, the program may produce different outcomes depending on the underlying stochastic choices in the program. This means that testing the expected outcomes of probabilistic programs requires repeated test executions, unlike deterministic programs, where a single execution may suffice for each test input. This raises the following question: how many times should we run a probabilistic program to effectively test it? This work proposes a novel black-box unit testing method, ProbTest, for testing the outcomes of probabilistic programs. Our method is founded on the theory surrounding a well-known combinatorial problem, the coupon collector's problem. Using this method, developers can write unit tests as usual without extra effort, while the number of required test executions is determined automatically, with statistical guarantees for the results. We implement ProbTest as a plug-in for PyTest, a well-known unit testing tool for Python programs. Using this plug-in, developers can write unit tests as they would for any other Python program, and the necessary test executions are handled automatically. We evaluate the method on case studies from the Gymnasium reinforcement learning library and a randomized data structure.
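The coupon collector connection can be illustrated with a short calculation: if a program has k equally likely outcomes, the expected number of executions needed to observe every outcome at least once is k·H_k, where H_k is the k-th harmonic number. This is a generic sketch of the underlying combinatorics, not ProbTest's actual implementation; the paper's method derives run counts with statistical guarantees, and the helper name `expected_runs` is ours.

```python
from fractions import Fraction

def expected_runs(k: int) -> Fraction:
    # Coupon collector's problem: with k equally likely outcomes, the
    # expected number of independent executions needed to observe every
    # outcome at least once is k * H_k.
    return k * sum(Fraction(1, i) for i in range(1, k + 1))

# For a program simulating a fair six-sided die (k = 6),
# about 14.7 runs are needed on average.
print(float(expected_runs(6)))  # 14.7
```

In practice a testing tool must also account for unequal outcome probabilities and for the desired confidence level, which is where the statistical guarantees in the paper come in.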
Related papers
- Codehacks: A Dataset of Adversarial Tests for Competitive Programming Problems Obtained from Codeforces [3.7752830020595796]
We curate a dataset (Codehacks) of programming problems together with corresponding error-inducing test cases.
The dataset comprises 288,617 hacks for 5,578 programming problems.
The source code for 2,196 submitted solutions to these problems can be broken with their corresponding hacks.
arXiv Detail & Related papers (2025-03-30T14:50:03Z)
- Bounding Random Test Set Size with Computational Learning Theory [0.2999888908665658]
We show how probabilistic approaches from machine learning can be applied to bound random test set size in our testing context.
We are the first to enable this knowing only the number of coverage targets in the source code.
We validate this bound on a large set of Java units, and an autonomous driving system.
arXiv Detail & Related papers (2024-05-27T10:15:16Z)
- Fine-Grained Assertion-Based Test Selection [6.9290255098776425]
Regression test selection techniques aim at reducing test execution time by selecting only the tests that are affected by code changes.
Existing techniques select test entities at coarse granularity levels, such as the test class, which causes imprecise selection and the execution of unaffected tests.
We propose a novel approach that increases selection precision by analyzing test code at the statement level and treating test assertions as the unit of selection.
arXiv Detail & Related papers (2024-03-24T04:07:30Z)
- FlaPy: Mining Flaky Python Tests at Scale [14.609208863749831]
FlaPy is a framework for researchers to mine flaky tests in a given or automatically sampled set of Python projects by rerunning their test suites.
FlaPy isolates the test executions using containerization and fresh execution environments to simulate real-world CI conditions.
FlaPy supports parallelizing the test executions using SLURM, making it feasible to scan thousands of projects for test flakiness.
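The rerun-based detection that FlaPy scales up can be sketched in miniature: run a test repeatedly and flag it as flaky when its verdict changes across identical reruns. This is a hypothetical in-process illustration (the function name `is_flaky` is ours); FlaPy itself reruns whole test suites in containerized, fresh execution environments.

```python
def is_flaky(test_fn, reruns=50):
    # Rerun a test function and record its verdicts; a test whose
    # verdict changes across identical reruns is flagged as flaky.
    verdicts = set()
    for _ in range(reruns):
        try:
            test_fn()
            verdicts.add("pass")
        except AssertionError:
            verdicts.add("fail")
    return len(verdicts) > 1

# A test whose verdict alternates between runs is flagged as flaky:
calls = {"n": 0}
def alternating_test():
    calls["n"] += 1
    assert calls["n"] % 2 == 0

print(is_flaky(alternating_test))  # True
print(is_flaky(lambda: None))      # False
```

Fresh environments per rerun matter because in-process state (like the `calls` counter above) can itself be the source of order-dependent flakiness.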
arXiv Detail & Related papers (2023-05-08T15:48:57Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
- Exact Paired-Permutation Testing for Structured Test Statistics [67.71280539312536]
We provide an efficient exact algorithm for the paired-permutation test for a family of structured test statistics.
Our exact algorithm was 10x faster than the Monte Carlo approximation with 20,000 samples on a common dataset.
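The Monte Carlo baseline referenced above can be sketched as follows: under the null hypothesis of a paired-permutation test, the sign of each paired difference is exchangeable, so the p-value is estimated by repeatedly flipping signs at random. This is a generic sketch of the approximation (names are ours), not the paper's exact algorithm for structured statistics.

```python
import random

def paired_permutation_pvalue(a, b, trials=20000, seed=0):
    # Monte Carlo paired-permutation test for the mean difference:
    # under H0 the sign of each paired difference is exchangeable,
    # so we resample by flipping each sign uniformly at random and
    # count how often the resampled statistic is at least as extreme.
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = sum(
        1
        for _ in range(trials)
        if abs(sum(d if rng.random() < 0.5 else -d for d in diffs)) >= observed
    )
    return hits / trials
```

The exact algorithm in the paper removes this sampling loop for a family of structured statistics, eliminating the approximation error along with the per-trial cost.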
arXiv Detail & Related papers (2022-05-03T11:00:59Z)
- ProbNum: Probabilistic Numerics in Python [62.52335490524408]
Probabilistic numerical methods (PNMs) solve numerical problems via probabilistic inference.
We present ProbNum: a Python library providing state-of-the-art PNMs.
arXiv Detail & Related papers (2021-12-03T07:20:50Z)
- Automated Support for Unit Test Generation: A Tutorial Book Chapter [21.716667622896193]
Unit testing is the stage of testing in which the smallest segment of code that can be exercised in isolation from the rest of the system is tested.
Unit tests are typically written as executable code, often in a format provided by a unit testing framework such as pytest for Python.
This chapter introduces the concept of search-based unit test generation.
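A minimal example of the executable-test format mentioned above, in pytest style (file, function, and test names are illustrative): pytest discovers functions named `test_*` and treats a failing `assert` as a test failure.

```python
# test_math_utils.py -- pytest collects test_* functions automatically.
def running_max(values):
    # Unit under test: prefix maxima of a list.
    out, best = [], float("-inf")
    for v in values:
        best = max(best, v)
        out.append(best)
    return out

def test_running_max():
    assert running_max([3, 1, 4, 1, 5]) == [3, 3, 4, 4, 5]

def test_running_max_empty():
    assert running_max([]) == []
```

Search-based test generation tools aim to produce tests in exactly this format automatically, guided by coverage of the unit under test.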
arXiv Detail & Related papers (2021-10-26T11:13:40Z)
- Group Testing with Non-identical Infection Probabilities [59.96266198512243]
We develop an adaptive group testing algorithm using the set formation method.
We show that our algorithm outperforms the state of the art, and performs close to the entropy lower bound.
arXiv Detail & Related papers (2021-08-27T17:53:25Z)
- Test-Agnostic Long-Tailed Recognition by Test-Time Aggregating Diverse Experts with Self-Supervision [85.07855130048951]
We study a more practical task setting, called test-agnostic long-tailed recognition, where the training class distribution is long-tailed.
We propose a new method, called Test-time Aggregating Diverse Experts (TADE), that trains diverse experts to excel at handling different test distributions.
We theoretically show that our method has provable ability to simulate unknown test class distributions.
arXiv Detail & Related papers (2021-07-20T04:10:31Z)
- Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
arXiv Detail & Related papers (2020-09-14T16:09:17Z)