Tuning Random Generators: Property-Based Testing as Probabilistic Programming
- URL: http://arxiv.org/abs/2508.14394v1
- Date: Wed, 20 Aug 2025 03:45:13 GMT
- Title: Tuning Random Generators: Property-Based Testing as Probabilistic Programming
- Authors: Ryan Tjoa, Poorva Garg, Harrison Goldstein, Todd Millstein, Benjamin Pierce, Guy Van den Broeck
- Abstract summary: Property-based testing (PBT) validates software against an executable specification by evaluating it on randomly generated inputs. The standard way that PBT users generate test inputs is via generators that describe how to sample test inputs through random choices. We develop techniques for the automatic and offline tuning of generators.
- Score: 19.843056237039516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Property-based testing validates software against an executable specification by evaluating it on randomly generated inputs. The standard way that PBT users generate test inputs is via generators that describe how to sample test inputs through random choices. To achieve a good distribution over test inputs, users must tune their generators, i.e., decide on the weights of these individual random choices. Unfortunately, it is very difficult to understand how to choose individual generator weights in order to achieve a desired distribution, so today this process is tedious and limits the distributions that can be practically achieved. In this paper, we develop techniques for the automatic and offline tuning of generators. Given a generator with undetermined symbolic weights and an objective function, our approach automatically learns values for these weights that optimize for the objective. We describe useful objective functions that allow users to (1) target desired distributions and (2) improve the diversity and validity of their test cases. We have implemented our approach in a novel discrete probabilistic programming system, Loaded Dice, that supports differentiation and parameter learning, and use it as a language for generators. We empirically demonstrate that our approach is effective at optimizing generator distributions according to the specified objective functions. We also perform a thorough evaluation on PBT benchmarks, demonstrating that, when automatically tuned for diversity and validity, the generators exhibit a 3.1-7.4x speedup in bug finding.
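The core idea (a generator with symbolic weights, an exact output distribution, and an objective optimized by gradients) can be illustrated with a minimal Python sketch. It is not Loaded Dice and does not use its API. The toy bounded-list generator and the names `length_distribution`, `entropy`, `kl_to_target`, and `tune` are hypothetical. The exact distribution is computed in closed form by enumerating the generator's stop/continue choices, and central finite differences stand in for the automatic differentiation Loaded Dice provides. Shannon entropy of the length distribution serves as a simple stand-in for a diversity objective, and KL divergence to a target serves as the desired-distribution objective.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def length_distribution(weights):
    """Exact distribution over list lengths 0..N for a hypothetical bounded
    list generator: at step i it stops with probability sigmoid(weights[i]),
    otherwise it appends an element; after N = len(weights) appends it stops."""
    probs = []
    reach = 1.0  # probability that the generator reaches the current step
    for w in weights:
        stop = sigmoid(w)
        probs.append(reach * stop)
        reach *= 1.0 - stop
    probs.append(reach)  # forced stop at the maximum length N
    return probs


def entropy(probs):
    # Shannon entropy (nats): larger means a more diverse spread of sizes.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_to_target(probs, target):
    # KL(target || probs): zero exactly when the generator matches the target.
    return sum(t * math.log(t / p) for t, p in zip(target, probs) if t > 0.0)


def tune(objective, weights, lr=0.5, steps=3000, eps=1e-6):
    """Gradient ascent on objective(length_distribution(weights)).
    Central finite differences approximate the gradient (an assumption
    standing in for automatic differentiation)."""
    w = list(weights)
    for _ in range(steps):
        grad = []
        for i in range(len(w)):
            up = w[:i] + [w[i] + eps] + w[i + 1:]
            down = w[:i] + [w[i] - eps] + w[i + 1:]
            f_up = objective(length_distribution(up))
            f_down = objective(length_distribution(down))
            grad.append((f_up - f_down) / (2 * eps))
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w


n = 5
start = [0.0] * n  # every stop probability starts at 0.5
print("before:", [round(p, 3) for p in length_distribution(start)])

# Objective (2): maximize diversity, here entropy over list lengths.
tuned = tune(entropy, start)
print("after: ", [round(p, 3) for p in length_distribution(tuned)])

# Objective (1): match a desired distribution by maximizing negated KL.
target = [0.1, 0.1, 0.2, 0.2, 0.2, 0.2]
tuned_t = tune(lambda p: -kl_to_target(p, target), start)
print("target:", [round(p, 3) for p in length_distribution(tuned_t)])
```

The sketch gives the generator one stop weight per step. With this choice, any distribution over lengths 0..N is reachable, including the uniform one. A single shared weight could only produce a truncated geometric distribution, which shows why hand-picking individual weights to hit a desired distribution quickly becomes difficult.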
Related papers
- $V_1$: Unifying Generation and Self-Verification for Parallel Reasoners [69.66089681814013]
$V_1$ is a framework that unifies generation and verification through efficient pairwise ranking. $V_1$-Infer improves Pass@1 by up to 10% over pointwise verification. $V_1$-PairRL achieves 7-9% test-time scaling gains over standard RL and pointwise joint training.
arXiv Detail & Related papers (2026-03-04T17:22:16Z) - Scaling Agentic Verifier for Competitive Coding [66.11758166379092]
Large language models (LLMs) have demonstrated strong coding capabilities but still struggle to solve competitive programming problems correctly in a single attempt. Execution-based re-ranking offers a promising test-time scaling strategy, yet existing methods are constrained by either difficult test case generation or inefficient random input sampling. We propose Agentic Verifier, an execution-based agent that actively reasons about program behaviors and searches for highly discriminative test inputs.
arXiv Detail & Related papers (2026-02-04T06:30:40Z) - How to Select Datapoints for Efficient Human Evaluation of NLG Models? [57.60407340254572]
We develop and analyze a suite of selectors to get the most informative datapoints for human evaluation. We show that selectors based on variance in automated metric scores, diversity in model outputs, or Item Response Theory outperform random selection. In particular, we introduce source-based estimators, which predict item usefulness for human evaluation just based on the source texts.
arXiv Detail & Related papers (2025-01-30T10:33:26Z) - Learning test generators for cyber-physical systems [2.4171019220503402]
Black-box runtime verification methods for cyber-physical systems can be used to discover errors in systems whose inputs and outputs are expressed as signals over time.
Existing methods, such as requirement falsification, often focus on finding a single input that is a counterexample to system correctness.
We show how to create test generators that can produce multiple and diverse counterexamples for a single requirement.
arXiv Detail & Related papers (2024-10-04T07:34:02Z) - Test-Time Model Adaptation with Only Forward Passes [68.11784295706995]
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts.
We propose a test-time Forward-Optimization Adaptation (FOA) method.
FOA runs on quantized 8-bit ViT, outperforms gradient-based TENT on full-precision 32-bit ViT, and achieves up to a 24-fold memory reduction on ImageNet-C.
arXiv Detail & Related papers (2024-04-02T05:34:33Z) - A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation [78.81021361497311]
We develop a novel Metropolis-Hastings (MH) sampler that proposes re-writes of the entire sequence in each step via iterative prompting of a large language model.
Our new sampler (a) allows for more efficient and accurate sampling from a target distribution and (b) allows generation length to be determined through the sampling procedure rather than fixed in advance.
arXiv Detail & Related papers (2023-12-07T18:30:15Z) - Insights into Closed-form IPM-GAN Discriminator Guidance for Diffusion Modeling [11.68361062474064]
We propose a theoretical framework to analyze the effect of the GAN discriminator on Langevin-based sampling. We show that the proposed approach can be combined with existing accelerated-diffusion techniques to improve latent-space image generation.
arXiv Detail & Related papers (2023-06-02T16:24:07Z) - A Robust Classifier Under Missing-Not-At-Random Sample Selection Bias [15.628927478079913]
In statistics, Greene's method formulates this type of sample selection with logistic regression as the prediction model.
We propose BiasCorr, an algorithm that improves on Greene's method by modifying the original training set.
We provide a theoretical guarantee for the improvement of BiasCorr over Greene's method by analyzing its bias.
arXiv Detail & Related papers (2023-05-25T01:39:51Z) - Learning Probabilistic Models from Generator Latent Spaces with Hat EBM [81.35199221254763]
This work proposes a method for using any generator network as the foundation of an Energy-Based Model (EBM).
Experiments show strong performance of the proposed method on (1) unconditional ImageNet synthesis at 128x128 resolution, (2) refining the output of existing generators, and (3) learning EBMs that incorporate non-probabilistic generators.
arXiv Detail & Related papers (2022-10-29T03:55:34Z) - Mode Penalty Generative Adversarial Network with adapted Auto-encoder [0.15229257192293197]
We propose a mode penalty GAN combined with a pre-trained autoencoder for explicit representation of generated and real data samples in the encoded space.
Experimental evaluations show that applying the proposed method to GANs makes the generator's optimization more stable and speeds up its convergence.
arXiv Detail & Related papers (2020-11-16T03:39:53Z) - Sampling-Decomposable Generative Adversarial Recommender [84.05894139540048]
We propose a Sampling-Decomposable Generative Adversarial Recommender (SD-GAR).
In the framework, the divergence between some generator and the optimum is compensated by self-normalized importance sampling.
We extensively evaluate the proposed algorithm with five real-world recommendation datasets.
arXiv Detail & Related papers (2020-11-02T13:19:10Z) - Uncertainty Inspired RGB-D Saliency Detection [70.50583438784571]
We propose the first framework to employ uncertainty for RGB-D saliency detection by learning from the data labeling process.
Inspired by the saliency data labeling process, we propose a generative architecture to achieve probabilistic RGB-D saliency detection.
Results on six challenging RGB-D benchmark datasets show our approach's superior performance in learning the distribution of saliency maps.
arXiv Detail & Related papers (2020-09-07T13:01:45Z) - A Search for Good Pseudo-random Number Generators : Survey and Empirical Studies [0.0]
The PRNGs developed so far are surveyed and classified into three groups: linear congruential generator based, linear feedback shift register based, and cellular automata based. Overall, 30 PRNGs are selected in this way, on which two types of empirical testing are done: blind statistical tests with the Diehard battery of tests, the Rabbit battery of the TestU01 library, and the NIST statistical test suite.
arXiv Detail & Related papers (2018-11-03T07:32:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.