REFINESTAT: Efficient Exploration for Probabilistic Program Synthesis
- URL: http://arxiv.org/abs/2509.01082v1
- Date: Mon, 01 Sep 2025 03:13:36 GMT
- Title: REFINESTAT: Efficient Exploration for Probabilistic Program Synthesis
- Authors: Madhav Kanda, Shubham Ugare, Sasa Misailovic,
- Abstract summary: RefineStat enforces semantic constraints ensuring synthesized programs contain valid and well-formed parameters.<n>It applies diagnostic-aware refinement by resampling prior or likelihood components whenever reliability checks fail.<n>It produces programs that are both syntactically sound and statistically reliable, often matching or surpassing those from closed-source large language models.
- Score: 5.509795962249259
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Probabilistic programming offers a powerful framework for modeling uncertainty, yet statistical model discovery in this domain entails navigating an immense search space under strict domain-specific constraints. When small language models are tasked with generating probabilistic programs, they frequently produce outputs that suffer from both syntactic and semantic errors, such as flawed inference constructs. Motivated by probabilistic programmers' domain expertise and debugging strategies, we introduce RefineStat, a language model--driven framework that enforces semantic constraints ensuring synthesized programs contain valid distributions and well-formed parameters, and then applies diagnostic-aware refinement by resampling prior or likelihood components whenever reliability checks fail. We evaluate RefineStat on multiple probabilistic-programming code-generation tasks using smaller language models (SLMs) and find that it produces programs that are both syntactically sound and statistically reliable, often matching or surpassing those from closed-source large language models (e.g., OpenAI o3).
Related papers
- Quantitative Verification of Omega-regular Properties in Probabilistic Programming [12.097479337434878]
We introduce temporal posterior inference, a new framework that unifies probabilistic programming with temporal logic.<n>We develop a new method for computing upper and lower bounds on the satisfaction of probabilities of omega-regular properties.<n>We implement our approach in a prototype tool, TPInfer, and evaluate it on a suite of benchmarks.
arXiv Detail & Related papers (2025-12-25T09:26:29Z) - Code Vulnerability Detection Across Different Programming Languages with AI Models [0.0]
This paper presents the implementations of transformer-based models like CodeBERT and CodeLlama.<n>It shows how off-the-shelf models can successfully produce predictive capacity in models through dynamic fine-tuning of the models on vulnerable and safe code fragments.<n>Experiments show that a well-trained CodeBERT can be as good as or even better than some existing static analyzers in terms of accuracy greater than 97%.
arXiv Detail & Related papers (2025-08-14T05:41:58Z) - Accelerated Test-Time Scaling with Model-Free Speculative Sampling [58.69141724095398]
We introduce STAND (STochastic Adaptive N-gram Drafting), a novel model-free speculative decoding approach.<n>We show that STAND reduces inference latency by 60-65% compared to standard autoregressive decoding.<n>As a model-free approach, STAND can be applied to any existing language model without additional training.
arXiv Detail & Related papers (2025-06-05T07:31:18Z) - Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo [90.78001821963008]
A wide range of LM applications require generating text that conforms to syntactic or semantic constraints.<n>We develop an architecture for controlled LM generation based on sequential Monte Carlo (SMC)<n>Our system builds on the framework of Lew et al. (2023) and integrates with its language model probabilistic programming language.
arXiv Detail & Related papers (2025-04-17T17:49:40Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.<n>Our framework incorporates two complementary strategies: internal TTC and external TTC.<n>We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage.<n>Models may behave unreliably due to poorly explored failure modes.<n> causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z) - Robots That Ask For Help: Uncertainty Alignment for Large Language Model
Planners [85.03486419424647]
KnowNo is a framework for measuring and aligning the uncertainty of large language models.
KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion.
arXiv Detail & Related papers (2023-07-04T21:25:12Z) - R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility
Across Random User Intents [14.455036827804541]
Large language models show impressive results at predicting structured text such as code, but also commonly introduce errors and hallucinations in their output.
We propose Randomized Utility-driven Synthesis of Uncertain REgions (R-U-SURE)
R-U-SURE is an approach for building uncertainty-aware suggestions based on a decision-theoretic model of goal-conditioned utility.
arXiv Detail & Related papers (2023-03-01T18:46:40Z) - On the Reliability and Explainability of Language Models for Program
Generation [15.569926313298337]
We study the capabilities and limitations of automated program generation approaches.
We employ advanced explainable AI approaches to highlight the tokens that significantly contribute to the code transformation.
Our analysis reveals that, in various experimental scenarios, language models can recognize code grammar and structural information, but they exhibit limited robustness to changes in input sequences.
arXiv Detail & Related papers (2023-02-19T14:59:52Z) - Training and Inference on Any-Order Autoregressive Models the Right Way [97.39464776373902]
A family of Any-Order Autoregressive Models (AO-ARMs) has shown breakthrough performance in arbitrary conditional tasks.
We identify significant improvements to be made to previous formulations of AO-ARMs.
Our method leads to improved performance with no compromises on tractability.
arXiv Detail & Related papers (2022-05-26T18:00:02Z) - Transforming Probabilistic Programs for Model Checking [0.0]
We apply static analysis to probabilistic programs to automate large parts of two crucial model checking methods.
Our method transforms a probabilistic program specifying a density function into an efficient forward-sampling form.
We present an implementation targeting the popular Stan probabilistic programming language.
arXiv Detail & Related papers (2020-08-21T21:06:34Z) - Generating Random Logic Programs Using Constraint Programming [12.47276164048813]
We present a novel approach to generating random logic programs using constraint programming.
We show how the model scales with parameter values, and use the model to compare probabilistic inference algorithms across a range of synthetic problems.
arXiv Detail & Related papers (2020-06-02T19:12:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.