Learning Program Behavioral Models from Synthesized Input-Output Pairs
- URL: http://arxiv.org/abs/2407.08597v2
- Date: Mon, 17 Mar 2025 15:04:06 GMT
- Title: Learning Program Behavioral Models from Synthesized Input-Output Pairs
- Authors: Tural Mammadov, Dietrich Klakow, Alexander Koller, Andreas Zeller
- Abstract summary: We introduce Modelizer, a framework that, given a black-box program, learns a model from its input/output behavior using neural machine translation algorithms. The resulting model mocks the original program, achieving up to 95.4% accuracy and a BLEU score of 0.98 with standard error 0.04 in mocking real-world applications. We foresee several applications of these models, especially as the output of the program can be any aspect of program behavior.
- Score: 70.9524884086882
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce Modelizer - a novel framework that, given a black-box program, learns a model from its input/output behavior using neural machine translation algorithms. The resulting model mocks the original program: Given an input, the model predicts the output that would have been produced by the program. However, the model is also reversible - that is, the model can predict the input that would have produced a given output. Finally, the model is differentiable and can be efficiently restricted to predict only a certain aspect of the program behavior. Modelizer uses grammars to synthesize inputs and unsupervised tokenizers to decompose the resulting outputs, allowing it to learn sequence-to-sequence associations between token streams. Other than input grammars, Modelizer only requires the ability to execute the program. The resulting models are small, requiring fewer than 6.3 million parameters for languages such as Markdown or HTML; and they are accurate, achieving up to 95.4% accuracy and a BLEU score of 0.98 with standard error 0.04 in mocking real-world applications. As it learns from and predicts executions rather than code, Modelizer departs from the LLM-centric research trend, opening new opportunities for program-specific models that are fully tuned towards individual programs. Indeed, we foresee several applications of these models, especially as the output of the program can be any aspect of program behavior. Beyond mocking and predicting program behavior, the models can also synthesize inputs that are likely to produce a particular behavior, such as failures or coverage, thus assisting in program understanding and maintenance.
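Concretely, the pipeline the abstract describes can be pictured as follows. This is a minimal, hypothetical sketch: the toy grammar, the stand-in subject program, and the word-level tokenizer below are illustrative placeholders for Modelizer's grammar-based input synthesizer, the real program under test, and its unsupervised tokenizer; the collected pairs would then train a standard sequence-to-sequence model that mocks the program.

```python
# Hedged sketch of the Modelizer-style data pipeline: synthesize inputs from
# a grammar, execute the subject program, tokenize both sides, and collect
# (input tokens, output tokens) pairs for sequence-to-sequence training.
import random
import re

# A tiny input grammar: nonterminals map to lists of alternative expansions.
GRAMMAR = {
    "<doc>": [["<line>"], ["<line>", "\n", "<doc>"]],
    "<line>": [["plain text"], ["*", "<word>", "*"], ["# ", "<word>"]],
    "<word>": [["hello"], ["world"], ["modelizer"]],
}

def synthesize(symbol="<doc>", depth=0):
    """Randomly expand the grammar into a concrete input string."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol
    # Bias toward the shortest expansion as depth grows, so generation terminates.
    choices = GRAMMAR[symbol]
    expansion = random.choice(choices[:1] if depth > 4 else choices)
    return "".join(synthesize(s, depth + 1) for s in expansion)

def subject_program(text):
    """Toy black-box program: a crude Markdown-to-HTML converter."""
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return re.sub(r"^# (.+)$", r"<h1>\1</h1>", text, flags=re.MULTILINE)

def tokenize(s):
    """Stand-in for the unsupervised tokenizer: split at word boundaries."""
    return re.findall(r"\w+|\W", s)

# Collect training pairs; these would feed a standard seq2seq model that,
# once trained, mocks subject_program (and, run in reverse, inverts it).
pairs = []
for _ in range(1000):
    inp = synthesize()
    pairs.append((tokenize(inp), tokenize(subject_program(inp))))

print(pairs[0])
```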
Related papers
- Knockout: A simple way to handle missing inputs [8.05324050767023]
Models that leverage rich inputs can be difficult to deploy widely because some inputs may be missing at inference.
Current popular solutions to this problem include marginalization, imputation, and training multiple models.
We propose an efficient way to learn both the conditional distribution given full inputs and the marginal distributions given partial ones.
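A minimal sketch of one way to realize this, assuming the knockout mechanism amounts to randomly substituting a fixed placeholder for inputs during training (the placeholder scheme and model below are illustrative assumptions, not the paper's exact recipe):

```python
# Randomly replace some inputs with a placeholder at training time, so a
# single model learns both the full-input conditional and the partial-input
# marginals; at inference, genuinely missing features use the same placeholder.
import torch

PLACEHOLDER = 0.0  # value standing in for "this input is missing"

def knockout(x, p=0.3):
    """Knock out each feature independently with probability p."""
    mask = torch.rand_like(x) < p
    return torch.where(mask, torch.full_like(x, PLACEHOLDER), x)

model = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(64, 8)             # toy full inputs
    y = x.sum(dim=1, keepdim=True)     # toy regression target
    loss = torch.nn.functional.mse_loss(model(knockout(x)), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```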
arXiv Detail & Related papers (2024-05-30T19:47:34Z)
- Language models scale reliably with over-training and on downstream tasks [121.69867718185125]
Scaling laws are useful guides for derisking expensive training runs.
However, there remain gaps between current studies and how language models are trained.
For instance, scaling laws mostly predict loss, yet models are usually compared on downstream task performance.
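For illustration, the kind of extrapolation such scaling studies rely on can be sketched as a saturating power-law fit; the data points and functional form below are synthetic assumptions, not the paper's measurements:

```python
# Fit a saturating power law L(C) = a * C**(-b) + c to (compute, loss)
# points, then extrapolate to a larger compute budget to derisk a big run.
import numpy as np
from scipy.optimize import curve_fit

def power_law(C, a, b, c):
    return a * C**(-b) + c

compute = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])  # in 1e18 FLOPs
loss = np.array([3.90, 3.00, 2.50, 2.26, 2.10])          # synthetic losses

(a, b, c), _ = curve_fit(power_law, compute, loss, p0=(1.0, 0.1, 2.0))
print(f"predicted loss at 1e23 FLOPs: {power_law(1e5, a, b, c):.2f}")
```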
arXiv Detail & Related papers (2024-03-13T13:54:00Z)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder- and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, we explore how mixing programming and natural languages, and training for multiple epochs, affect model performance.
arXiv Detail & Related papers (2023-05-03T17:55:25Z) - Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z)
- Multi-Model Probabilistic Programming [0.0]
We present an extension of probabilistic programming that lets each program represent a network of interrelated probabilistic models.
We give a formal semantics for these multi-model probabilistic programs, a collection of efficient algorithms for network-of-model operations, and an example implementation built on top of the popular probabilistic programming language Stan.
This network-of-models representation opens many doors, including search and automation in model-space, tracking and communication of model development, and explicit modeler degrees of freedom to mitigate issues like p-hacking.
arXiv Detail & Related papers (2022-08-12T15:38:15Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
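A minimal sketch of such execution-free reranking: a trained scorer predicts each sampled program's probability of being correct, and candidates are reordered before choosing the single pass@1 attempt. The `ranker` below is a hypothetical stand-in, not the paper's architecture.

```python
# Rerank sampled programs by predicted correctness, best first, without
# executing any of them; the top candidate becomes the pass@1 submission.
from typing import Callable, List

def rerank(candidates: List[str], ranker: Callable[[str], float]) -> List[str]:
    """Order sampled programs by the ranker's correctness score."""
    return sorted(candidates, key=ranker, reverse=True)

# Toy stand-in ranker: prefers samples that at least define a function.
toy_ranker = lambda prog: 1.0 if prog.startswith("def ") else 0.1

samples = ["print(x)", "def add(a, b): return a + b", "def add(a,b): a+b"]
print(rerank(samples, toy_ranker)[0])
```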
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
- Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
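One way to picture this training-time sampling loop, with a stub sampler and a toy execution harness standing in for the actual model and evaluation setup (names and thresholds here are illustrative assumptions):

```python
# Sample programs during training, execute them against test cases, and keep
# fully and partially correct samples as extra training targets.
import random

def run_tests(program, tests):
    """Return the fraction of test cases the program passes."""
    passed = 0
    for inputs, expected in tests:
        try:
            scope = {}
            exec(program, scope)                  # toy execution harness
            if scope["solve"](*inputs) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(tests)

def collect_self_samples(sample_fn, tests, n=16, partial_threshold=0.5):
    """Keep self-sampled programs that are fully or partially correct."""
    buffer = []
    for _ in range(n):
        prog = sample_fn()
        score = run_tests(prog, tests)
        if score >= partial_threshold:
            buffer.append((prog, score))          # later trained on alongside the reference
    return buffer

tests = [((1, 2), 3), ((0, 0), 0)]
sample_fn = lambda: random.choice([
    "def solve(a, b): return a + b",              # fully correct
    "def solve(a, b): return a",                  # partially correct
    "def solve(a, b): return a * b",              # partially correct
])
print(collect_self_samples(sample_fn, tests))
```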
arXiv Detail & Related papers (2022-05-28T03:31:07Z)
- Fast Model Editing at Scale [77.69220974621425]
We propose Model Editor Networks using Gradient Decomposition (MEND).
MEND is a collection of small auxiliary editing networks that use a single desired input-output pair to make fast, local edits to a pre-trained model.
MEND can be trained on a single GPU in less than a day even for 10 billion+ parameter models.
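The gradient-decomposition idea can be sketched as follows: for a linear layer, the fine-tuning gradient from a single input-output pair is rank-1, so small auxiliary networks can transform its two factors into a fast, local edit. The editor networks below are untrained and purely illustrative, not MEND's actual parameterization.

```python
# For y = Wx + b and a single edit pair, dL/dW = u v^T with v = x; recover
# the factors, transform them with small editor nets, and apply the edit.
import torch

layer = torch.nn.Linear(16, 16)
editor_u = torch.nn.Linear(16, 16)   # auxiliary nets acting on gradient factors
editor_v = torch.nn.Linear(16, 16)

x = torch.randn(16)                  # the single desired input...
y_target = torch.randn(16)           # ...and its desired output

loss = torch.nn.functional.mse_loss(layer(x), y_target)
loss.backward()

# layer.weight.grad == u v^T, so u can be recovered by contracting with x.
u = layer.weight.grad @ x / (x @ x)
v = x

with torch.no_grad():
    edit = torch.outer(editor_u(u), editor_v(v))  # transformed pseudo-gradient
    layer.weight -= 0.1 * edit                    # fast, local parameter edit
```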
arXiv Detail & Related papers (2021-10-21T17:41:56Z)
- Program Synthesis with Large Language Models [40.41120807053989]
We evaluate large language models for program synthesis in Python.
We find that synthesis performance scales log-linearly with model size.
We find that even our best models are generally unable to predict the output of a program given a specific input.
arXiv Detail & Related papers (2021-08-16T03:57:30Z)
- A Causal Lens for Peeking into Black Box Predictive Models: Predictive Model Interpretation via Causal Attribution [3.3758186776249928]
We aim to address the problem of interpreting predictive models in settings where the model is a black box.
We reduce the problem of interpreting a black box predictive model to that of estimating the causal effects of each of the model inputs on the model output.
We show how the resulting causal attribution of responsibility for model output to the different model inputs can be used to interpret the predictive model and to explain its predictions.
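A simplified sketch of such interventional attribution: each input's effect is estimated by setting it to two values (a do-intervention) while the remaining inputs are sampled from the data, then comparing the average model output. This is a toy estimator illustrating the idea, not the paper's full method.

```python
# Estimate the average causal effect of moving input i from v_low to v_high
# on a black-box model's output, marginalizing over the other inputs.
import numpy as np

def causal_effect(model, data, i, v_low, v_high, n=1000):
    """Average output shift caused by intervening on input i."""
    rng = np.random.default_rng(0)
    background = data[rng.integers(len(data), size=n)]
    lo, hi = background.copy(), background.copy()
    lo[:, i], hi[:, i] = v_low, v_high        # the two interventions
    return (model(hi) - model(lo)).mean()

# Toy black box: only feature 0 matters, so it should get all the credit.
black_box = lambda X: 3.0 * X[:, 0] + 0.0 * X[:, 1]
data = np.random.default_rng(1).normal(size=(500, 2))

for i in range(2):
    print(f"effect of input {i}: {causal_effect(black_box, data, i, -1.0, 1.0):.2f}")
```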
arXiv Detail & Related papers (2020-08-01T23:20:57Z)
- Investigation of Sentiment Controllable Chatbot [50.34061353512263]
In this paper, we investigate four models to scale or adjust the sentiment of the response.
The models are a persona-based model, reinforcement learning, a plug-and-play model, and CycleGAN.
We develop machine-evaluated metrics to estimate whether the responses are reasonable given the input.
arXiv Detail & Related papers (2020-07-11T16:04:30Z)
- Imputer: Sequence Modelling via Imputation and Dynamic Programming [101.5705527605346]
Imputer is an iterative generative model, requiring only a constant number of generation steps independent of the number of input or output tokens.
We present a tractable dynamic programming training algorithm, which yields a lower bound on the log marginal likelihood.
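A toy sketch of constant-step iterative decoding in this style: generation starts fully masked and, over a fixed number of steps that does not depend on sequence length, each step imputes a share of the remaining masked positions. The schedule and the trivial predictor below are assumptions for illustration only.

```python
# Decode a masked sequence in a constant number of refinement steps,
# imputing an even share of the remaining masked positions per step.
MASK = "_"

def imputer_decode(length, predict, steps=4):
    """Fill a fully masked sequence in exactly `steps` refinement passes."""
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        # The final step takes everything that is still masked.
        for i in masked[: max(1, len(masked) // (steps - step))]:
            seq[i] = predict(seq, i)
    return seq

predict = lambda seq, i: chr(ord("a") + i % 26)   # toy token predictor
print("".join(imputer_decode(12, predict)))
```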
arXiv Detail & Related papers (2020-02-20T18:21:30Z)