Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs
- URL: http://arxiv.org/abs/2309.11726v1
- Date: Thu, 21 Sep 2023 01:59:20 GMT
- Title: Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs
- Authors: Alex Renda, Yi Ding, Michael Carbin
- Abstract summary: We present a methodology for sampling datasets to train neural-network-based surrogates of programs.
We first characterize the proportion of data to sample from each region of a program's input space based on the complexity of learning a surrogate of the corresponding execution path.
We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program.
We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate this methodology on a range of real-world programs, demonstrating that complexity-guided sampling yields empirical improvements in accuracy.
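The abstract describes the approach only at a high level. Below is a minimal sketch of the general idea on a toy program; it is illustrative only, not the paper's actual analysis or its derived sampling rule. The toy program, the per-path complexity scores (stand-ins for what the paper's program analysis would compute), and the frequency-times-complexity allocation heuristic are all assumptions made for exposition.

```python
import math
import random

def toy_program(x: float) -> float:
    """A hypothetical program with two execution paths."""
    if x < 0.0:
        return 2.0 * x + 1.0                 # path "lin": linear, easy to fit
    return math.sin(5.0 * x) * math.exp(-x)  # path "osc": oscillatory, harder

def path_of(x: float) -> str:
    """Map an input to the execution path it exercises."""
    return "lin" if x < 0.0 else "osc"

# Assumed per-path complexity scores (higher = harder to learn a surrogate of).
COMPLEXITY = {"lin": 1.0, "osc": 4.0}

def sampling_proportions(path_freq: dict, complexity: dict) -> dict:
    """Allocate the training budget across execution paths.

    Heuristic used here: weight each path by its input-space frequency
    times its complexity score, then normalize. This is an illustrative
    rule, not the optimum derived in the paper.
    """
    weights = {p: path_freq[p] * complexity[p] for p in path_freq}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

def sample_dataset(n: int, proportions: dict) -> list:
    """Draw n training examples for a surrogate, stratified by path."""
    dataset = []
    for path, frac in proportions.items():
        for _ in range(round(n * frac)):
            # Draw from the input region that exercises this path.
            x = random.uniform(-1.0, 0.0) if path == "lin" else random.uniform(0.0, 1.0)
            dataset.append((x, toy_program(x)))
    return dataset

if __name__ == "__main__":
    # Each path covers half of the input domain [-1, 1] in this example.
    props = sampling_proportions({"lin": 0.5, "osc": 0.5}, COMPLEXITY)
    print(props)  # {'lin': 0.2, 'osc': 0.8}
    data = sample_dataset(1000, props)
```

Under this heuristic the harder oscillatory path receives 80% of the 1,000-example budget rather than the 50% that uniform sampling over the input domain would give it; the paper's empirical claim is that allocating data by path complexity in this spirit improves surrogate accuracy.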
Related papers
- Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs [58.94569213396991]
We propose a hierarchical programmatic reinforcement learning framework to produce program policies.
By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors.
The experimental results in the Karel domain show that our proposed framework outperforms baselines.
arXiv Detail & Related papers (2023-01-30T14:50:46Z)
- Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
arXiv Detail & Related papers (2022-05-28T03:31:07Z)
- Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages [97.58968222942173]
We take the first step to synthesize C programs from input-output examples.
In particular, we propose LaSynth, which learns a latent representation to approximate the execution of partially generated programs.
We show that training on these synthesized programs further improves the prediction performance for both Karel and C program synthesis.
arXiv Detail & Related papers (2021-06-29T02:21:32Z)
- Learning to Combine Per-Example Solutions for Neural Program Synthesis [35.0204840620086]
Most learning-based approaches try to find a program that satisfies all examples at once.
Our work considers an approach that breaks the problem into two stages: (a) find programs that satisfy only one example, and (b) leverage these per-example solutions to yield a program that satisfies all examples.
arXiv Detail & Related papers (2021-06-14T05:48:12Z)
- Process Discovery for Structured Program Synthesis [70.29027202357385]
A core task in process mining is process discovery which aims to learn an accurate process model from event log data.
In this paper, we propose to use (block-) structured programs directly as target process models.
We develop a novel bottom-up agglomerative approach to the discovery of such structured program process models.
arXiv Detail & Related papers (2020-08-13T10:33:10Z)
- Neural Program Synthesis with a Differentiable Fixer [44.48509453344902]
We present a new program synthesis approach that combines an encoder-decoder based synthesis architecture with a differentiable program fixer.
We train our architecture end-to-end on the RobustFill domain, and show that the addition of the fixer module leads to a significant improvement on synthesis accuracy.
arXiv Detail & Related papers (2020-06-19T01:49:46Z)
- Creating Synthetic Datasets via Evolution for Neural Program Synthesis [77.34726150561087]
We show that some program synthesis approaches generalize poorly to data distributions different from that of the randomly generated examples.
We propose a new, adversarial approach to control the bias of synthetic data distributions and show that it outperforms current approaches.
arXiv Detail & Related papers (2020-03-23T18:34:15Z)
- Synthetic Datasets for Neural Program Synthesis [66.20924952964117]
We propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications.
We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance.
arXiv Detail & Related papers (2019-12-27T21:28:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.