Static Prediction of Runtime Errors by Learning to Execute Programs with
External Resource Descriptions
- URL: http://arxiv.org/abs/2203.03771v1
- Date: Mon, 7 Mar 2022 23:17:17 GMT
- Authors: David Bieber, Rishab Goel, Daniel Zheng, Hugo Larochelle, Daniel
Tarlow
- Abstract summary: We introduce a real-world dataset and task for predicting runtime errors.
We develop an interpreter-inspired architecture with an inductive bias towards mimicking program executions.
We show that the model can also predict the location of the error, despite being trained only on labels indicating the presence/absence and kind of error.
- Score: 31.46148643917194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The execution behavior of a program often depends on external resources, such
as program inputs or file contents, and so cannot be run in isolation.
Nevertheless, software developers benefit from fast iteration loops where
automated tools identify errors as early as possible, even before programs can
be compiled and run. This presents an interesting machine learning challenge:
can we predict runtime errors in a "static" setting, where program execution is
not possible? Here, we introduce a real-world dataset and task for predicting
runtime errors, which we show is difficult for generic models like
Transformers. We approach this task by developing an interpreter-inspired
architecture with an inductive bias towards mimicking program executions, which
models exception handling and "learns to execute" descriptions of the contents
of external resources. Surprisingly, we show that the model can also predict
the location of the error, despite being trained only on labels indicating the
presence/absence and kind of error. In total, we present a practical and
difficult-yet-approachable challenge problem related to learning program
execution and we demonstrate promising new capabilities of interpreter-inspired
machine learning models for code.
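To make the task concrete, here is a minimal sketch, assuming Python programs whose only external resource is their stdin contents; all names are illustrative and this is not the paper's pipeline. It shows how a ground-truth runtime-error label could be obtained by actually executing a program on a described input; a static predictor has to produce the same kind of label from the source code and a description of the input alone, without running anything.
```python
import subprocess
import sys

def ground_truth_label(program: str, stdin_text: str, timeout: float = 5.0) -> str:
    """Run `program` with the given stdin and return its runtime-error class, if any.

    Illustrative only: a *static* predictor must produce this label without execution.
    """
    result = subprocess.run(
        [sys.executable, "-c", program],
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode == 0:
        return "no_error"
    # The final stderr line names the uncaught exception, e.g. "ValueError: ...".
    stderr = result.stderr.strip()
    last_line = stderr.splitlines()[-1] if stderr else ""
    return last_line.split(":")[0] or "unknown_error"

program = "x = int(input())\nprint(100 // x)\n"
print(ground_truth_label(program, "0\n"))  # -> "ZeroDivisionError"
```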
Related papers
- Multi-Task Program Error Repair and Explanatory Diagnosis [28.711745671275477]
We present a novel machine-learning approach for Multi-task Program Error Repair and Explanatory Diagnosis (mPRED).
A pre-trained language model is used to encode the source code, and a downstream model is specifically designed to identify and repair errors.
To aid in visualizing and analyzing the program structure, we use a graph neural network.
arXiv Detail & Related papers (2024-10-09T05:09:24Z)
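As a rough illustration of the encode-then-classify setup described in the mPRED summary above (this is not the mPRED architecture; the encoder checkpoint and the untrained classification head are assumptions for the sketch):
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Encode source code with a pre-trained code LM and attach a small downstream
# head that flags whether the snippet contains an error. The checkpoint name is
# illustrative and the head is untrained here.
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
encoder = AutoModel.from_pretrained("microsoft/codebert-base")
error_head = torch.nn.Linear(encoder.config.hidden_size, 2)  # error / no-error

def error_probabilities(source_code: str) -> torch.Tensor:
    inputs = tokenizer(source_code, return_tensors="pt", truncation=True)
    with torch.no_grad():
        summary = encoder(**inputs).last_hidden_state[:, 0]  # first-token summary vector
    return torch.softmax(error_head(summary), dim=-1)

print(error_probabilities("def f(x):\n    return x +"))  # meaningless until the head is trained
```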
- Learning to Predict Program Execution by Modeling Dynamic Dependency on Code Graphs [11.347234752942684]
This paper introduces a novel machine learning-based framework called CodeFlow to predict code coverage and detect runtime errors.
CodeFlow represents all possible execution paths and the relationships between different statements.
It learns dynamic dependencies through execution traces, which reflect the impacts among statements during execution.
arXiv Detail & Related papers (2024-08-05T20:32:00Z)
- NExT: Teaching Large Language Models to Reason about Code Execution [50.93581376646064]
Large language models (LLMs) of code are typically trained on the surface textual form of programs.
We propose NExT, a method to teach LLMs to inspect the execution traces of programs and reason about their run-time behavior.
arXiv Detail & Related papers (2024-04-23T01:46:32Z)
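The NExT summary above does not include its trace format, but a minimal, purely illustrative Python sketch of the raw material involved, a line-level execution trace with local variable values, could look like this:
```python
import sys

def trace_program(func, *args):
    """Run `func` and record (line number, locals) for every executed line."""
    events = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            events.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        outcome = ("ok", func(*args))
    except Exception as exc:              # record the uncaught exception instead of crashing
        outcome = ("error", type(exc).__name__)
    finally:
        sys.settrace(None)
    return outcome, events

def buggy_mean(xs):
    total = sum(xs)
    return total / len(xs)                # ZeroDivisionError when xs is empty

outcome, trace = trace_program(buggy_mean, [])
print(outcome)                            # ('error', 'ZeroDivisionError')
for lineno, local_vars in trace:
    print(lineno, local_vars)             # per-line snapshot of local variables
```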
- Inferring Non-Failure Conditions for Declarative Programs [0.0]
Unintended failures during a computation are painful but frequent during software development.
Programming failures, such as calling a partially defined operation with unintended arguments, are often not caught due to the assumption that the software is correct.
This paper presents an approach to verify such assumptions.
arXiv Detail & Related papers (2024-02-20T12:25:36Z)
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
However, static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions by leveraging Abstract Syntax Trees.
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
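This is not the paper's evaluation framework, but the core idea of an AST-based static check, flagging completions that do not even parse, can be sketched directly against Python's ast module (undefined-name checks would additionally need scope analysis, as a linter performs):
```python
import ast

def static_errors(completion: str) -> list[str]:
    """Return syntax errors found without executing the completion."""
    errors = []
    try:
        ast.parse(completion)
    except SyntaxError as exc:
        errors.append(f"SyntaxError at line {exc.lineno}: {exc.msg}")
    return errors

samples = [
    "def add(a, b):\n    return a + b\n",   # parses cleanly
    "def add(a, b):\n    return a +\n",     # truncated completion
]
for sample in samples:
    print(static_errors(sample) or "no static errors found")
```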
- Using Large Language Models to Enhance Programming Error Messages [5.903720638984496]
Large language models can be used to create useful enhancements to programming error messages.
We discuss the benefits and downsides of large language models and highlight future streams of research for enhancing programming error messages.
arXiv Detail & Related papers (2022-10-20T23:17:26Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
- Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
arXiv Detail & Related papers (2022-05-28T03:31:07Z)
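For reference, the pass@k metric reported in the entry above (and pass@1 by the fault-aware rankers) is typically computed with the standard unbiased estimator over n generated programs of which c are correct; a minimal implementation:
```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n - c, k) / C(n, k) for n samples, c of them correct."""
    if n - c < k:
        return 1.0   # fewer than k incorrect samples, so any k-subset contains a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=20, c=3, k=1))   # 0.15
print(pass_at_k(n=20, c=3, k=5))   # ~0.60: more draws, more chances to hit a correct program
```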
- Chain of Thought Imitation with Procedure Cloning [129.62135987416164]
We propose procedure cloning, which applies supervised sequence prediction to imitate the series of expert computations.
We show that imitating the intermediate computations of an expert's behavior enables procedure cloning to learn policies exhibiting significant generalization to unseen environment configurations.
arXiv Detail & Related papers (2022-05-22T13:14:09Z)
- Learning from Executions for Semantic Parsing [86.94309120789396]
We focus on the task of semi-supervised learning where a limited amount of annotated data is available.
We propose to encourage the model to generate executable programs for unlabeled utterances.
arXiv Detail & Related papers (2021-04-12T21:07:53Z)