Fault-Aware Neural Code Rankers
- URL: http://arxiv.org/abs/2206.03865v1
- Date: Sat, 4 Jun 2022 22:01:05 GMT
- Title: Fault-Aware Neural Code Rankers
- Authors: Jeevana Priya Inala, Chenglong Wang, Mei Yang, Andres Codas, Mark
Encarnación, Shuvendu K. Lahiri, Madanlal Musuvathi, Jianfeng Gao
- Abstract summary: We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
- Score: 64.41888054066861
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have demonstrated an impressive ability to
generate code for various programming tasks. In many instances, LLMs can
generate a correct program for a task when given numerous trials. Consequently,
a recent trend is to sample programs from a model at large scale and then
filter/rank them based on their execution against a small number of known unit
tests to select one candidate solution. However, these approaches assume that
unit tests are given and that the generated programs can be safely executed,
even though generated programs can perform arbitrarily dangerous operations
such as file manipulation. Both assumptions are impractical in
real-world software development. In this paper, we propose fault-aware neural
code rankers that can predict the correctness of a sampled program without
executing it. The fault-aware rankers are trained to predict different kinds of
execution information, such as the exact compile/runtime error type (e.g., an
IndexError or a TypeError). We show that our fault-aware rankers can
significantly increase the pass@1 accuracy of various code generation models
(including Codex, GPT-Neo, GPT-J) on APPS, HumanEval and MBPP datasets.
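As a rough illustration of the ranking idea, the sketch below scores sampled programs with a sequence classifier whose labels are execution outcomes, then ranks by the predicted probability of correctness. The CodeBERT backbone and the five-label outcome set are illustrative assumptions, not the paper's exact architecture or label vocabulary.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Execution-outcome labels; this set is an illustrative assumption.
LABELS = ["correct", "IndexError", "TypeError", "SyntaxError", "OtherError"]

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=len(LABELS)
)  # in practice this head would be fine-tuned on (task, program, outcome) data

def rank_samples(task: str, programs: list[str]) -> list[str]:
    """Rank sampled programs by predicted probability of the 'correct' label,
    without ever executing them."""
    scored = []
    for prog in programs:
        enc = tokenizer(task, prog, truncation=True, return_tensors="pt")
        with torch.no_grad():
            probs = torch.softmax(model(**enc).logits, dim=-1)
        scored.append((probs[0, LABELS.index("correct")].item(), prog))
    return [prog for _, prog in sorted(scored, reverse=True)]
```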
Related papers
- Can OpenSource beat ChatGPT? -- A Comparative Study of Large Language Models for Text-to-Code Generation
We evaluate five different large language models (LLMs) concerning their capabilities for text-to-code generation.
ChatGPT can handle these typical programming challenges by far the most effectively, surpassing even code-specialized models like Code Llama.
arXiv Detail & Related papers (2024-09-06T10:03:49Z)
- Learning to Reason via Program Generation, Emulation, and Search
Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities.
Not all reasoning tasks are easily expressible as code, e.g., tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding.
We propose Code Generation and Emulated EXecution (CoGEX) to extend an LM's program synthesis skills to such tasks.
arXiv Detail & Related papers (2024-05-25T19:40:50Z)
- NExT: Teaching Large Language Models to Reason about Code Execution
Large language models (LLMs) of code are typically trained on the surface textual form of programs.
We propose NExT, a method to teach LLMs to inspect the execution traces of programs and reason about their run-time behavior.
arXiv Detail & Related papers (2024-04-23T01:46:32Z)
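To make "execution traces" concrete, here is a minimal sketch of the kind of run-time information such a method could consume, gathered with Python's standard sys.settrace hook; the trace format is an illustrative assumption, not the one used by NExT.

```python
import sys

def trace_program(func, *args):
    """Run func(*args), recording (line number, local variables) at each executed line."""
    trace = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            trace.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, trace

def mean(xs):
    total = 0
    for x in xs:
        total += x
    return total / len(xs)  # would raise ZeroDivisionError on an empty list

result, trace = trace_program(mean, [1, 2, 3])
for lineno, local_vars in trace:
    print(lineno, local_vars)  # one record per executed line, incl. loop iterations
```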
- SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly
This paper presents SLaDe, a Small Language model Decompiler based on a sequence-to-sequence transformer trained over real-world code.
SLaDe uses type inference to generate programs that are more readable and accurate than those produced by standard analytic and recent neural approaches.
arXiv Detail & Related papers (2023-05-21T17:31:39Z)
- LEVER: Learning to Verify Language-to-Code Generation with Execution
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of the evaluated datasets.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
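A simplified sketch of LEVER-style reranking, assuming a trained verifier exposed as a `verifier_prob` callable (a hypothetical hook, not the authors' API): each sample is scored by combining the generator's likelihood with the verifier's probability of correctness given the natural-language input, the program, and its execution result.

```python
import math

def lever_score(nl_input, program, logprob, exec_result, verifier_prob):
    """Joint score ~ p_LM(program | nl) * p_verifier(correct | nl, program, result)."""
    p_gen = math.exp(logprob)                             # generator likelihood
    p_ok = verifier_prob(nl_input, program, exec_result)  # learned verifier (hypothetical)
    return p_gen * p_ok

def rerank(nl_input, samples, verifier_prob):
    """samples: list of (program, logprob, exec_result) triples; returns the top program."""
    best = max(samples, key=lambda s: lever_score(nl_input, *s, verifier_prob))
    return best[0]
```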
- Learning from Self-Sampled Correct and Partially-Correct Programs
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
arXiv Detail & Related papers (2022-05-28T03:31:07Z)
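For reference, the pass@k numbers cited throughout these entries are conventionally computed with the unbiased estimator from the Codex paper (Chen et al., 2021): with n samples per problem of which c pass all tests, pass@k = 1 - C(n-c, k)/C(n, k), averaged over problems. A small sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of the probability that at least one of k samples,
    drawn from n of which c are correct, passes (Chen et al., 2021)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 100 samples with 15 correct: pass@1 = 0.15, pass@10 ~= 0.82
print(pass_at_k(100, 15, 1), pass_at_k(100, 15, 10))
```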
- Natural Language to Code Translation with Execution
We introduce execution result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
arXiv Detail & Related papers (2022-04-25T06:06:08Z)
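As an illustration of execution-result-based selection in the spirit of the entry above (a sketch, not the authors' implementation, with `run` standing in for a hypothetical sandboxed executor): run every sampled program on shared inputs and keep the one whose outputs agree with the most other samples.

```python
from collections import Counter

def mbr_exec_select(programs, test_inputs, run):
    """programs: candidate sources; run(program, x) -> output (None on error)."""
    # Fingerprint each candidate by its outputs on the shared inputs.
    sigs = [tuple(run(p, x) for x in test_inputs) for p in programs]
    counts = Counter(sigs)
    # Minimum Bayes risk under an output-agreement utility: pick the candidate
    # whose output signature is shared by the largest number of samples.
    best_i = max(range(len(programs)), key=lambda i: counts[sigs[i]])
    return programs[best_i]
```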
- Measuring Coding Challenge Competence With APPS
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)