Self-Edit: Fault-Aware Code Editor for Code Generation
- URL: http://arxiv.org/abs/2305.04087v5
- Date: Mon, 11 Sep 2023 06:27:53 GMT
- Title: Self-Edit: Fault-Aware Code Editor for Code Generation
- Authors: Kechi Zhang, Zhuo Li, Jia Li, Ge Li, Zhi Jin
- Abstract summary: Large language models (LLMs) have demonstrated an impressive ability to generate code for competitive programming tasks.
We propose a generate-and-edit approach named Self-Edit to improve code quality on competitive programming tasks.
- Score: 46.890689359396724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have demonstrated an impressive ability to
generate code for competitive programming tasks. However, with a limited number of
samples, LLMs still suffer from poor accuracy. Inspired by the process of human
programming, we propose a generate-and-edit approach named Self-Edit that
utilizes execution results of the code generated by LLMs to improve code
quality on competitive programming tasks. We execute the generated code on
the example test case provided in the question and wrap execution results into
a supplementary comment. Utilizing this comment as guidance, our fault-aware
code editor is employed to correct errors in the generated code. We perform
extensive evaluations across two competitive programming datasets with nine
different LLMs. Compared to directly generating from LLMs, our approach
improves the average pass@1 by 89% on APPS-dev, 31% on APPS-test, and 48%
on HumanEval over nine popular code generation LLMs with parameter sizes
ranging from 110M to 175B. Compared to other post-processing methods, our
method demonstrates superior accuracy and efficiency.
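The pipeline in the abstract can be made concrete with a short sketch: run the candidate program on the question's example test case, wrap the outcome into a supplementary comment, and feed the problem, the code, and that comment to the fault-aware editor. The function names, the comment format, and the edit_model call below are illustrative assumptions, not the paper's exact implementation.

```python
import subprocess

def run_example_test(code: str, example_input: str, expected_output: str,
                     timeout: float = 4.0) -> str:
    """Execute a candidate program on the question's example test case and
    summarize the outcome as plain text."""
    try:
        proc = subprocess.run(
            ["python", "-c", code],
            input=example_input,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "Time limit exceeded on the example test case."
    if proc.returncode != 0:
        return "Runtime error on the example test case:\n" + proc.stderr.strip()
    if proc.stdout.strip() == expected_output.strip():
        return "Passed the example test case."
    return ("Wrong answer on the example test case.\n"
            "Expected: " + expected_output.strip() + "\n"
            "Got: " + proc.stdout.strip())

def build_editor_prompt(problem: str, code: str, execution_result: str) -> str:
    """Wrap the execution result into a supplementary comment and append it to
    the problem and the faulty code, forming the input of the code editor."""
    supplementary_comment = "\n".join("# " + line for line in execution_result.splitlines())
    return problem + "\n\n" + code + "\n\n" + supplementary_comment + "\n\n# Corrected code:\n"

# Hypothetical usage with an editor model (edit_model is a placeholder):
# result = run_example_test(generated_code, example_in, example_out)
# edited_code = edit_model.generate(build_editor_prompt(problem_text, generated_code, result))
```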
Related papers
- AIME: AI System Optimization via Multiple LLM Evaluators [79.03422337674664]
AIME is an evaluation protocol that utilizes multiple LLMs that each independently generate an evaluation on separate criteria and then combine them via concatenation.
We show AIME outperforming baseline methods in code generation tasks, with up to 62% higher error detection rate and up to 16% higher success rate than a single-LLM evaluation protocol on the LeetCodeHard and HumanEval datasets.
arXiv Detail & Related papers (2024-10-04T04:03:24Z)
- A Performance Study of LLM-Generated Code on Leetcode [1.747820331822631]
This study evaluates the efficiency of code generation by Large Language Models (LLMs).
We compare 18 LLMs, considering factors such as model temperature and success rate, and their impact on code performance.
We find that LLMs are capable of generating code that is, on average, more efficient than the code written by humans.
arXiv Detail & Related papers (2024-07-31T13:10:03Z)
- Chain of Targeted Verification Questions to Improve the Reliability of Code Generated by LLMs [10.510325069289324]
We propose a self-refinement method aimed at improving the reliability of code generated by LLMs.
Our approach is based on targeted Verification Questions (VQs) to identify potential bugs within the initial code.
Our method attempts to repair these potential bugs by re-prompting the LLM with the targeted VQs and the initial code.
arXiv Detail & Related papers (2024-05-22T19:02:50Z)
- On Evaluating the Efficiency of Source Code Generated by LLMs [31.8121544062256]
More efficient code can lead to higher performance and execution efficiency of programs and software produced through LLM-assisted programming.
First, we evaluate the efficiency of the code generated by LLMs on two benchmarks, HumanEval and MBPP.
Then, we choose a set of programming problems from the online judge platform LeetCode to conduct a more difficult evaluation.
arXiv Detail & Related papers (2024-04-09T05:59:39Z)
- Assured LLM-Based Software Engineering [51.003878077888686]
This paper is an outline of the content of the keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, Monday 15th April 2024, Lisbon, Portugal.
arXiv Detail & Related papers (2024-02-06T20:38:46Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO optimizes the model only on the executed code segments, masking unexecuted segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- From Misuse to Mastery: Enhancing Code Generation with Knowledge-Driven AI Chaining [16.749379740049925]
Large Language Models (LLMs) have shown promising results in automatic code generation by improving coding efficiency to a certain extent.
However, generating high-quality and reliable code remains a formidable task because LLMs lack awareness of good programming practices.
We propose a novel Knowledge-driven Prompt Chaining-based code generation approach, which decomposes code generation into an AI chain with iterative check-rewrite steps.
arXiv Detail & Related papers (2023-09-27T12:09:07Z)
- LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
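The fault-aware ranking idea from the last entry above can also be sketched briefly: sample several candidate programs, score each with a learned correctness predictor instead of executing it, and keep the top-ranked one for pass@1. The scorer interface and names here are illustrative assumptions, not the paper's actual ranker.

```python
from typing import Callable, List, Tuple

def rank_candidates(
    problem: str,
    candidates: List[str],
    correctness_scorer: Callable[[str, str], float],
) -> List[Tuple[float, str]]:
    """Order sampled programs by a learned probability of correctness,
    without executing any of them."""
    scored = [(correctness_scorer(problem, code), code) for code in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

# Hypothetical usage: pass@1 is then measured on the single top-ranked program.
# best_score, best_program = rank_candidates(problem_text, sampled_programs, scorer_model)[0]
```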