StepCoder: Improve Code Generation with Reinforcement Learning from
Compiler Feedback
- URL: http://arxiv.org/abs/2402.01391v2
- Date: Mon, 5 Feb 2024 13:28:23 GMT
- Title: StepCoder: Improve Code Generation with Reinforcement Learning from
Compiler Feedback
- Authors: Shihan Dou, Yan Liu, Haoxiang Jia, Limao Xiong, Enyu Zhou, Wei Shen,
Junjie Shan, Caishuang Huang, Xiao Wang, Xiaoran Fan, Zhiheng Xi, Yuhao Zhou,
Tao Ji, Rui Zheng, Qi Zhang, Xuanjing Huang, Tao Gui
- Abstract summary: We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks.
FGO only optimize the model by masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
- Score: 58.20547418182074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advancement of large language models (LLMs) has significantly propelled
the field of code generation. Previous work integrated reinforcement learning
(RL) with compiler feedback for exploring the output space of LLMs to enhance
code generation quality. However, the lengthy code generated by LLMs in
response to complex human requirements makes RL exploration a challenge. Also,
since the unit tests may not cover the complicated code, optimizing LLMs by
using these unexecuted code snippets is ineffective. To tackle these
challenges, we introduce StepCoder, a novel RL framework for code generation,
consisting of two main components: CCCS addresses the exploration challenge by
breaking the long sequences code generation task into a Curriculum of Code
Completion Subtasks, while FGO only optimizes the model by masking the
unexecuted code segments to provide Fine-Grained Optimization. In addition, we
furthermore construct the APPS+ dataset for RL training, which is manually
verified to ensure the correctness of unit tests. Experimental results show
that our method improves the ability to explore the output space and
outperforms state-of-the-art approaches in corresponding benchmarks. Our
dataset APPS+ and StepCoder are available online.
Related papers
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated as compared to canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z) - VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
We introduce VersiCode, the first comprehensive dataset designed to assess the ability of large language models to generate verifiable code for specific library versions.
We design two dedicated evaluation tasks: version-specific code completion (VSCC) and version-aware code editing (VACE)
Comprehensive experiments are conducted to benchmark the performance of LLMs, revealing the challenging nature of these tasks and VersiCode.
arXiv Detail & Related papers (2024-06-11T16:15:06Z) - DolphCoder: Echo-Locating Code Large Language Models with Diverse and
Multi-Objective Instruction Tuning [36.78560777629329]
We introduce a diverse instruction model (DolphCoder) with self-evaluating for code generation.
It learns diverse instruction targets and combines a code evaluation objective to enhance its code generation ability.
Our model achieves superior performance on the HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2024-02-14T12:34:58Z) - A Review of Repository Level Prompting for LLMs [0.0]
Large Language Models (LLMs) have led to notable successes, such as achieving a 94.6% solve rate on the HumanEval benchmark.
There is an increasing commercial push for repository-level inline code completion tools, such as GitHub Copilot and Tab Nine.
This paper delves into the transition from individual coding problems to repository-scale solutions.
arXiv Detail & Related papers (2023-12-15T00:34:52Z) - LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system.
We build a novel data-cleaning pipeline that uses these principles to transform existing programs.
We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
arXiv Detail & Related papers (2023-11-25T02:45:50Z) - Test-Case-Driven Programming Understanding in Large Language Models for
Better Code Generation [15.166827643436346]
muFiX is a novel prompting technique to improve the code generation performance of large language models (LLMs)
It first exploits test case analysis to obtain specification understanding and enables a self-improvement process.
muFiX further fixes the specification understanding towards the direction reducing the gap between the provided understanding and the actual understanding.
arXiv Detail & Related papers (2023-09-28T02:58:07Z) - RLTF: Reinforcement Learning from Unit Test Feedback [17.35361167578498]
Reinforcement Learning from Unit Test Feedback is a novel online RL framework with unit test feedback of multi-granularity for refining code LLMs.
Our approach generates data in real-time during training and simultaneously utilizes fine-grained feedback signals to guide the model towards producing higher-quality code.
arXiv Detail & Related papers (2023-07-10T05:18:18Z) - Coarse-Tuning Models of Code with Reinforcement Learning Feedback [0.0]
Large Language Models (LLMs) pre-trained on code have emerged as the dominant approach to program synthesis.
We propose RLCF, that further trains a pre-trained LLM via reinforcement learning, using feedback from a grounding function that scores the quality of the code.
arXiv Detail & Related papers (2023-05-25T22:09:08Z) - CodeT5+: Open Code Large Language Models for Code Understanding and
Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence.
CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.
We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z) - CodeRL: Mastering Code Generation through Pretrained Models and Deep
Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.