IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code
Completion
- URL: http://arxiv.org/abs/2401.16637v3
- Date: Thu, 22 Feb 2024 00:59:55 GMT
- Title: IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code
Completion
- Authors: Bolun Li, Zhihong Sun, Tao Huang, Hongyu Zhang, Yao Wan, Ge Li, Zhi
Jin, Chen Lyu
- Abstract summary: We propose IRCoCo, a code completion-specific DRL-based fine-tuning framework.
We show that fine-tuning pretrained LMs with IRCoCo leads to significant improvements in the code completion task.
- Score: 38.863871578280936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code completion aims to enhance programming productivity by predicting
potential code based on the current programming context. Recently, pretrained
language models (LMs) have become prominent in this field. Various approaches
have been proposed to fine-tune LMs using supervised fine-tuning (SFT)
techniques for code completion. However, the inherent exposure bias of these
models can cause errors to accumulate early in the sequence completion, leading
to even more errors in subsequent completions. To address this problem, deep
reinforcement learning (DRL) is an alternative technique for fine-tuning LMs
for code completion, which can improve the generalization capabilities and
overall performance. Nevertheless, integrating DRL-based strategies into code
completion faces two major challenges: 1) The dynamic nature of the code
context requires the completion model to quickly adapt to changes, which poses
difficulties for conventional DRL strategies that focus on delayed rewarding of
the final code state. 2) It is difficult to evaluate the correctness of partial
code, thus the reward redistribution-based strategies cannot be adapted to code
completion. To tackle these challenges, we propose IRCoCo, a code
completion-specific DRL-based fine-tuning framework. This framework is designed
to provide immediate rewards as feedback for detecting dynamic context changes
arising from continuous edits during code completion. With the aid of immediate
feedback, the fine-tuned LM can gain a more precise understanding of the
current context, thereby enabling effective adjustment of the LM and optimizing
code completion in a more refined manner. Experimental results demonstrate that
fine-tuning pretrained LMs with IRCoCo leads to significant improvements in the
code completion task, outperforming both SFT-based and other DRL-based
baselines.
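To make the idea in the abstract concrete, below is a minimal, hypothetical sketch of DRL fine-tuning where every generated token receives an immediate reward, rather than one delayed reward for the finished completion. The toy model (`ToyCompletionLM`), the `immediate_reward` heuristic, and all hyperparameters are illustrative placeholders and are not the authors' implementation, which fine-tunes a pretrained code LM with a learned reward signal for partial code.

```python
# Illustrative sketch only: policy-gradient fine-tuning with per-token
# (immediate) rewards, in the spirit of immediate-reward-guided DRL for
# code completion. Model, vocabulary, and reward are toy placeholders.

import torch
import torch.nn as nn

VOCAB_SIZE, HIDDEN, MAX_LEN = 100, 64, 16

class ToyCompletionLM(nn.Module):
    """Stand-in for a pretrained code LM (e.g., a decoder-only transformer)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, tokens):                       # tokens: (batch, seq)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)                          # logits: (batch, seq, vocab)

def immediate_reward(prefix, token):
    """Placeholder for an immediate reward; a real system would score the
    partial code after each generation step (e.g., with an evaluator model)."""
    return 1.0 if token % 2 == 0 else -0.1           # arbitrary toy heuristic

def reinforce_step(model, optimizer, context, gamma=0.99):
    """One policy-gradient update using per-token rewards."""
    tokens = context.clone()
    log_probs, rewards = [], []
    for _ in range(MAX_LEN):
        logits = model(tokens)[:, -1, :]              # next-token distribution
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        rewards.append(immediate_reward(tokens, action.item()))
        tokens = torch.cat([tokens, action.unsqueeze(0)], dim=1)

    # Discounted return-to-go, so earlier tokens still see downstream effects.
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.insert(0, running)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    loss = -(torch.stack(log_probs).squeeze() * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = ToyCompletionLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
context = torch.randint(0, VOCAB_SIZE, (1, 8))        # toy "current programming context"
print(reinforce_step(model, opt, context))
```

The relevant difference from a delayed-reward setup is that `rewards` holds one entry per generated token rather than a single value for the completed sequence, which is the kind of step-level feedback the abstract argues is needed to track the continuously changing code context.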
Related papers
- CodeDPO: Aligning Code Models with Self Generated and Verified Source Code [52.70310361822519]
We propose CodeDPO, a framework that integrates preference learning into code generation to improve two key code preference factors: code correctness and efficiency.
CodeDPO employs a novel dataset construction method, utilizing a self-generation-and-validation mechanism that simultaneously generates and evaluates code and test cases.
arXiv Detail & Related papers (2024-10-08T01:36:15Z)
- Adaptive Draft-Verification for Efficient Large Language Model Decoding [24.347886232342862]
Large language model (LLM) decoding involves generating a sequence of tokens based on a given context.
The typical autoregressive decoding method requires a separate forward pass through the model for each token generated.
We introduce ADED, which accelerates LLM decoding without requiring fine-tuning.
arXiv Detail & Related papers (2024-06-27T22:20:39Z)
- Factor Graph Optimization of Error-Correcting Codes for Belief Propagation Decoding [62.25533750469467]
Low-Density Parity-Check (LDPC) codes possess several advantages over other families of codes.
The proposed approach is shown to improve on the decoding performance of existing popular codes by orders of magnitude.
arXiv Detail & Related papers (2024-06-09T12:08:56Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- REPOFUSE: Repository-Level Code Completion with Fused Dual Context [11.531678717514724]
This paper introduces REPOFUSE, a pioneering solution designed to enhance repository-level code completion without the latency trade-off.
We propose a novel rank truncated generation (RTG) technique that efficiently condenses two types of context into prompts with restricted size.
REPOFUSE has demonstrated a significant leap over existing models, achieving a 40.90% to 59.75% increase in exact match (EM) accuracy for code completions and a 26.8% enhancement in inference speed.
arXiv Detail & Related papers (2024-02-22T06:34:50Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks.
FGO optimizes the model only on executed code, masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- Execution-based Code Generation using Deep Reinforcement Learning [8.085533911328577]
PPOCoder is a new framework for code generation that combines pre-trained PL models with Proximal Policy Optimization.
PPOCoder seamlessly integrates external code-specific knowledge into the model optimization process.
It's important to note that PPOCoder is a task-agnostic and model-agnostic framework that can be used across different code generation tasks and PLs.
arXiv Detail & Related papers (2023-01-31T18:02:26Z)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.