CodeRL: Mastering Code Generation through Pretrained Models and Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2207.01780v1
- Date: Tue, 5 Jul 2022 02:42:15 GMT
- Title: CodeRL: Mastering Code Generation through Pretrained Models and Deep
Reinforcement Learning
- Authors: Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, Steven
C.H. Hoi
- Abstract summary: "CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extend the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
- Score: 92.36705236706678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Program synthesis or code generation aims to generate a program that
satisfies a problem specification. Recent approaches using large-scale
pretrained language models (LMs) have shown promising results, yet they have
some critical limitations. In particular, they often follow a standard
supervised fine-tuning procedure to train a code generation model only from the
pairs of natural-language problem descriptions and ground-truth programs. Such
a paradigm largely ignores important but potentially useful signals in the
problem specification, such as unit tests, and thus often results in poor
performance on complex unseen coding tasks. To address these
limitations, we propose "CodeRL", a new framework for program synthesis tasks
through pretrained LMs and deep reinforcement learning (RL). Specifically,
during training, we treat the code-generating LM as an actor network, and
introduce a critic network that is trained to predict the functional
correctness of generated programs and provide dense feedback signals to the
actor. During inference, we introduce a new generation procedure with a
critical sampling strategy that allows a model to automatically regenerate
programs based on feedback from example unit tests and critic scores. For the
model backbones, we extend the encoder-decoder architecture of CodeT5 with
enhanced learning objectives, larger model sizes, and better pretraining data.
Our method not only achieves new SOTA results on the challenging APPS
benchmark, but also shows strong zero-shot transfer capability with new SOTA
results on the simpler MBPP benchmark.
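As a rough illustration of the actor-critic setup described above, here is a minimal, hedged sketch in PyTorch. It is not the authors' implementation: CodeRL's actor is a CodeT5 model and its critic is trained to predict functional correctness and supply dense feedback, whereas this toy version uses tiny GRU stand-ins, a binary pass/fail critic, and a single sequence-level weight. ToyActor, ToyCritic, and run_unit_tests are illustrative placeholders, and the random token batch stands in for programs sampled from the actor.

```python
# Hedged sketch (not the authors' code): one actor-critic fine-tuning step in the
# spirit of CodeRL. All names and shapes here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 100  # toy vocabulary size

class ToyActor(nn.Module):
    """Stand-in for a pretrained autoregressive code LM (e.g. a CodeT5-style decoder)."""
    def __init__(self, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, VOCAB)

    def forward(self, tokens):                  # tokens: (B, T)
        h, _ = self.rnn(self.emb(tokens))
        return self.head(h)                     # logits: (B, T, VOCAB)

class ToyCritic(nn.Module):
    """Predicts the functional correctness of a program from its token sequence."""
    def __init__(self, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)        # P(program passes its unit tests)

    def forward(self, tokens):
        _, h = self.rnn(self.emb(tokens))
        return torch.sigmoid(self.head(h[-1]))  # (B, 1)

def run_unit_tests(program_tokens) -> float:
    """Placeholder for decoding the program and executing it against example unit
    tests; returns a scalar return, e.g. +1 if all tests pass, -1 otherwise."""
    return 1.0 if program_tokens.sum().item() % 2 == 0 else -1.0

actor, critic = ToyActor(), ToyCritic()
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-3)

# One RL step on a batch of "sampled programs" (random toy sequences here).
programs = torch.randint(0, VOCAB, (4, 16))
opt.zero_grad()
logits = actor(programs[:, :-1])
logp = F.log_softmax(logits, dim=-1).gather(-1, programs[:, 1:, None]).squeeze(-1)

returns = torch.tensor([run_unit_tests(p) for p in programs])       # sparse test signal
q_hat = critic(programs).squeeze(-1)                                 # dense critic estimate
# Policy-gradient-style loss: the critic score serves as a baseline/weight on the return.
actor_loss = -((returns - q_hat.detach()) * logp.sum(dim=-1)).mean()
critic_loss = F.binary_cross_entropy(q_hat, (returns > 0).float())   # supervise critic on test outcomes
(actor_loss + critic_loss).backward()
opt.step()
```

At inference time, the abstract's critic sampling procedure complements this training signal: sampled programs are checked against the example unit tests and, guided by critic scores, automatically regenerated when they fail.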
Related papers
- The Graph's Apprentice: Teaching an LLM Low Level Knowledge for Circuit Quality Estimation [34.37154877681809]
We introduce VeriDistill, the first end-to-end machine learning model that directly processes raw Verilog code to predict circuit quality-of-result metrics.
Our model employs a novel knowledge distillation method, transferring low-level circuit insights via graphs into the LLM-based predictor.
Experiments show VeriDistill outperforms state-of-the-art baselines on large-scale Verilog datasets.
arXiv Detail & Related papers (2024-10-30T04:20:10Z)
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using proximal policy optimization (PPO) for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess the quality of generated outputs.
Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation [1.335664823620186]
Large Language Models (LLMs) have recently advanced many applications in software engineering.
CoT-SelfEvolve iteratively and automatically refines code through a self-correcting process.
arXiv Detail & Related papers (2024-08-28T09:19:09Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking long-sequence code generation into a Curriculum of Code Completion Subtasks.
FGO provides Fine-Grained Optimization by masking unexecuted code segments, so that only executed code contributes to the update (a minimal masking sketch follows the related-papers list below).
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches on the corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis [16.88062487980405]
We present a novel approach to automatically obtain data consisting of function signatures and associated unit tests.
We show that actor-critic reinforcement learning, in conjunction with this automatically generated training data, improves a pre-trained code language model's performance.
arXiv Detail & Related papers (2023-10-20T17:13:16Z)
- Enhancing Automated Program Repair through Fine-tuning and Prompt Engineering [2.3826139428423576]
Sequence-to-sequence models have been used to transform erroneous programs into correct ones when trained with a large enough dataset.
Recent studies have demonstrated strong empirical evidence that code review can further improve program repair.
We investigate whether this inherent knowledge of programming languages (PL) and natural language (NL) can be utilized to improve automated program repair.
arXiv Detail & Related papers (2023-04-16T17:29:51Z)
- The Wisdom of Hindsight Makes Language Models Better Instruction Followers [84.9120606803906]
Reinforcement learning has seen wide success in finetuning large language models to better align with instructions via human feedback.
In this paper, we consider an alternative approach: converting feedback into instructions by relabeling the original ones and training the model for better alignment in a supervised manner.
We propose Hindsight Instruction Relabeling (HIR), a novel algorithm for aligning language models with instructions.
arXiv Detail & Related papers (2023-02-10T12:16:38Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
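As referenced in the StepCoder entry above, here is a minimal sketch of masking unexecuted code segments out of a token-level RL loss. It is an assumption-laden illustration rather than the paper's FGO implementation: masked_pg_loss, exec_mask, and the toy tensors are hypothetical, and the normalisation by the number of executed tokens is a choice of this sketch.

```python
# Hedged sketch (not from the StepCoder paper): only tokens that were actually
# executed contribute to the policy-gradient update; exec_mask would come from
# tracing the unit-test run, but here it is made up for illustration.
import torch

def masked_pg_loss(logprobs: torch.Tensor,     # (B, T) per-token log-probs of sampled code
                   advantages: torch.Tensor,   # (B,)   per-program advantage/return
                   exec_mask: torch.Tensor) -> torch.Tensor:  # (B, T) 1 = token was executed
    """Policy-gradient loss over executed tokens only; unexecuted segments are masked out."""
    per_token = -advantages[:, None] * logprobs * exec_mask
    # Normalise by the number of executed tokens so sparse traces don't shrink the gradient.
    return per_token.sum() / exec_mask.sum().clamp(min=1)

# Toy usage: 2 programs of 8 tokens; the second program's tail was never executed.
logprobs = -torch.rand(2, 8)                   # stand-in for actor log-probabilities
advantages = torch.tensor([1.0, -1.0])
exec_mask = torch.tensor([[1] * 8, [1] * 5 + [0] * 3], dtype=torch.float32)
print(masked_pg_loss(logprobs, advantages, exec_mask))
```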