Automatic Unit Test Data Generation and Actor-Critic Reinforcement
Learning for Code Synthesis
- URL: http://arxiv.org/abs/2310.13669v1
- Date: Fri, 20 Oct 2023 17:13:16 GMT
- Title: Automatic Unit Test Data Generation and Actor-Critic Reinforcement
Learning for Code Synthesis
- Authors: Philip John Gorinski, Matthieu Zimmer, Gerasimos Lampouras, Derrick
Goh Xin Deik, Ignacio Iacobacci
- Abstract summary: We present a novel approach to automatically obtain data consisting of function signatures and associated Unit Tests.
We show that, in conjunction with automatically generated training data, it improves a pre-trained code language model's performance.
- Score: 16.88062487980405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of large pre-trained language models in the domain of Code
Synthesis has shown remarkable performance on various benchmarks, treating the
problem of Code Generation in a fashion similar to Natural Language Generation,
trained with a Language Modelling (LM) objective. In addition, the property of
programming language code being precisely evaluable with respect to its
semantics -- through the use of Unit Tests to check its functional correctness
-- lends itself to using Reinforcement Learning (RL) as a further training
paradigm. Previous work has shown that RL can be applied as such to improve
models' coding capabilities; however, such RL-based methods rely on a reward
signal based on defined Unit Tests, which are much harder to obtain compared to
the huge crawled code datasets used in LM objectives. In this work, we present
a novel approach to automatically obtain data consisting of function signatures
and associated Unit Tests, suitable for RL training of Code Synthesis models.
We also introduce a simple yet effective Actor-Critic RL
training scheme and show that, in conjunction with automatically generated
training data, it improves a pre-trained code language model's
performance by up to 9.9% over the original underlying code
synthesis LM, and by up to 4.3% over RL-based models trained with standard PPO or
CodeRL.
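The reward signal underpinning this line of work, functional correctness checked by Unit Tests, can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation; the function name `unit_test_reward`, the subprocess-based execution, and the binary +1/-1 reward scheme are all assumptions for illustration.

```python
import os
import subprocess
import sys
import tempfile

def unit_test_reward(candidate_code: str, test_code: str, timeout: float = 5.0) -> float:
    """Return +1.0 if every unit test passes, -1.0 on any failure, error, or timeout."""
    # Concatenate the generated function body with its unit tests.
    program = candidate_code + "\n\n" + test_code + "\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        # Execute in a separate interpreter; a non-zero exit code means a test failed.
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else -1.0
    except subprocess.TimeoutExpired:
        return -1.0
    finally:
        os.unlink(path)

# A correct candidate earns +1, a buggy one earns -1.
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(unit_test_reward("def add(a, b):\n    return a + b", tests))  # all tests pass
print(unit_test_reward("def add(a, b):\n    return a - b", tests))  # a test fails
```

In an RL loop, this scalar would score sampled completions for a given function signature; real systems additionally sandbox execution for safety.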
Related papers
- Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning [4.975728472540823]
We present techniques that integrate various clustering and pruning metrics to selectively reduce training data without compromising the accuracy and functionality of the generated code.
Our experiments show that these pruning strategies not only reduce the computational resources needed but also enhance the overall quality of code generation.
arXiv Detail & Related papers (2024-07-06T10:30:43Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO provides Fine-Grained Optimization by masking unexecuted code segments, so the model is optimized only on code that actually ran.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
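The masking idea behind FGO can be illustrated with a toy loss computation. The function `fgo_masked_loss` and its inputs are hypothetical, not StepCoder's actual implementation; it only shows the principle of averaging the loss over executed tokens alone.

```python
def fgo_masked_loss(token_losses: list[float], executed_mask: list[int]) -> float:
    """Average per-token loss over executed tokens only (mask value 1)."""
    assert len(token_losses) == len(executed_mask)
    kept = [loss for loss, m in zip(token_losses, executed_mask) if m == 1]
    return sum(kept) / len(kept) if kept else 0.0

# Suppose tokens 2 and 3 fall in a branch the unit tests never executed:
# their (made-up) losses are excluded, so untested code does not drive the update.
losses = [0.5, 1.0, 4.0, 3.0, 0.5]
mask   = [1,   1,   0,   0,   1]
print(fgo_masked_loss(losses, mask))  # averages positions 0, 1, 4 only
```

In practice the mask would come from line-coverage information gathered while running the unit tests, and the losses would be per-token negative log-likelihoods.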
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system.
We build a novel data-cleaning pipeline that uses these principles to transform existing programs.
We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
arXiv Detail & Related papers (2023-11-25T02:45:50Z)
- Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation [13.658632458850144]
Large Language Models (LLMs) have gained popularity for code generation, including the automated creation of test cases.
LLMs are often trained on vast amounts of publicly available code, which may include test cases that do not adhere to best practices.
We propose a novel technique called Reinforcement Learning from Static Quality Metrics (RLSQM).
arXiv Detail & Related papers (2023-10-03T18:48:31Z)
- Reinforced Self-Training (ReST) for Language Modeling [56.75447441157628]
Reinforcement learning from human feedback (RLHF) can improve the quality of large language models' (LLMs') outputs by aligning them with human preferences.
We propose a simple algorithm for aligning LLMs with human preferences inspired by growing batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST).
Our results show that ReST can substantially improve translation quality, as measured by automated metrics and human evaluation on machine translation benchmarks in a compute and sample-efficient manner.
arXiv Detail & Related papers (2023-08-17T14:12:48Z)
- RLTF: Reinforcement Learning from Unit Test Feedback [17.35361167578498]
Reinforcement Learning from Unit Test Feedback (RLTF) is a novel online RL framework that uses multi-granularity unit test feedback to refine code LLMs.
Our approach generates data in real-time during training and simultaneously utilizes fine-grained feedback signals to guide the model towards producing higher-quality code.
arXiv Detail & Related papers (2023-07-10T05:18:18Z)
- Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization [73.74371798168642]
We introduce an open-source modular library, RL4LMs, for optimizing language generators with reinforcement learning.
Next, we present the GRUE benchmark, a set of 6 language generation tasks which are supervised not by target strings, but by reward functions.
Finally, we introduce an easy-to-use, performant RL algorithm, NLPO, that learns to effectively reduce the action space in language generation.
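Action-space reduction in the spirit of NLPO can be illustrated with nucleus (top-p) masking of the next-token distribution: tokens outside the top-p set of a masking policy are zeroed out before sampling. The tiny vocabulary, the probabilities, and the function `top_p_mask` are made up for illustration and are not NLPO's actual algorithm.

```python
def top_p_mask(probs: dict[str, float], p: float = 0.9) -> dict[str, float]:
    """Keep the smallest set of highest-probability tokens whose cumulative
    mass reaches p, then renormalize; all other tokens are masked out."""
    kept, total = {}, 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[tok] = pr
        total += pr
        if total >= p:
            break
    return {tok: pr / total for tok, pr in kept.items()}

# "pass" falls outside the top-0.9 nucleus and is removed from the action space.
probs = {"def": 0.5, "return": 0.3, "import": 0.15, "pass": 0.05}
print(top_p_mask(probs, p=0.9))
```

Shrinking the set of sampleable tokens this way reduces the effective action space the RL policy must explore at each generation step.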
arXiv Detail & Related papers (2022-10-03T21:38:29Z)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
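The soft value function at the heart of soft Q-learning, V(s) = tau * log sum_a exp(Q(s,a)/tau), can be computed in a few lines; as tau shrinks it approaches the hard max over actions. The Q-values below are toy numbers, and this snippet is only a numerical illustration of the formula, not the paper's method.

```python
import math

def soft_value(q_values: list[float], tau: float = 1.0) -> float:
    """V(s) = tau * log(sum_a exp(Q(s, a) / tau)), computed stably
    by subtracting the maximum Q-value before exponentiating."""
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))

print(soft_value([1.0, 1.0], tau=1.0))   # 1 + log 2, a soft max over two equal actions
print(soft_value([5.0, 0.0], tau=0.01))  # small tau: approaches the hard max, 5.0
```

The temperature tau trades off exploration (large tau, value spread over many tokens) against exploitation (small tau, value concentrated on the best token).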
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.