Reinforcement Learning from Automatic Feedback for High-Quality Unit
Test Generation
- URL: http://arxiv.org/abs/2310.02368v1
- Date: Tue, 3 Oct 2023 18:48:31 GMT
- Title: Reinforcement Learning from Automatic Feedback for High-Quality Unit
Test Generation
- Authors: Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, Alexey
Svyatkovskiy
- Abstract summary: Large Language Models (LLMs) have gained popularity for code generation, including the automated creation of test cases.
LLMs are often trained on vast amounts of publicly available code, which may include test cases that do not adhere to best practices.
We propose a novel technique called Reinforcement Learning from Static Quality Metrics (RLSQM).
- Score: 13.658632458850144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software testing is a crucial aspect of software development, and the
creation of high-quality tests that adhere to best practices is essential for
effective maintenance. Recently, Large Language Models (LLMs) have gained
popularity for code generation, including the automated creation of test cases.
However, these LLMs are often trained on vast amounts of publicly available
code, which may include test cases that do not adhere to best practices and may
even contain test smells (anti-patterns). To address this issue, we propose a
novel technique called Reinforcement Learning from Static Quality Metrics
(RLSQM). To begin, we analyze the anti-patterns generated by the LLM and show
that LLMs can generate undesirable test smells. Thus, we train specific reward
models for each static quality metric, then utilize Proximal Policy
Optimization (PPO) to train models for optimizing a single quality metric at a
time. Furthermore, we amalgamate these rewards into a unified reward model
aimed at capturing different best practices and quality aspects of tests. By
comparing RL-trained models with those trained using supervised learning, we
provide insights into how to reliably utilize RL to improve test generation
quality and into the effects of various training strategies. Our experimental
results demonstrate that the RL-optimized model consistently generated
higher-quality test cases than the base LLM, improving on the base model by up
to 21% and generating nearly 100% syntactically correct code. RLSQM
also outperformed GPT-4 on four out of seven metrics. This represents a
significant step towards enhancing the overall efficiency and reliability of
software testing through Reinforcement Learning and static quality metrics. Our
data are available at this link: https://figshare.com/s/ded476c8d4c221222849.
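To make the training signal concrete, below is a minimal sketch of how per-metric static-quality rewards for a generated test could be computed and amalgamated into a single scalar, in the spirit of RLSQM. The specific smells, regular expressions, weights, and function names are illustrative assumptions rather than the paper's implementation; in RLSQM, a reward model is trained for each static quality metric and the generator is then fine-tuned with PPO against a single metric or the unified combination.

```python
import re
from typing import Dict, Optional

def static_quality_rewards(test_code: str) -> Dict[str, float]:
    """Score a generated unit test against a few static quality metrics.

    These heuristics are hypothetical stand-ins for static analyzers:
    each metric yields +1 (best practice followed) or -1 (test smell).
    """
    has_assertion = bool(re.search(r"\bassert\w*\s*\(", test_code))  # assertion present
    has_try_catch = "try" in test_code and "catch" in test_code      # exception-handling smell
    has_println = "System.out.println" in test_code                  # print-statement smell
    has_empty_body = bool(re.search(r"\{\s*\}", test_code))          # empty-test smell
    return {
        "has_assertion": 1.0 if has_assertion else -1.0,
        "no_exception_handling": -1.0 if has_try_catch else 1.0,
        "no_print_statement": -1.0 if has_println else 1.0,
        "non_empty_body": -1.0 if has_empty_body else 1.0,
    }

def unified_reward(per_metric: Dict[str, float],
                   weights: Optional[Dict[str, float]] = None) -> float:
    """Amalgamate per-metric rewards into one scalar via a weighted average."""
    weights = weights or {name: 1.0 for name in per_metric}
    total = sum(weights.values())
    return sum(weights[name] * score for name, score in per_metric.items()) / total

if __name__ == "__main__":
    generated_test = """
    @Test
    public void testAdd() {
        Calculator calc = new Calculator();
        assertEquals(4, calc.add(2, 2));
    }
    """
    scores = static_quality_rewards(generated_test)
    print(scores)                  # per-metric rewards
    print(unified_reward(scores))  # scalar reward a PPO update step could maximize
```

A unified scalar like the one returned by unified_reward is the kind of signal a PPO step would maximize when fine-tuning the test-generating LLM.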
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation [11.056044348209483]
Unit testing, crucial for identifying bugs in code modules like classes and methods, is often neglected by developers due to time constraints.
Large Language Models (LLMs), like GPT and Mistral, show promise in software engineering, including in test generation.
arXiv Detail & Related papers (2024-06-28T20:38:41Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion: adaptively setting the label smoothing value during training according to the uncertainty of individual samples (see the sketch after this list).
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- How to Train Data-Efficient LLMs [56.41105687693619]
We study data-efficient approaches for pre-training large language models (LLMs).
In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density sampling are the best methods in their respective categories.
arXiv Detail & Related papers (2024-02-15T02:27:57Z)
- Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM [32.44432906540792]
We present SymPrompt, a code-aware prompting strategy for large language models in test generation.
SymPrompt enhances correct test generations by a factor of 5 and bolsters relative coverage by 26% for CodeGen2.
Notably, when applied to GPT-4, SymPrompt improves coverage by over 2x compared to baseline prompting strategies.
arXiv Detail & Related papers (2024-01-31T18:21:49Z)
- Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis [16.88062487980405]
We present a novel approach to automatically obtain data consisting of function signatures and associated Unit Tests.
We show that it, in conjunction with automatically generated training data, leads to improvement of a pre-trained code language model's performance.
arXiv Detail & Related papers (2023-10-20T17:13:16Z)
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
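As referenced in the Uncertainty Aware Learning for Language Model Alignment entry above, here is a minimal sketch of per-sample adaptive label smoothing driven by an uncertainty score. The function name, the linear mapping from uncertainty to smoothing strength, and the 0.2 cap are illustrative assumptions, not the UAL paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def uncertainty_aware_loss(logits: torch.Tensor,
                           targets: torch.Tensor,
                           uncertainty: torch.Tensor,
                           max_smoothing: float = 0.2) -> torch.Tensor:
    """Cross-entropy with per-sample label smoothing scaled by uncertainty.

    logits:      (batch, num_classes) raw model outputs
    targets:     (batch,) integer class labels
    uncertainty: (batch,) per-sample uncertainty scores in [0, 1]
    """
    num_classes = logits.size(-1)
    smoothing = max_smoothing * uncertainty.clamp(0.0, 1.0)  # (batch,)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(targets, num_classes).float()
    # Uncertain samples get softer targets; confident samples stay near one-hot.
    soft_targets = one_hot * (1.0 - smoothing.unsqueeze(-1)) \
        + smoothing.unsqueeze(-1) / num_classes
    return -(soft_targets * log_probs).sum(dim=-1).mean()

if __name__ == "__main__":
    logits = torch.randn(4, 10)
    targets = torch.randint(0, 10, (4,))
    uncertainty = torch.rand(4)
    print(uncertainty_aware_loss(logits, targets, uncertainty))
```

Samples the model is uncertain about are trained against softer targets, while confident samples keep near one-hot labels.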