Iterative Self-Training for Code Generation via Reinforced Re-Ranking
- URL: http://arxiv.org/abs/2504.09643v1
- Date: Sun, 13 Apr 2025 16:34:17 GMT
- Title: Iterative Self-Training for Code Generation via Reinforced Re-Ranking
- Authors: Nikita Sorokin, Ivan Sedykh, Valentin Malykh
- Abstract summary: We propose a novel iterative self-training approach for self-training reranker models using Proximal Policy Optimization (PPO). Unlike traditional PPO approaches, our approach emphasizes the development of a robust reward/reranking model. Our method refines the training dataset by re-evaluating outputs, identifying high-scoring negative examples, and incorporating them into the training loop.
- Score: 5.77678027975395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating high-quality code that solves complex programming tasks is challenging, especially with current decoder-based models that produce highly stochastic outputs. In code generation, even minor errors can easily break the entire solution. Leveraging multiple sampled solutions can significantly improve the overall output quality. One effective way to enhance code generation is by pairing a code generation model with a reranker model, which selects the best solution from the generated samples. We propose a novel iterative self-training approach for self-training reranker models using Proximal Policy Optimization (PPO), aimed at improving both reranking accuracy and the overall code generation process. Unlike traditional PPO approaches, where the focus is on optimizing a generative model with a reward model, our approach emphasizes the development of a robust reward/reranking model. This model improves the quality of generated code through reranking and addresses problems and errors that the reward model might overlook during PPO alignment with the reranker. Our method iteratively refines the training dataset by re-evaluating outputs, identifying high-scoring negative examples, and incorporating them into the training loop, thereby boosting model performance. Our evaluation on the MultiPL-E dataset demonstrates that our 13.4B parameter model outperforms a 33B model in code generation quality while being three times faster. Moreover, it achieves performance comparable to GPT-4 and surpasses it in one programming language.
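The reranking and negative-mining steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `toy_score` (a stand-in for a learned reranker), `toy_passes_tests` (a stand-in for executing unit tests), the threshold value, and all other names are hypothetical, and the PPO update itself is omitted. The sketch only shows how sampled solutions might be ranked and how failing solutions that the reranker scores highly could be collected for the next training round.

```python
from typing import Callable, List, Tuple


def rerank(candidates: List[str],
           score: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs sorted by score, best first."""
    scored = [(c, score(c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def mine_hard_negatives(candidates: List[str],
                        score: Callable[[str], float],
                        passes_tests: Callable[[str], bool],
                        threshold: float) -> List[str]:
    """Collect failing candidates that the reranker still scores >= threshold."""
    return [c for c, s in rerank(candidates, score)
            if s >= threshold and not passes_tests(c)]


def toy_score(code: str) -> float:
    # Hypothetical reranker: naively prefers longer code (an assumption
    # chosen so that the reranker can be fooled in this example).
    return float(len(code))


def toy_passes_tests(code: str) -> bool:
    # Hypothetical test harness: run the candidate and check one case.
    namespace: dict = {}
    exec(code, namespace)
    return namespace["add"](2, 3) == 5


candidates = [
    "def add(a, b): return a + b",
    "def add(a, b): return a - b  # wrong",
    "def add(a, b): return b + a",
]

ranked = rerank(candidates, toy_score)
best_candidate = ranked[0][0]  # what the reranker would select

# High-scoring failures become extra negatives for the next training round.
training_negatives: List[str] = []
training_negatives.extend(
    mine_hard_negatives(candidates, toy_score, toy_passes_tests, threshold=30.0)
)
```

Here the naive scorer ranks the incorrect solution first, so it is mined as a hard negative. In an iterative setup, such examples would be added to the reranker's training data before the next round.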
Related papers
- Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks. However, improvement is plateauing due to the exhaustion of readily available high-quality data. We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
arXiv Detail & Related papers (2025-02-20T18:32:19Z)
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- ITERTL: An Iterative Framework for Fine-tuning LLMs for RTL Code Generation [9.409062607311528]
Large language models (LLMs) have demonstrated excellent performance, inspiring researchers to explore their use in automating register transfer level (RTL) code generation.
Existing approaches to fine-tune LLMs for RTL generation typically are conducted on fixed datasets.
We introduce an iterative training paradigm named ITERTL to mitigate these issues.
Our model outperforms GPT4 and state-of-the-art (SOTA) open-source models, achieving a remarkable 53.8% pass@1 rate on the VerilogEval-human benchmark.
arXiv Detail & Related papers (2024-06-28T01:44:57Z)
- UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback [21.858896845159208]
Large language models (LLMs) struggle to consistently generate UI code that compiles and produces visually relevant designs.
Existing approaches to improve generation rely on expensive human feedback or distilling a proprietary model.
Our method starts with an existing LLM and iteratively produces improved models by self-generating a large synthetic dataset.
arXiv Detail & Related papers (2024-06-11T21:53:46Z)
- Re-ReST: Reflection-Reinforced Self-Training for Language Agents [101.22559705696885]
Self-training in language agents can generate supervision from the agent itself.
We present Reflection-Reinforced Self-Training (Re-ReST), which uses a reflector to refine low-quality generated samples.
arXiv Detail & Related papers (2024-06-03T16:21:38Z)
- Non-autoregressive Generative Models for Reranking Recommendation [9.854541524740549]
In a recommendation system, reranking plays a crucial role by modeling the intra-list correlations among items. We propose a Non-AutoRegressive generative model for reranking Recommendation (NAR4Rec) designed to enhance efficiency and effectiveness. NAR4Rec has been fully deployed in a popular video app Kuaishou with over 300 million daily active users.
arXiv Detail & Related papers (2024-02-10T03:21:13Z)
- LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system.
We build a novel data-cleaning pipeline that uses these principles to transform existing programs.
We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
arXiv Detail & Related papers (2023-11-25T02:45:50Z)
- Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the PR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
- Improving Non-autoregressive Generation with Mixup Training [51.61038444990301]
We present a non-autoregressive generation model based on pre-trained transformer models.
We propose a simple and effective iterative training method called MIx Source and pseudo Target.
Our experiments on three generation benchmarks, including question generation, summarization, and paraphrase generation, show that the proposed framework achieves new state-of-the-art results.
arXiv Detail & Related papers (2021-10-21T13:04:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.