Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
- URL: http://arxiv.org/abs/2506.03136v1
- Date: Tue, 03 Jun 2025 17:58:42 GMT
- Title: Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
- Authors: Yinjie Wang, Ling Yang, Ye Tian, Ke Shen, Mengdi Wang
- Abstract summary: We propose CURE, a novel reinforcement learning framework with a dedicated reward design. CURE co-evolves coding and unit test generation capabilities based on their interaction outcomes. We find that our model can serve as an effective reward model for reinforcement learning on base models.
- Score: 33.676158965697184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose CURE, a novel reinforcement learning framework with a dedicated reward design that co-evolves coding and unit test generation capabilities based on their interaction outcomes, without any ground-truth code as supervision. This approach enables flexible and scalable training and allows the unit tester to learn directly from the coder's mistakes. Our derived ReasonFlux-Coder-7B and 14B models improve code generation accuracy by 5.3% and Best-of-N accuracy by 9.0% after optimization on Qwen2.5-Instruct models, outperforming similarly sized Qwen-Coder, DeepSeek-Coder, and Seed-Coder. They naturally extend to downstream tasks such as test-time scaling and agentic coding, achieving an 8.1% improvement over the base model. For the long-CoT model, our ReasonFlux-Coder-4B consistently outperforms Qwen3-4B while achieving 64.8% inference efficiency in unit test generation. Notably, we also find that our model can serve as an effective reward model for reinforcement learning on base models. Project: https://github.com/Gen-Verse/CURE
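The abstract describes rewards derived purely from how generated programs and generated unit tests interact under execution, with no ground-truth code. The snippet below is a minimal sketch of one way such interaction-outcome rewards could be computed; it is not the paper's actual reward design, and `run_test`, the pass-fraction coder reward, and the discrimination-based tester reward are illustrative assumptions.

```python
from typing import Callable, List, Tuple

def run_test(candidate_code: str, unit_test: str) -> bool:
    """Hypothetical sandbox call: True if `unit_test` passes on `candidate_code`."""
    raise NotImplementedError  # plug in an execution sandbox here

def interaction_rewards(
    codes: List[str],   # K programs sampled from the coder
    tests: List[str],   # M unit tests sampled from the tester
    executor: Callable[[str, str], bool] = run_test,
) -> Tuple[List[float], List[float]]:
    # Cross-execute every (program, test) pair to build an outcome matrix.
    outcomes = [[executor(c, t) for t in tests] for c in codes]

    # Coder reward (assumed): fraction of the tester's checks each program passes.
    coder_rewards = [sum(row) / len(tests) for row in outcomes]

    # Tester reward (assumed): a test is rewarded when it discriminates among
    # programs, i.e. some pass and some fail, so it catches the coder's mistakes.
    tester_rewards = []
    for j in range(len(tests)):
        passes = sum(outcomes[i][j] for i in range(len(codes)))
        tester_rewards.append(1.0 if 0 < passes < len(codes) else 0.0)

    return coder_rewards, tester_rewards
```

In an RL loop these per-sample rewards would feed policy-gradient updates for the coder and the tester respectively; the exact reward shaping and normalization used by CURE are specified in the paper, not here.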
Related papers
- Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code [76.80306464249217]
We propose TeaR, which aims at teaching LLMs to reason better. TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks. We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, and across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning.
arXiv Detail & Related papers (2025-07-10T07:34:05Z)
- Rethinking Verification for LLM Code Generation: From Generation to Testing [44.46778801679273]
Large language models (LLMs) have recently achieved notable success in code-generation benchmarks such as HumanEval and LiveCodeBench. We propose new multi-dimensional metrics designed to rigorously quantify test-suite quality. Experiments show that SAGA achieves a detection rate of 90.62% and a verifier accuracy of 32.58% on TCGBench.
arXiv Detail & Related papers (2025-07-09T14:58:47Z)
- DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation [68.19756761027351]
Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models. We investigate their denoising processes and reinforcement learning methods. Our work provides deeper insight into the machinery of dLLM generation and offers an effective, diffusion-native RL training framework.
arXiv Detail & Related papers (2025-06-25T17:35:47Z)
- CRPE: Expanding The Reasoning Capability of Large Language Model for Code Generation [5.63821063617385]
CRPE (Code Reasoning Process Enhancer) is a framework for data synthesis and model training. We develop an enhanced COT-Coder that demonstrates marked improvements in code generation tasks. Our COT-Coder-32B-StepDPO, based on Qwen2.5-Coder-32B-Base, exhibits superior performance with a pass@1 accuracy of 35.08%, outperforming GPT-4o on the benchmark.
arXiv Detail & Related papers (2025-05-15T08:13:45Z)
- Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks. However, improvement is plateauing due to the exhaustion of readily available high-quality data. We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
arXiv Detail & Related papers (2025-02-20T18:32:19Z)
- S*: Test Time Scaling for Code Generation [55.11863577956177]
We propose S*, the first hybrid test-time scaling framework for code generation. S* substantially improves the coverage and selection accuracy of generated code.
arXiv Detail & Related papers (2025-02-20T09:18:53Z)
- ACECODER: Acing Coder RL via Automated Test-Case Synthesis [36.740393665032954]
We design a pipeline that generates extensive (question, test-cases) pairs from existing code data. We construct preference pairs based on pass rates over sampled programs to train reward models with a Bradley-Terry loss (a minimal Bradley-Terry sketch follows this list). We show that our RL training can improve the model on HumanEval-plus by over 25% and on MBPP-plus by 6% with merely 80 optimization steps.
arXiv Detail & Related papers (2025-02-03T18:46:04Z)
- Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models.
We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks.
We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
- Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis [16.88062487980405]
We present a novel approach to automatically obtain data consisting of function signatures and associated Unit Tests.
We show that this approach, in conjunction with automatically generated training data, improves the performance of a pre-trained code language model.
arXiv Detail & Related papers (2023-10-20T17:13:16Z)
- Coder Reviewer Reranking for Code Generation [56.80381384717]
We propose Coder-Reviewer reranking as a method for sampling diverse programs from a code language model and reranking with model likelihood.
Experimental results show that Coder-Reviewer reranking leads to consistent and significant improvement over reranking with the Coder model only.
Coder-Reviewer reranking is easy to implement by prompting, can generalize to different programming languages, and works well with off-the-shelf hyperparameters (a minimal reranking sketch also follows this list).
arXiv Detail & Related papers (2022-11-29T18:56:33Z)
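The ACECODER entry above trains reward models from preference pairs built on pass rates, using a Bradley-Terry loss. Below is a minimal sketch of that loss together with a pass-rate-based pairing rule; the margin threshold, the pairing heuristic, and the function names are assumptions, and `chosen_rewards`/`rejected_rewards` stand for the scalar outputs of a reward model on each pair.

```python
from typing import List, Tuple

import torch
import torch.nn.functional as F

def bradley_terry_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry preference loss: -log sigmoid(r_chosen - r_rejected), averaged over pairs."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

def make_preference_pairs(
    programs: List[str],
    pass_rates: List[float],   # fraction of test cases each sampled program passes
    margin: float = 0.5,       # assumed threshold; not taken from the paper
) -> List[Tuple[str, str]]:
    """Pair programs whose pass-rate gap exceeds `margin`; the higher-rated one is 'chosen'."""
    pairs = []
    for i in range(len(programs)):
        for j in range(len(programs)):
            if pass_rates[i] - pass_rates[j] >= margin:
                pairs.append((programs[i], programs[j]))  # (chosen, rejected)
    return pairs
```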
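As a companion to the Coder-Reviewer entry above, here is a minimal sketch of likelihood-based reranking. The two scoring callbacks are placeholders: `coder_logprob` stands for the code model scoring the program given the prompt, and `reviewer_logprob` for scoring the prompt given the program; the names and the specific combination by summing log-probabilities are illustrative assumptions rather than the paper's exact recipe.

```python
from typing import Callable, List, Tuple

def coder_reviewer_rerank(
    prompt: str,
    programs: List[str],
    coder_logprob: Callable[[str, str], float],     # log p(program | prompt)
    reviewer_logprob: Callable[[str, str], float],  # log p(prompt | program)
) -> List[Tuple[str, float]]:
    """Rerank sampled programs by a combined coder + reviewer likelihood."""
    scored = []
    for prog in programs:
        # Summing log-probabilities corresponds to multiplying the coder and
        # reviewer probabilities; the highest combined score ranks first.
        score = coder_logprob(prompt, prog) + reviewer_logprob(prompt, prog)
        scored.append((prog, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```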