ACECODER: Acing Coder RL via Automated Test-Case Synthesis
- URL: http://arxiv.org/abs/2502.01718v3
- Date: Mon, 10 Feb 2025 18:40:00 GMT
- Title: ACECODER: Acing Coder RL via Automated Test-Case Synthesis
- Authors: Huaye Zeng, Dongfu Jiang, Haozhe Wang, Ping Nie, Xiaotong Chen, Wenhu Chen
- Abstract summary: We design a pipeline that generates extensive (question, test-cases) pairs from existing code data.
We construct preference pairs based on pass rates over sampled programs to train reward models with Bradley-Terry loss.
We show that our RL training can improve the model on HumanEval-plus by over 25% and MBPP-plus by 6% in merely 80 optimization steps.
- Score: 36.740393665032954
- Abstract: Most progress in recent coder models has been driven by supervised fine-tuning (SFT), while the potential of reinforcement learning (RL) remains largely unexplored, primarily due to the lack of reliable reward data and models in the code domain. In this paper, we address this challenge by leveraging automated large-scale test-case synthesis to enhance code model training. Specifically, we design a pipeline that generates extensive (question, test-cases) pairs from existing code data. Using these test cases, we construct preference pairs based on pass rates over sampled programs to train reward models with the Bradley-Terry loss. The resulting reward models yield an average 10-point improvement for Llama-3.1-8B-Ins and a 5-point improvement for Qwen2.5-Coder-7B-Ins through best-of-32 sampling, making the 7B model on par with the 236B DeepSeek-V2.5. Furthermore, we conduct reinforcement learning with both reward models and test-case pass rewards, leading to consistent improvements across HumanEval, MBPP, BigCodeBench, and LiveCodeBench (V4). Notably, we follow the R1-style training to start from Qwen2.5-Coder-base directly and show that our RL training can improve the model on HumanEval-plus by over 25% and MBPP-plus by 6% in merely 80 optimization steps. We believe our results highlight the huge potential of reinforcement learning in coder models.
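The two training ingredients named in the abstract, pass-rate-based preference pairs and Bradley-Terry reward modeling, are compact enough to sketch. The following is a minimal illustration under stated assumptions, not the authors' released code: the 0.4 pairing margin, `make_preference_pairs`, and the `policy_sample`/`reward_model` callables are hypothetical stand-ins.

```python
# Hedged sketch of pass-rate preference pairs, Bradley-Terry loss, and
# best-of-n reranking. Illustrative only; not the ACECODER release.
import torch
import torch.nn.functional as F

def make_preference_pairs(programs, pass_rates, margin=0.4):
    """Pair sampled programs whose synthesized-test pass rates differ
    by at least `margin` (the margin value is a hypothetical choice).

    pass_rates[i] is the fraction of generated test cases program i passes.
    Returns (chosen, rejected) tuples for reward-model training.
    """
    pairs = []
    for i, hi_rate in enumerate(pass_rates):
        for j, lo_rate in enumerate(pass_rates):
            if hi_rate - lo_rate >= margin:
                pairs.append((programs[i], programs[j]))
    return pairs

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry objective: maximize P(chosen beats rejected)
    = sigmoid(r_chosen - r_rejected), i.e. minimize -log sigmoid(diff)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def best_of_n(question, policy_sample, reward_model, n=32):
    """Best-of-n reranking (the paper reports the best-of-32 setting):
    sample n candidate programs and keep the one the reward model scores
    highest. `policy_sample` and `reward_model` are assumed callables."""
    candidates = [policy_sample(question) for _ in range(n)]
    scores = [reward_model(question, c) for c in candidates]
    return candidates[scores.index(max(scores))]
```

For the RL variant the abstract mentions, the same per-program pass rate can also be used directly as a scalar reward signal instead of a learned reward model.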
Related papers
- UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance [65.01483640267885] (2025-02-17)
Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge.
We introduce UnitCoder, a systematic pipeline leveraging model-generated unit tests to guide and validate the code generation process.
Our work presents a scalable approach that leverages model-generated unit tests to guide the synthesis of high-quality code data from pre-training corpora.
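As a concrete picture of unit-test-guided validation, here is a minimal sketch of the general idea, not UnitCoder's pipeline; the function name and timeout default are assumptions.

```python
# Minimal validation filter in the spirit of unit-test-guided synthesis:
# keep a generated snippet only if it survives its model-generated tests.
# Not UnitCoder's implementation; names and defaults are illustrative.
import os
import subprocess
import sys
import tempfile

def passes_unit_tests(candidate_code: str, unit_tests: str, timeout: float = 5.0) -> bool:
    """Run `candidate_code` followed by assertion-style `unit_tests` in a
    fresh interpreter; accept the sample only if the process exits cleanly."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + unit_tests)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)
```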
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning [65.2421542320293] (2025-02-10)
Reasoning abilities are crucial components of general intelligence.
Proprietary models, such as OpenAI's o-series, have recently made remarkable progress on reasoning tasks.
This paper proposes a new RL framework, termed OREAL, to pursue the performance limit that can be achieved through Outcome REwArd-based reinforcement Learning for mathematical reasoning tasks.
- Kimi k1.5: Scaling Reinforcement Learning with LLMs [84.2229964736678] (2025-01-22)
We report on the training practice of Kimi k1.5, our latest multi-modal language model trained with reinforcement learning.
Long context scaling and improved policy optimization methods are key ingredients of our approach.
Our system achieves state-of-the-art reasoning performance across multiple benchmarks and modalities.
- How to Select Pre-Trained Code Models for Reuse? A Learning Perspective [29.83900262273832] (2025-01-07)
Off-the-shelf Pre-trained Code Models (PCMs) have been released publicly.
We conduct experiments on 100 widely used open-source PCMs for code intelligence tasks, with sizes ranging from 42.5 million to 3 billion parameters.
The results demonstrate that learning-based selection methods reduce selection time to 100 seconds, compared to 2,700 hours with brute-force fine-tuning.
- Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining [3.8608102686867762] (2024-09-03)
Arctic-SnowCoder-1.3B is a data-efficient base code model pretrained on 555B tokens.
Despite being trained on a limited dataset, Arctic-SnowCoder achieves state-of-the-art performance on BigCodeBench.
Across all evaluated benchmarks, Arctic-SnowCoder-1.3B beats StarCoderBase-3B pretrained on 1T tokens.
- Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153] (2024-04-22)
We investigate the legal and ethical issues of current neural code completion models.
We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks.
We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
- Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis [16.88062487980405] (2023-10-20)
We present a novel approach to automatically obtain data consisting of function signatures and associated Unit Tests.
We show that this approach, in conjunction with automatically generated training data, improves the performance of a pre-trained code language model.
- Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond [52.656743602538825] (2023-04-11)
Fine-tuning pre-trained code models incurs a large computational cost.
We conduct an experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning.
We propose Telly to efficiently fine-tune pre-trained code models via layer freezing.
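Layer freezing itself is easy to picture. Below is a minimal sketch of the general technique, not Telly's exact procedure; the checkpoint choice and the cutoff of 8 layers are illustrative assumptions for a RoBERTa-style code model.

```python
# Generic layer-freezing sketch for efficient fine-tuning. The cutoff and
# checkpoint are illustrative; this is not Telly's implementation.
from transformers import AutoModel

model = AutoModel.from_pretrained("microsoft/codebert-base")  # RoBERTa-style code model
FREEZE_BELOW = 8  # hypothetical: freeze embeddings plus the first 8 layers

for param in model.embeddings.parameters():
    param.requires_grad = False
for layer in model.encoder.layer[:FREEZE_BELOW]:
    for param in layer.parameters():
        param.requires_grad = False

# Only the upper layers now receive gradients, shrinking backward-pass
# compute and optimizer state during fine-tuning.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```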