CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning
- URL: http://arxiv.org/abs/2507.17548v1
- Date: Wed, 23 Jul 2025 14:26:58 GMT
- Title: CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning
- Authors: Lingxiao Tang, He Ye, Zhongxin Liu, Xiaoxue Ren, Lingfeng Bao,
- Abstract summary: Code reasoning is a fundamental capability for large language models (LLMs) in the code domain.<n>Prior approaches mainly rely on supervised fine-tuning to improve performance in code reasoning tasks.<n>We argue this is due to two core issues: the low quality of training data and the limitations of supervised fine-tuning.<n>We propose CodeReasoner, a framework that spans both dataset construction and a two-stage training process.
- Score: 8.197518276987989
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code reasoning is a fundamental capability for large language models (LLMs) in the code domain. It involves understanding and predicting a program's execution behavior, such as determining the output for a given input or whether a specific statement will be executed. This capability is essential for downstream tasks like debugging, code generation, and program repair. Prior approaches mainly rely on supervised fine-tuning to improve performance in code reasoning tasks. However, they often show limited gains and fail to generalize across diverse scenarios. We argue this is due to two core issues: the low quality of training data and the limitations of supervised fine-tuning, which struggles to teach general reasoning skills. To address these challenges, we propose CodeReasoner, a framework that spans both dataset construction and a two-stage training process. First, we introduce a method to construct datasets that focus on the core execution logic of Python programs. Next, we apply instruction tuning to inject execution-specific knowledge distilled from a powerful teacher model. We then enhance reasoning and generalization through GRPO reinforcement learning on top of the fine-tuned model. Experiments on three widely-used code reasoning benchmarks show that CodeReasoner improves performance by 27.1% to 40.2% over prior methods using a 7B model. Notably, the 7B model matches GPT-4o on key tasks like input/output and coverage prediction. When scaled to 14B, CodeReasoner outperforms GPT-4o across all benchmarks. Ablation studies confirm the effectiveness of each training stage and highlight the importance of reasoning chains.
Related papers
- OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique [59.18475981916166]
We introduce OpenCodeReasoning-II, a dataset consisting of 2.5M question-solution-critique triples (approx. 35K unique programming questions)<n>In this work, we employ a two-stage supervised fine-tuning strategy. The first stage focuses on fine-tuning for code generation, while the second stage involves the joint training of models for both code generation and critique. Notably, the integration of our code generation and critique models leads to significant improvements in competitive coding performance.
arXiv Detail & Related papers (2025-07-11T23:35:54Z) - Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code [76.80306464249217]
We propose TeaR, which aims at teaching LLMs to reason better.<n>TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks.<n>We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, and across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning.
arXiv Detail & Related papers (2025-07-10T07:34:05Z) - R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning [14.208804782749793]
We present R1-Code-Interpreter, an extension of a text-only Large Language Models (LLMs) trained via multi-turn supervised fine-tuning (SFT) and reinforcement learning (RL)<n>R1-Code-Interpreter autonomously generates multiple code queries during step-by-step reasoning.<n>Unlike prior RL work on narrow domains, we find that Code Interpreter training is significantly harder due to high task diversity and expensive code execution.
arXiv Detail & Related papers (2025-05-27T18:47:33Z) - Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions [8.540135660509058]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities in math and coding.<n>We leverage influence functions to attribute LLMs' reasoning ability on math and coding to individual training examples, sequences, and tokens.<n>High-difficulty math examples improve both math and code reasoning, while low-difficulty code tasks most effectively benefit code reasoning.
arXiv Detail & Related papers (2025-05-26T13:15:26Z) - AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning [50.02117478165099]
We show that large-scale reinforcement learning can significantly enhance the reasoning capabilities of strong, small- and mid-sized models.<n>We propose a simple yet effective approach: first training on math-only prompts, then on code-only prompts.
arXiv Detail & Related papers (2025-05-22T08:50:47Z) - Is Compression Really Linear with Code Intelligence? [60.123628177110206]
textitFormat Annealing is a lightweight, transparent training methodology designed to assess the intrinsic capabilities of pre-trained models equitably.<n>Our empirical results reveal a fundamental logarithmic relationship between measured code intelligence and bits-per-character (BPC)<n>Our work provides a more nuanced understanding of compression's role in developing code intelligence and contributes a robust evaluation framework in the code domain.
arXiv Detail & Related papers (2025-05-16T16:59:14Z) - To Code, or Not To Code? Exploring Impact of Code in Pre-training [13.336902036852115]
We systematically investigate the impact of code data on general performance.
We find that code is a critical building block for generalization far beyond coding tasks.
Our work suggests investments in code quality and preserving code during pre-training have positive impacts.
arXiv Detail & Related papers (2024-08-20T14:58:13Z) - Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension [18.919972400933393]
We propose an advanced pretraining task, "Next Token Prediction+"
Following this pretraining, both Code Llama and StarCoder, the prevalent code domain pretraining models, display significant improvements on our logically equivalent code selection task and the code completion task.
arXiv Detail & Related papers (2024-04-13T03:11:07Z) - CodeT5+: Open Code Large Language Models for Code Understanding and
Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence.
CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.
We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z) - Towards Efficient Fine-tuning of Pre-trained Code Models: An
Experimental Study and Beyond [52.656743602538825]
Fine-tuning pre-trained code models incurs a large computational cost.
We conduct an experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning.
We propose Telly to efficiently fine-tune pre-trained code models via layer freezing.
arXiv Detail & Related papers (2023-04-11T13:34:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.