Multi-Turn Code Generation Through Single-Step Rewards
- URL: http://arxiv.org/abs/2502.20380v1
- Date: Thu, 27 Feb 2025 18:55:05 GMT
- Title: Multi-Turn Code Generation Through Single-Step Rewards
- Authors: Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, Alexander M Rush, Wenting Zhao, Sanjiban Choudhury
- Abstract summary: Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple yet scalable approach, $\mu$Code, that solves multi-turn code generation using only single-step rewards.
- Score: 68.05767417891057
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple yet scalable approach, $\mu$Code, that solves multi-turn code generation using only single-step rewards. Our key insight is that code generation is a one-step recoverable MDP, where the correct code can be recovered from any intermediate code state in a single turn. $\mu$Code iteratively trains both a generator to provide code solutions conditioned on multi-turn execution feedback and a verifier to score the newly generated code. Experimental evaluations show that our approach achieves significant improvements over the state-of-the-art baselines. We provide analysis of the design choices of the reward models and policy, and show the efficacy of $\mu$Code at utilizing the execution feedback. Our code is available at https://github.com/portal-cornell/muCode.
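As a rough illustration of the loop the abstract describes, here is a minimal sketch assuming hypothetical `generator`, `verifier`, and `run_tests` interfaces (not the authors' actual API): the generator proposes candidates conditioned on execution feedback, the learned verifier selects the best one, and the selected candidate's execution feedback seeds the next turn; the single-step reward is simply whether the tests pass.

```python
def best_of_n(generator, verifier, prompt, feedback, n=5):
    """Sample n candidates and keep the one the verifier scores highest."""
    candidates = [generator.sample(prompt, feedback) for _ in range(n)]
    return max(candidates, key=lambda code: verifier.score(prompt, code))

def multi_turn_codegen(generator, verifier, run_tests, prompt, max_turns=3):
    feedback = None  # the first turn has no execution feedback yet
    code = ""
    for _ in range(max_turns):
        code = best_of_n(generator, verifier, prompt, feedback)
        passed, feedback = run_tests(code)  # single-step reward signal
        if passed:  # one-step recoverable: any state can reach a fix in one turn
            break
    return code
```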
Related papers
- GenX: Mastering Code and Test Generation with Execution Feedback [7.225594526057816]
We propose a novel approach that concurrently trains a code generation model and a test generation model. We introduce two strategies for test and code data augmentation and a new scoring function for code and test ranking. The results demonstrate that our models, when iteratively trained with an increasing number of test cases and code solutions, outperform those trained on the original dataset.
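A hedged sketch of one mutual code/test scoring scheme in this spirit (the paper's exact scoring function may differ): each code candidate is scored by the fraction of generated tests it passes, and each test by the fraction of code candidates that pass it.

```python
def mutual_scores(pass_matrix):
    """pass_matrix[i][j] is True iff code candidate i passes test j."""
    n_codes, n_tests = len(pass_matrix), len(pass_matrix[0])
    code_scores = [sum(row) / n_tests for row in pass_matrix]
    test_scores = [sum(pass_matrix[i][j] for i in range(n_codes)) / n_codes
                   for j in range(n_tests)]
    return code_scores, test_scores
```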
arXiv Detail & Related papers (2024-12-18T03:18:21Z)
- ConAIR: Consistency-Augmented Iterative Interaction Framework to Enhance the Reliability of Code Generation [17.68163468068264]
We propose ConAIR, a Consistency-Augmented Iterative Interaction framework that enhances the reliability of code generation.
We show that with minimal human effort, performance can be significantly enhanced.
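An illustrative sketch of the consistency idea, with assumed details (the paper's actual procedure may differ): candidates vote via their outputs on shared inputs, and only disagreements would need the minimal human effort mentioned above.

```python
from collections import Counter

def consistency_select(candidates, run, test_inputs):
    """Return the candidate whose output signature is most common."""
    signatures = [tuple(run(code, x) for x in test_inputs) for code in candidates]
    majority_sig, _ = Counter(signatures).most_common(1)[0]
    return candidates[signatures.index(majority_sig)]
```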
arXiv Detail & Related papers (2024-11-23T15:26:24Z)
- Sifting through the Chaff: On Utilizing Execution Feedback for Ranking the Generated Code Candidates [46.74037090843497]
Large Language Models (LLMs) are transforming the way developers approach programming by automatically generating code based on natural language descriptions.
This paper puts forward RankEF, an innovative approach for code ranking that leverages execution feedback.
Experiments on three code generation benchmarks demonstrate that RankEF significantly outperforms the state-of-the-art CodeRanker.
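A minimal sketch of ranking with execution feedback; `pass_probability` below stands in for RankEF's trained ranker (an assumption, not the paper's API). Execution is only needed at training time to produce labels, so ranking at inference time needs no test runs.

```python
def build_training_labels(candidates, run_tests):
    """Training time only: execute candidates to label them pass/fail."""
    return [(code, 1 if run_tests(code) else 0) for code in candidates]

def rank_candidates(candidates, pass_probability):
    """Sort candidates by the model's predicted pass probability, best first."""
    return sorted(candidates, key=pass_probability, reverse=True)
```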
arXiv Detail & Related papers (2024-08-26T01:48:57Z)
- DOCE: Finding the Sweet Spot for Execution-Based Code Generation [69.5305729627198]
We propose a comprehensive framework that includes candidate generation, $n$-best reranking, minimum Bayes risk (MBR) decoding, and self-debugging as the core components.
Our findings highlight the importance of execution-based methods and the gap between execution-based and execution-free methods.
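A sketch of execution-based minimum Bayes risk selection, one of the components named above; the utility here (output agreement on shared test inputs) is a common choice but an assumption about the paper's exact setup.

```python
def mbr_select(candidates, run, test_inputs):
    """Return the candidate whose outputs agree most with the other candidates."""
    outputs = [tuple(run(code, x) for x in test_inputs) for code in candidates]
    def expected_utility(i):
        return sum(outputs[i] == outputs[j]
                   for j in range(len(candidates)) if j != i)
    return candidates[max(range(len(candidates)), key=expected_utility)]
```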
arXiv Detail & Related papers (2024-08-25T07:10:36Z)
- Top Pass: Improve Code Generation by Pass@k-Maximized Code Ranking [11.109866941442641]
Top Pass is a code ranking approach that identifies potential correct solutions from a large number of candidates.
This enables the user to find the correct solution within as few tries as possible.
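For reference, the standard unbiased pass@k estimator (introduced with the Codex evaluation protocol) that a pass@k-maximized ranking targets: given n samples of which c are correct, it is the probability that at least one of k randomly chosen samples is correct.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: success is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)
```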
arXiv Detail & Related papers (2024-08-11T07:53:51Z)
- Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass [72.07642648108849]
Superposed Decoding is a new decoding algorithm that generates $k$ drafts at the cost of one autoregressive inference pass.
Superposed Decoding can be combined with other decoding strategies, resulting in universal coverage gains when scaling inference time compute.
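A heavily simplified sketch of the idea, with assumptions throughout (the full method also reweights and filters drafts with n-gram statistics): feed one superposition, a weighted average, of the k drafts' token embeddings through the model, then fan the top-k next tokens out, one per draft. The code assumes a HuggingFace-style causal LM that accepts `inputs_embeds`.

```python
import torch

def superposed_step(model, embedding, drafts, weights):
    """drafts: k equal-length token-id lists; weights: k floats summing to 1."""
    k = len(drafts)
    embs = torch.stack([embedding(torch.tensor(d)) for d in drafts])  # (k, T, D)
    w = torch.tensor(weights).view(k, 1, 1)
    mixed = (w * embs).sum(dim=0, keepdim=True)                       # (1, T, D)
    next_logits = model(inputs_embeds=mixed).logits[0, -1]  # one forward pass
    top_tokens = torch.topk(next_logits, k).indices.tolist()
    return [draft + [tok] for draft, tok in zip(drafts, top_tokens)]
```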
arXiv Detail & Related papers (2024-05-28T17:40:48Z)
- Functional Overlap Reranking for Neural Code Generation [6.665515707408405]
We introduce SRank, a novel reranking strategy for selecting the best solutions from code generation.
By quantifying the functional overlap between solution clusters, our approach provides a better ranking strategy for code solutions.
Empirical results show that our method achieves remarkable results on the pass@1 score.
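A hedged sketch of functional-overlap reranking; the paper's exact overlap measure may differ. Solutions are clustered by their outputs on shared test inputs, and the cluster whose behaviour overlaps most with the other clusters wins, on the intuition that correct programs tend to agree.

```python
from collections import defaultdict

def functional_overlap_select(candidates, run, test_inputs):
    clusters = defaultdict(list)  # output signature -> solutions in the cluster
    for code in candidates:
        signature = tuple(run(code, x) for x in test_inputs)
        clusters[signature].append(code)
    def overlap(sig):
        # Per-test output agreement with every other cluster, size-weighted.
        return sum(len(members) * sum(a == b for a, b in zip(sig, other))
                   for other, members in clusters.items() if other != sig)
    return clusters[max(clusters, key=overlap)][0]
```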
arXiv Detail & Related papers (2023-10-16T22:20:31Z)
- RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
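A minimal sketch of the iterative retrieval-generation loop described above; `retrieve` and `generate` are illustrative stand-ins, not the framework's actual API. Each round retrieves repository snippets similar to the query, and the previous round's draft completion is folded into the next query.

```python
def iterative_retrieval_generation(retrieve, generate, context, rounds=2):
    query, completion = context, ""
    for _ in range(rounds):
        snippets = retrieve(query)                # similarity-based retrieval
        completion = generate(context, snippets)  # LM completes using the hints
        query = context + completion              # the draft sharpens the query
    return completion
```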
arXiv Detail & Related papers (2023-03-22T13:54:46Z)
- InCoder: A Generative Model for Code Infilling and Synthesis [88.46061996766348]
We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) and editing (via infilling).
InCoder is trained to generate code files from a large corpus of permissively licensed code.
Our model is the first generative model that is able to directly perform zero-shot code infilling.
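A sketch of the causal-masking prompt format behind left-to-right infilling; the sentinel spelling below is illustrative, not necessarily the model's exact vocabulary. The missing span is replaced by a sentinel and generated at the end, so an ordinary left-to-right decoder can condition on both sides of the gap.

```python
def make_infill_prompt(prefix: str, suffix: str, sentinel: str = "<|mask:0|>") -> str:
    """The model's continuation after the trailing sentinel is the infill."""
    return f"{prefix}{sentinel}{suffix}{sentinel}"

# Example: for make_infill_prompt("def add(a, b):\n    return ", "\n"),
# the decoder would ideally continue with "a + b".
```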
arXiv Detail & Related papers (2022-04-12T16:25:26Z)
- CodeRetriever: Unimodal and Bimodal Contrastive Learning [128.06072658302165]
We propose the CodeRetriever model, which combines the unimodal and bimodal contrastive learning to train function-level code semantic representations.
For unimodal contrastive learning, we design a semantic-guided method to build positive code pairs based on the documentation and function name.
For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build text-code pairs.
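A hedged sketch of the bimodal objective as a standard in-batch InfoNCE loss; the paper's exact loss and temperature are assumptions here. Matched documentation/code embeddings are pulled together while the other in-batch pairs serve as negatives.

```python
import torch
import torch.nn.functional as F

def info_nce(text_emb, code_emb, temperature=0.05):
    """text_emb, code_emb: (B, D) L2-normalized embeddings of matched pairs."""
    logits = text_emb @ code_emb.T / temperature  # (B, B) similarity matrix
    targets = torch.arange(text_emb.size(0))      # pair i matches code i
    return F.cross_entropy(logits, targets)
```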
arXiv Detail & Related papers (2022-01-26T10:54:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.