Top Pass: Improve Code Generation by Pass@k-Maximized Code Ranking
- URL: http://arxiv.org/abs/2408.05715v1
- Date: Sun, 11 Aug 2024 07:53:51 GMT
- Title: Top Pass: Improve Code Generation by Pass@k-Maximized Code Ranking
- Authors: Zhi-Cun Lyu, Xin-Ye Li, Zheng Xie, Ming Li,
- Abstract summary: Top Pass is a code ranking approach that identifies potential correct solutions from a large number of candidates.
This enables the user to find the correct solution within as few tries as possible.
- Score: 11.109866941442641
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code generation has been greatly enhanced by the profound advancements in Large Language Models (LLMs) recently. Nevertheless, such LLM-based code generation approaches still struggle to generate error-free code in a few tries when faced with complex problems. To address this, the prevailing strategy is to sample a huge number of candidate programs, with the hope of any one in them could work. However, users of code generation systems usually expect to find a correct program by reviewing or testing only a small number of code candidates. Otherwise, the system would be unhelpful. In this paper, we propose Top Pass, a code ranking approach that identifies potential correct solutions from a large number of candidates. Top Pass directly optimizes the pass@k loss function, enhancing the quality at the top of the candidate list. This enables the user to find the correct solution within as few tries as possible. Experimental results on four benchmarks indicate that our Top Pass method enhances the usability of code generation models by producing better ranking results, particularly achieving a 32.9\% relative improvement in pass@1 on CodeContests when compared to the state-of-the-art ranking method.
Related papers
- ConAIR:Consistency-Augmented Iterative Interaction Framework to Enhance the Reliability of Code Generation [17.68163468068264]
We propose a Consistency-Augmented Iterative Interaction Framework to enhance the reliability of Code Generation, ConAIR.
We show that with minimal human effort, performance can be significantly enhanced.
arXiv Detail & Related papers (2024-11-23T15:26:24Z) - Rethinking Code Refinement: Learning to Judge Code Efficiency [60.04718679054704]
Large Language Models (LLMs) have demonstrated impressive capabilities in understanding and generating codes.
We propose a novel method based on the code language model that is trained to judge the efficiency between two different codes.
We validate our method on multiple programming languages with multiple refinement steps, demonstrating that the proposed method can effectively distinguish between more and less efficient versions of code.
arXiv Detail & Related papers (2024-10-29T06:17:37Z) - CodeDPO: Aligning Code Models with Self Generated and Verified Source Code [52.70310361822519]
We propose CodeDPO, a framework that integrates preference learning into code generation to improve two key code preference factors: code correctness and efficiency.
CodeDPO employs a novel dataset construction method, utilizing a self-generation-and-validation mechanism that simultaneously generates and evaluates code and test cases.
arXiv Detail & Related papers (2024-10-08T01:36:15Z) - B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests [16.19318541132026]
We show that within a Bayesian framework, the optimal selection strategy can be defined based on the posterior probability of the observed passing states between solutions and tests.
We propose an efficient approach for approximating this optimal (yet uncomputable) strategy, where the approximation error is bounded by the correctness of prior knowledge.
arXiv Detail & Related papers (2024-09-13T10:22:08Z) - Sifting through the Chaff: On Utilizing Execution Feedback for Ranking the Generated Code Candidates [46.74037090843497]
Large Language Models (LLMs) are transforming the way developers approach programming by automatically generating code based on natural language descriptions.
This paper puts forward RankEF, an innovative approach for code ranking that leverages execution feedback.
Experiments on three code generation benchmarks demonstrate that RankEF significantly outperforms the state-of-the-art CodeRanker.
arXiv Detail & Related papers (2024-08-26T01:48:57Z) - DOCE: Finding the Sweet Spot for Execution-Based Code Generation [69.5305729627198]
We propose a comprehensive framework that includes candidate generation, $n$-best reranking, minimum Bayes risk (MBR) decoding, and self-ging as the core components.
Our findings highlight the importance of execution-based methods and the difference gap between execution-based and execution-free methods.
arXiv Detail & Related papers (2024-08-25T07:10:36Z) - Code Generation with AlphaCodium: From Prompt Engineering to Flow
Engineering [6.779943486567506]
We propose a new approach to code generation by LLMs - a test-based, multi-stage, code-oriented iterative flow.
We tested AlphaCodium on a challenging code generation dataset called CodeContests.
For example, GPT-4 accuracy (pass@5) increased from 19% with a single well-designed direct prompt to 44% with the AlphaCodium flow.
arXiv Detail & Related papers (2024-01-16T17:00:36Z) - Functional Overlap Reranking for Neural Code Generation [6.665515707408405]
We introduce SRank, a novel reranking strategy for selecting the best solutions from code generation.
By quantifying the functional overlap between solution clusters, our approach provides a better ranking strategy for code solutions.
Empirical results show that our method achieves remarkable results on the pass@1 score.
arXiv Detail & Related papers (2023-10-16T22:20:31Z) - Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
arXiv Detail & Related papers (2022-08-11T17:41:08Z) - Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.