Coder Reviewer Reranking for Code Generation
- URL: http://arxiv.org/abs/2211.16490v1
- Date: Tue, 29 Nov 2022 18:56:33 GMT
- Title: Coder Reviewer Reranking for Code Generation
- Authors: Tianyi Zhang, Tao Yu, Tatsunori B. Hashimoto, Mike Lewis, Wen-tau Yih,
Daniel Fried, Sida I. Wang
- Abstract summary: We propose Coder-Reviewer reranking as a method for sampling diverse programs from a code language model and reranking with model likelihood.
Experimental results show that Coder-Reviewer reranking leads to consistent and significant improvement over reranking with the Coder model only.
Coder-Reviewer reranking is easy to implement by prompting, can generalize to different programming languages, and works well with off-the-shelf hyperparameters.
- Score: 56.80381384717
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Sampling diverse programs from a code language model and reranking with model
likelihood is a popular method for code generation but it is prone to
preferring degenerate solutions. Inspired by collaborative programming, we
propose Coder-Reviewer reranking. We augment Coder language models from past
work, which generate programs given language instructions, with Reviewer
models, which evaluate the likelihood of the instruction given the generated
programs. We perform an extensive study across six datasets with eight models
from three model families. Experimental results show that Coder-Reviewer
reranking leads to consistent and significant improvement (up to 17% absolute
accuracy gain) over reranking with the Coder model only. When combined with
executability filtering, Coder-Reviewer reranking can often outperform the
minimum Bayes risk method. Coder-Reviewer reranking is easy to implement by
prompting, can generalize to different programming languages, and works well
with off-the-shelf hyperparameters.
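To make the reranking rule concrete, here is a minimal sketch of the Coder-Reviewer score. It assumes the candidate programs and both log-likelihoods have already been collected by prompting a code language model in each direction; the function and variable names are illustrative, not the paper's implementation.

```python
# Minimal sketch of Coder-Reviewer reranking. Each candidate carries two
# precomputed scores from a prompted code LM (assumed already collected):
#   coder_lp    = log p(program | instruction)   -- the Coder direction
#   reviewer_lp = log p(instruction | program)   -- the Reviewer direction

def coder_reviewer_rerank(candidates: list[tuple[str, float, float]]) -> str:
    # Coder-Reviewer score: log p(y|x) + log p(x|y), i.e. the product
    # p(y|x) * p(x|y) in probability space.
    return max(candidates, key=lambda c: c[1] + c[2])[0]

# Toy usage: the second sample has a slightly worse Coder score but a much
# better Reviewer score, so it wins under the combined criterion.
samples = [
    ("return s", -1.2, -9.5),
    ("return s[::-1]", -2.0, -3.1),
]
print(coder_reviewer_rerank(samples))  # -> return s[::-1]
```

The executability filtering mentioned in the abstract would slot in before the `max`: discard samples that fail to run, then apply the same combined score to the survivors.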
Related papers
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code
Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search [7.822427053078387]
The Generation-Augmented Retrieval (GAR) framework generates exemplar code snippets to augment queries.
We propose a simple yet effective method that additionally Rewrites the Code (ReCo) within the codebase for style normalization.
Code Style Similarity is the first metric tailored to quantify stylistic similarities in code.
arXiv Detail & Related papers (2024-01-09T12:12:50Z)
- Refactoring Programs Using Large Language Models with Few-Shot Examples [20.48175387745551]
We demonstrate the use of a large language model (LLM), GPT-3.5, to suggest less complex versions of user-written Python programs.
We show that 95.68% of programs can be refactored by generating 10 candidates each, resulting in a 17.35% reduction in the average cyclomatic complexity.
arXiv Detail & Related papers (2023-11-20T11:43:45Z)
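The selection step in the entry above can be sketched as follows: generate several refactoring candidates with an LLM, then keep the parseable one with the lowest average cyclomatic complexity. This is a hedged sketch, not the paper's code; the candidate list is assumed to come from the LLM, and the complexity measurement uses the radon library (`pip install radon`).

```python
from radon.complexity import cc_visit  # pip install radon

def avg_cyclomatic_complexity(source: str) -> float:
    # cc_visit parses the source and returns one block per function/class,
    # each carrying a .complexity attribute.
    blocks = cc_visit(source)
    return sum(b.complexity for b in blocks) / len(blocks) if blocks else 0.0

def pick_simplest(candidates: list[str]) -> str:
    # Skip candidates that do not parse, so one bad sample cannot crash
    # the selection; assumes at least one candidate is valid Python.
    scored = []
    for src in candidates:
        try:
            scored.append((avg_cyclomatic_complexity(src), src))
        except SyntaxError:
            continue
    return min(scored, key=lambda t: t[0])[1]
```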
- PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback [5.459517921633247]
We propose a novel RRTF (Rank Responses to align Test&Teacher Feedback) framework, which can effectively and efficiently boost pre-trained large language models for code generation.
Under this framework, we present PanGu-Coder2, which achieves 62.20% pass@1 on the OpenAI HumanEval benchmark.
arXiv Detail & Related papers (2023-07-27T15:28:29Z)
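The pass@1 number above is conventionally computed with the unbiased pass@k estimator introduced with the HumanEval benchmark (Chen et al., 2021): draw n samples per problem, count the c that pass the unit tests, and estimate the chance that a random size-k subset contains at least one passing sample. A small sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimator: 1 - C(n - c, k) / C(n, k).
    if n - c < k:
        return 1.0  # every size-k draw must include a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 200 samples per problem, 30 passing: pass@1 = 1 - 170/200 = 0.15
print(pass_at_k(200, 30, 1))
```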
- Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study [4.438873396405334]
We aim to answer whether making code easier to understand by adding contextual data improves the performance of pre-trained code language models on the task of code completion.
For comments, we find that the models perform better in the presence of multi-line comments.
arXiv Detail & Related papers (2023-04-24T17:09:14Z)
- Stochastic Code Generation [1.7205106391379026]
Large language models pre-trained for code generation can generate high-quality short code but often struggle with generating coherent long code.
This issue is also observed in language modeling for long text generation.
In this study, we investigate whether techniques from long text generation can be applied to code generation to improve coherence.
arXiv Detail & Related papers (2023-04-14T00:01:05Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
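A minimal sketch of the reranking pattern described in the entry above: order sampled programs by a learned correctness score instead of executing them. The scorer here is a hypothetical stand-in for the paper's trained ranker.

```python
from typing import Callable

def rerank_by_predicted_correctness(
    candidates: list[str],
    p_correct: Callable[[str], float],  # stand-in for a trained fault-aware ranker
) -> list[str]:
    # Highest predicted probability of passing the tests first; no execution.
    return sorted(candidates, key=p_correct, reverse=True)

# Illustrative only: a dummy scorer preferring programs with a return statement.
ranked = rerank_by_predicted_correctness(
    ["x = 1", "def f():\n    return 1"],
    lambda src: 1.0 if "return" in src else 0.0,
)
print(ranked[0])
```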
- Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
arXiv Detail & Related papers (2022-05-19T01:27:53Z)
- InCoder: A Generative Model for Code Infilling and Synthesis [88.46061996766348]
We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) and editing (via infilling).
InCoder is trained to generate code files from a large corpus of permissively licensed code.
Our model is the first generative model that is able to directly perform zero-shot code infilling.
arXiv Detail & Related papers (2022-04-12T16:25:26Z)
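As a sketch of the infilling interface described above: replace the span to be filled with a sentinel, append the same sentinel, and let the left-to-right model generate the missing code. The `<|mask:0|>` spelling follows published InCoder usage, but treat the exact sentinel tokens as an assumption if you adapt this to another checkpoint.

```python
# Build an infilling prompt for a causal LM with InCoder-style sentinels.
left = "def add(a, b):\n"
right = "\n    return result"
prompt = left + "<|mask:0|>" + right + "<|mask:0|>"
# Feeding `prompt` to the model yields tokens for the masked span
# (e.g. "    result = a + b") until an end-of-mask token is produced.
print(prompt)
```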