AceCoder: Utilizing Existing Code to Enhance Code Generation
- URL: http://arxiv.org/abs/2303.17780v3
- Date: Thu, 7 Sep 2023 11:29:44 GMT
- Title: AceCoder: Utilizing Existing Code to Enhance Code Generation
- Authors: Jia Li, Yunfei Zhao, Yongmin Li, Ge Li, Zhi Jin
- Abstract summary: Existing prompting techniques are designed for natural language generation and achieve low accuracy in code generation.
AceCoder introduces two novel mechanisms (i.e., guided code generation and example retrieval) to address the two challenges unique to code generation: requirement understanding and code implementation.
Results show that AceCoder significantly improves the performance of LLMs on code generation.
- Score: 45.034292331340524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown great success in code generation.
LLMs take a prompt as input and output code. A key question is how to
construct prompts (i.e., prompting techniques). Existing prompting techniques
are designed for natural language generation and achieve low accuracy in code
generation.
In this paper, we propose a new prompting technique named AceCoder. Our
motivation is that code generation faces two unique challenges (i.e.,
requirement understanding and code implementation). AceCoder contains two novel
mechanisms (i.e., guided code generation and example retrieval) to solve these
challenges. (1) Guided code generation asks LLMs first to analyze requirements
and output an intermediate preliminary (e.g., test cases). The preliminary is
used to clarify requirements and tell LLMs "what to write". (2) Example
retrieval selects similar programs as in-prompt examples, which provide
relevant content (e.g., algorithms, APIs) and teach LLMs "how to write". We
apply AceCoder to three LLMs (e.g., Codex) and evaluate it on three public
benchmarks using the Pass@k metric. Results show that AceCoder significantly
improves the performance of LLMs on code generation. (1) In terms of Pass@1,
AceCoder outperforms the state-of-the-art baseline by up to 56.4% in MBPP,
70.7% in MBJP, and 88.4% in MBJSP. (2) AceCoder is effective in LLMs with
different sizes (i.e., 6B to 13B) and different languages (i.e., Python, Java,
and JavaScript). (3) Human evaluation shows that developers prefer programs
generated with AceCoder.
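To make the guided-generation mechanism concrete, the sketch below shows a minimal two-step prompting loop: the model is first asked to restate the requirement as test cases (the intermediate preliminary), and the code prompt is then conditioned on those cases. This is an illustrative reconstruction, not the paper's exact prompts; `complete` is a hypothetical placeholder for whatever LLM completion API is in use.

```python
# Minimal sketch of guided code generation (assumed prompt wording;
# `complete` stands in for any LLM text-completion call).

def complete(prompt: str) -> str:
    """Placeholder for an LLM completion API (hypothetical)."""
    raise NotImplementedError

def guided_generation(requirement: str) -> str:
    # Step 1: ask for an intermediate preliminary (test cases) that
    # clarifies the requirement -- this tells the model "what to write".
    preliminary_prompt = (
        f"Requirement:\n{requirement}\n\n"
        "Before writing any code, list assert-style test cases that a "
        "correct solution must pass:\n"
    )
    test_cases = complete(preliminary_prompt)

    # Step 2: condition code generation on the requirement plus the
    # preliminary, so the implementation targets the clarified spec.
    code_prompt = (
        f"Requirement:\n{requirement}\n\n"
        f"Test cases:\n{test_cases}\n\n"
        "Write a function that satisfies the requirement and passes "
        "all of the test cases above:\n"
    )
    return complete(code_prompt)
```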
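Example retrieval needs a similarity function between the new requirement and the requirements of previously solved problems, plus a way to splice the retrieved programs into the prompt. The paper's retriever is not reproduced here; the sketch below substitutes a simple token-overlap (Jaccard) similarity as an assumed stand-in, just to show where the retrieved examples land in the prompt.

```python
# Sketch of example retrieval; the Jaccard similarity is a stand-in,
# not necessarily the retriever used by AceCoder.

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two requirement strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def retrieve_examples(requirement, corpus, k=2):
    """Return the k (requirement, program) pairs most similar to the query."""
    return sorted(corpus, key=lambda pair: jaccard(requirement, pair[0]),
                  reverse=True)[:k]

def build_prompt(requirement, corpus):
    """Prepend retrieved programs as in-context examples ("how to write")."""
    parts = [f"Requirement:\n{req}\nSolution:\n{prog}\n"
             for req, prog in retrieve_examples(requirement, corpus)]
    parts.append(f"Requirement:\n{requirement}\nSolution:\n")
    return "\n".join(parts)
```

Retrieved programs supply concrete algorithms and API calls that the model can imitate, which is why lexically similar requirements tend to make useful in-context examples.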
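For reference, the Pass@k numbers reported above are typically computed with the unbiased estimator from the Codex evaluation literature: generate n samples per problem, count the c that pass the tests, and estimate the probability that at least one of k drawn samples passes. The snippet below implements that standard estimator; it is background on the metric, not code from the AceCoder paper.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n-c, k) / C(n, k), where n samples were
    generated for a problem and c of them passed the unit tests."""
    if n - c < k:
        return 1.0  # every size-k draw contains a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples with 3 passing gives pass@1 = 0.3.
assert abs(pass_at_k(10, 3, 1) - 0.3) < 1e-9
```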
Related papers
- Showing LLM-Generated Code Selectively Based on Confidence of LLMs [44.23673533981599]
Large Language Models (LLMs) have shown impressive abilities in code generation, but they may generate erroneous programs.
Showing these erroneous programs to developers wastes their effort and introduces security risks.
We propose HonestCoder, a novel LLM-based code generation approach.
arXiv Detail & Related papers (2024-10-04T08:51:31Z)
- EPiC: Cost-effective Search-based Prompt Engineering of LLMs for Code Generation [8.009881267479189]
Large Language Models (LLMs) have seen increasing use in various software development tasks, especially in code generation.
We propose an alternative approach named Evolutionary Prompt Engineering for Code (EPiC) to evolve the original prompts toward better ones that produce high-quality code.
Our evaluation against state-of-the-art (SOTA) LLM-based code generation models shows that EPiC outperforms all the baselines in terms of cost-effectiveness.
arXiv Detail & Related papers (2024-08-20T21:15:36Z)
- When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention [43.39584272739589]
We introduce CodeFast, an inference acceleration approach for Code LLMs on code generation.
The key idea of CodeFast is to terminate inference promptly once unnecessary excess tokens are detected.
We conduct extensive experiments with CodeFast on five representative Code LLMs across four widely used code generation datasets.
arXiv Detail & Related papers (2024-07-29T14:27:08Z)
- InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models [56.723509505549536]
InfiBench is, to our knowledge, the first large-scale free-form question-answering (QA) benchmark for code.
It comprises 234 carefully selected high-quality Stack Overflow questions spanning 15 programming languages.
We conduct a systematic evaluation for over 100 latest code LLMs on InfiBench, leading to a series of novel and insightful findings.
arXiv Detail & Related papers (2024-03-11T02:06:30Z)
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code.
We find that code prompting yields a substantial performance boost for multiple LLMs.
Our analysis of GPT-3.5 reveals that the code formatting of the input problem is essential to the improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
- Large Language Model-Aware In-Context Learning for Code Generation [75.68709482932903]
Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation.
We propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation.
arXiv Detail & Related papers (2023-10-15T06:12:58Z)
- Structured Chain-of-Thought Prompting for Code Generation [48.43888515848583]
Chain-of-Thought (CoT) prompting is the state-of-the-art prompting technique.
We propose Structured CoTs (SCoTs) and present a novel prompting technique for code generation, named SCoT prompting.
arXiv Detail & Related papers (2023-05-11T06:43:37Z)
- Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
arXiv Detail & Related papers (2022-08-11T17:41:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.