Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering
- URL: http://arxiv.org/abs/2401.08500v1
- Date: Tue, 16 Jan 2024 17:00:36 GMT
- Title: Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering
- Authors: Tal Ridnik, Dedy Kredo, Itamar Friedman
- Abstract summary: We propose a new approach to code generation by LLMs - a test-based, multi-stage, code-oriented iterative flow.
We tested AlphaCodium on a challenging code generation dataset called CodeContests.
For example, GPT-4 accuracy (pass@5) increased from 19% with a single well-designed direct prompt to 44% with the AlphaCodium flow.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code generation problems differ from common natural language problems - they
require matching the exact syntax of the target language, identifying happy
paths and edge cases, paying attention to numerous small details in the problem
spec, and addressing other code-specific issues and requirements. Hence, many
of the optimizations and tricks that have been successful in natural language
generation may not be effective for code tasks. In this work, we propose a new
approach to code generation by LLMs, which we call AlphaCodium - a test-based,
multi-stage, code-oriented iterative flow that improves the performance of
LLMs on code problems. We tested AlphaCodium on a challenging code generation
dataset called CodeContests, which includes competitive programming problems
from platforms such as Codeforces. The proposed flow consistently and
significantly improves results. On the validation set, for example, GPT-4
accuracy (pass@5) increased from 19% with a single well-designed direct prompt
to 44% with the AlphaCodium flow. Many of the principles and best practices
acquired in this work, we believe, are broadly applicable to general code
generation tasks. Full implementation is available at:
https://github.com/Codium-ai/AlphaCodium
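The test-based iterative idea described in the abstract can be illustrated with a minimal sketch. This is not the AlphaCodium implementation (see the repository above for that); it only shows the general loop of generating a candidate, running it against tests, and feeding the failures back into the next attempt. All names here (`run_tests`, `iterative_flow`, `stub_generate`) are hypothetical, and `stub_generate` is a hard-coded stand-in for an LLM call.

```python
from typing import Callable, List, Optional, Tuple

# A test is (positional arguments, expected return value).
Test = Tuple[tuple, object]


def run_tests(source: str, func_name: str, tests: List[Test]) -> List[str]:
    """Execute candidate source and return one message per failing test."""
    namespace: dict = {}
    try:
        # NOTE: exec is acceptable for a toy example only; real generated
        # code should run in a sandbox.
        exec(source, namespace)
        func = namespace[func_name]
    except Exception as exc:
        return [f"could not load candidate: {exc!r}"]

    failures = []
    for args, expected in tests:
        try:
            actual = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if actual != expected:
            failures.append(
                f"{func_name}{args} returned {actual!r}, expected {expected!r}"
            )
    return failures


def iterative_flow(
    generate: Callable[[List[str]], str],
    func_name: str,
    tests: List[Test],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Regenerate code until all tests pass or the round budget runs out."""
    feedback: List[str] = []
    for round_index in range(1, max_rounds + 1):
        candidate = generate(feedback)
        feedback = run_tests(candidate, func_name, tests)
        if not feedback:
            return candidate, round_index
    return None, max_rounds


def stub_generate(feedback: List[str]) -> str:
    """Hypothetical stand-in for an LLM: buggy first, fixed after feedback."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


TESTS: List[Test] = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
solution, rounds = iterative_flow(stub_generate, "add", TESTS)
print(f"solved in {rounds} round(s)" if solution else "no solution found")
```

In a real setting, `generate` would wrap a model call whose prompt includes the problem statement and the failure messages from the previous round.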
Related papers
- Crystal: Illuminating LLM Abilities on Language and Code [58.5467653736537]
We propose a pretraining strategy to enhance the integration of natural language and coding capabilities.
The resulting model, Crystal, demonstrates remarkable capabilities in both domains.
arXiv Detail & Related papers (2024-11-06T10:28:46Z)
- Top Pass: Improve Code Generation by Pass@k-Maximized Code Ranking [11.109866941442641]
Top Pass is a code ranking approach that identifies potential correct solutions from a large number of candidates.
This enables the user to find the correct solution within as few tries as possible.
arXiv Detail & Related papers (2024-08-11T07:53:51Z)
- When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention [43.39584272739589]
We introduce CodeFast, an inference acceleration approach for Code LLMs on code generation.
The key idea of CodeFast is to terminate the inference process in time when unnecessary excess tokens are detected.
We conduct extensive experiments with CodeFast on five representative Code LLMs across four widely used code generation datasets.
arXiv Detail & Related papers (2024-07-29T14:27:08Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs for incorrect code that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- MapCoder: Multi-Agent Code Generation for Competitive Problem Solving [3.3856216159724983]
We introduce a new approach to code generation tasks leveraging multi-agent prompting.
Our framework, MapCoder, consists of four LLM agents specifically designed to emulate the stages of program synthesis.
Our method consistently delivers superior performance across various programming languages.
arXiv Detail & Related papers (2024-05-18T22:10:15Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO only optimizes the model by masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- Test-Case-Driven Programming Understanding in Large Language Models for Better Code Generation [15.166827643436346]
muFiX is a novel prompting technique to improve the code generation performance of large language models (LLMs).
It first exploits test case analysis to obtain specification understanding and enables a self-improvement process.
muFiX further fixes the specification understanding in the direction that reduces the gap between the provided understanding and the actual understanding.
arXiv Detail & Related papers (2023-09-28T02:58:07Z)
- Exploring Continual Learning for Code Generation Models [80.78036093054855]
Continual Learning (CL) is an important aspect that remains underexplored in the code domain.
We introduce a benchmark called CodeTask-CL that covers a wide range of tasks, including code generation, translation, summarization, and refinement.
We find that effective methods like Prompt Pooling (PP) suffer from catastrophic forgetting due to the unstable training of the prompt selection mechanism.
arXiv Detail & Related papers (2023-07-05T16:58:39Z)
- CodeT5+: Open Code Large Language Models for Code Understanding and Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence.
CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.
We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)