Competition-Level Code Generation with AlphaCode
- URL: http://arxiv.org/abs/2203.07814v1
- Date: Tue, 8 Feb 2022 23:16:31 GMT
- Title: Competition-Level Code Generation with AlphaCode
- Authors: Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian
Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno,
Agustin Dal Lago, Thomas Hubert, Peter Choy, Cyprien de Masson d'Autume, Igor
Babuschkin, Xinyun Chen, Po-Sen Huang, Johannes Welbl, Sven Gowal, Alexey
Cherepanov, James Molloy, Daniel J. Mankowitz, Esme Sutherland Robson,
Pushmeet Kohli, Nando de Freitas, Koray Kavukcuoglu, Oriol Vinyals
- Abstract summary: We introduce AlphaCode, a system for code generation that can create novel solutions to problems that require deeper reasoning.
In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3%.
- Score: 74.87216298566942
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Programming is a powerful and ubiquitous problem-solving tool. Developing
systems that can assist programmers or even generate programs independently
could make programming more productive and accessible, yet so far incorporating
innovations in AI has proven challenging. Recent large-scale language models
have demonstrated an impressive ability to generate code, and are now able to
complete simple programming tasks. However, these models still perform poorly
when evaluated on more complex, unseen problems that require problem-solving
skills beyond simply translating instructions into code. For example,
competitive programming problems which require an understanding of algorithms
and complex natural language remain extremely challenging. To address this gap,
we introduce AlphaCode, a system for code generation that can create novel
solutions to these problems that require deeper reasoning. In simulated
evaluations on recent programming competitions on the Codeforces platform,
AlphaCode achieved on average a ranking of top 54.3% in competitions with more
than 5,000 participants. We found that three key components were critical to
achieve good and reliable performance: (1) an extensive and clean competitive
programming dataset for training and evaluation, (2) large and
efficient-to-sample transformer-based architectures, and (3) large-scale model
sampling to explore the search space, followed by filtering based on program
behavior to a small set of submissions.
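The third component, large-scale sampling followed by filtering on program behavior, is the most mechanical part of the pipeline and can be illustrated concretely. The sketch below is a minimal illustration under stated assumptions, not AlphaCode's actual implementation: the helper names (run_program, select_submissions), the Python-only candidates, the two-second timeout, and the exact clustering rule are all assumptions for the example; the paper does describe keeping samples that pass the problem's example tests, clustering survivors by behavior on additional inputs, and selecting a small set of submissions (up to 10).

```python
import subprocess
import tempfile
from collections import defaultdict

def run_program(source: str, stdin: str, timeout: float = 2.0) -> str | None:
    """Hypothetical helper: write a candidate Python solution to a temp file,
    run it on one input, and return its stdout (None on error or timeout)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(["python3", path], input=stdin,
                                capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return result.stdout if result.returncode == 0 else None

def select_submissions(samples, examples, extra_inputs, k=10):
    """Keep only samples whose outputs match the example tests from the
    problem statement, then cluster the survivors by their behavior on extra
    inputs and pick one representative per cluster (at most k submissions)."""
    survivors = [
        s for s in samples
        if all((out := run_program(s, i)) is not None and out.strip() == o.strip()
               for i, o in examples)
    ]
    clusters = defaultdict(list)
    for s in survivors:
        clusters[tuple(run_program(s, i) for i in extra_inputs)].append(s)
    # Larger clusters first: behaviors shared by many samples are more likely correct.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:k]]
```

Filtering on the example tests discards the large majority of samples, and taking one representative per behavior cluster spreads the limited submission budget across semantically distinct programs.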
Related papers
- Code Generation and Algorithmic Problem Solving Using Llama 3.1 405B [0.0]
Llama-driven code generation can translate natural language prompts into executable code across multiple programming languages.
Llama can serve as a versatile tool for developers of all skill levels, improving productivity and efficiency in software development.
The potential implications for education, industry, and the future of coding practices are also discussed.
arXiv Detail & Related papers (2024-09-26T13:29:20Z)
- No Man is an Island: Towards Fully Automatic Programming by Code Search, Code Generation and Program Repair [9.562123938545522]
toolname can integrate various code search, generation, and repair tools, combining these three research areas for the first time.
We conduct preliminary experiments to demonstrate the potential of our framework, e.g., helping CodeLlama solve 267 programming problems, an improvement of 62.53%.
arXiv Detail & Related papers (2024-09-05T06:24:29Z) - What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter but more complicated than canonical solutions.
We develop a taxonomy of bugs for incorrect code that includes three categories and 12 sub-categories, and analyze the root causes of common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose the complexity-impacted reasoning score (CIRS) to measure the correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate the logical complexity (an illustrative sketch of one way to compute an AST-based complexity score appears after this list).
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages [0.0]
Large Language Models (LLMs) are advanced Artificial Intelligence (AI) systems that have undergone extensive training.
This research investigates the coding proficiency of ChatGPT 3.5, an LLM released by OpenAI in November 2022.
The model's skill in creating code snippets is evaluated across 10 different programming languages and 4 different software domains.
arXiv Detail & Related papers (2023-08-08T15:02:32Z)
- Evaluating GPT's Programming Capability through CodeWars' Katas [0.5512295869673147]
This paper presents a novel evaluation of the programming proficiency of Generative Pretrained Transformer (GPT) models.
The experiments reveal a distinct boundary at the 3kyu level, beyond which these GPT models struggle to provide solutions.
The research emphasizes the need for validation and creative thinking capabilities in AI models to better emulate human problem-solving techniques.
arXiv Detail & Related papers (2023-05-31T10:36:16Z)
- Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation [9.904734169174356]
In this paper, we introduce the Brainstorm framework for code generation.
It leverages a brainstorming step that generates and selects diverse thoughts on the problem.
Brainstorm significantly enhances the ability of LLMs to solve competition-level programming problems.
arXiv Detail & Related papers (2023-05-18T03:32:54Z)
- Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
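As noted in the CIRS entry above, here is a rough, purely illustrative sketch of turning an abstract syntax tree into a scalar complexity score. The function name structural_complexity and the particular metric (counting control-flow and logical nodes, weighted by nesting depth) are assumptions for illustration only, not the scoring function defined in the CIRS paper.

```python
import ast

# Node types treated as contributing to "logical complexity"; this choice is
# an illustrative assumption, not the CIRS definition.
_LOGIC_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp,
                ast.Compare, ast.comprehension)

def structural_complexity(source: str) -> float:
    """Parse source into an AST and return a depth-weighted count of
    control-flow and logical nodes as a rough complexity proxy."""
    tree = ast.parse(source)
    score = 0.0

    def visit(node: ast.AST, depth: int) -> None:
        nonlocal score
        for child in ast.iter_child_nodes(node):
            if isinstance(child, _LOGIC_NODES):
                # Deeper nesting contributes more to the score.
                score += 1.0 + 0.5 * depth
            visit(child, depth + 1)

    visit(tree, 0)
    return score

# Usage example on a small hypothetical snippet.
print(structural_complexity(
    "for i in range(10):\n    if i % 2 == 0:\n        print(i)"))
```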