Think Outside the Code: Brainstorming Boosts Large Language Models in
Code Generation
- URL: http://arxiv.org/abs/2305.10679v1
- Date: Thu, 18 May 2023 03:32:54 GMT
- Title: Think Outside the Code: Brainstorming Boosts Large Language Models in
Code Generation
- Authors: Xin-Ye Li, Jiang-Tian Xue, Zheng Xie and Ming Li
- Abstract summary: In this paper, we introduce the Brainstorm framework for code generation.
It leverages a brainstorming step that generates and selects diverse thoughts on the problem.
Brainstorm significantly enhances the ability of LLMs to solve competition-level programming problems.
- Score: 9.904734169174356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code generation aims to automatically generate source code from high-level
task specifications, which can significantly increase the productivity of software
engineering. Recently, approaches based on large language models (LLMs) have
shown remarkable code generation abilities on simple tasks. However, generating
code for more complex tasks, such as competition-level problems, remains
challenging. In this paper, we introduce the Brainstorm framework for code
generation. It leverages a brainstorming step that generates and selects
diverse thoughts on the problem to facilitate algorithmic reasoning, where the
thoughts are possible blueprints for solving the problem. We demonstrate that
Brainstorm significantly enhances the ability of LLMs to solve
competition-level programming problems, resulting in a more than 50% increase
in the pass@$k$ metrics for ChatGPT on the CodeContests benchmark, achieving
state-of-the-art performance. Furthermore, our experiments conducted on
LeetCode contests show that our framework boosts the ability of ChatGPT to a
level comparable to that of human programmers.
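For intuition, the sketch below outlines a brainstorm-then-code loop in the spirit of the abstract, together with the standard unbiased pass@$k$ estimator used to report the results. The `query_llm` helper, the prompts, and the selection heuristic are illustrative assumptions, not the paper's exact implementation.
```python
from math import comb
from typing import Callable, List

def brainstorm_then_code(problem: str,
                         query_llm: Callable[[str], str],
                         num_thoughts: int = 3) -> str:
    # Step 1: brainstorm several diverse high-level thoughts (candidate blueprints).
    thoughts: List[str] = [
        query_llm(f"Propose one high-level algorithmic idea for this problem:\n{problem}")
        for _ in range(num_thoughts)
    ]
    # Step 2: select the most promising thought; here the model itself ranks the
    # candidates (this selection mechanism is an assumption, not the paper's).
    ranking_prompt = (
        f"Problem:\n{problem}\n\nCandidate ideas:\n"
        + "\n".join(f"{i}: {t}" for i, t in enumerate(thoughts))
        + "\n\nReply with only the index of the most promising idea."
    )
    best = thoughts[int(query_llm(ranking_prompt).strip()) % num_thoughts]
    # Step 3: generate code conditioned on the selected thought (the blueprint).
    return query_llm(
        f"Problem:\n{problem}\n\nPlanned approach:\n{best}\n\nWrite a complete Python solution."
    )

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that at least
    one of k samples drawn from n generations, c of which are correct, passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```
Reporting a more than 50% increase in pass@$k$ then amounts to comparing this estimate, averaged over benchmark problems, with and without the brainstorming step.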
Related papers
- Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows code-form plans -- pseudocode that outlines high-level, structured reasoning processes.
CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks.
It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z)
- No Man is an Island: Towards Fully Automatic Programming by Code Search, Code Generation and Program Repair [9.562123938545522]
toolname can integrate various code search, generation, and repair tools, combining these three research areas for the first time.
We conduct preliminary experiments to demonstrate the potential of our framework, e.g., helping CodeLlama solve 267 programming problems with an improvement of 62.53%.
arXiv Detail & Related papers (2024-09-05T06:24:29Z)
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources (a toy retrieve-then-generate sketch appears after this list).
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
- Evaluation of the Programming Skills of Large Language Models [0.16385815610837165]
Large Language Models (LLMs) have revolutionized the efficiency and speed with which tasks are completed.
This paper critically examines the output quality of two leading LLMs, OpenAI's ChatGPT and Google's Gemini AI, by comparing the programming code generated by their free versions.
arXiv Detail & Related papers (2024-05-23T10:04:36Z)
- Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective [85.48043537327258]
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
Results indicate that MANGO significantly improves the code pass rate over strong baselines.
The robustness of the logical comment decoding strategy is notably higher than that of Chain-of-Thought prompting.
arXiv Detail & Related papers (2024-04-11T08:30:46Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose the complexity-impacted reasoning score (CIRS) to measure the correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode structural information and calculate logical complexity (a toy AST-based scoring sketch appears after this list).
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- Large Language Models Should Ask Clarifying Questions to Increase Confidence in Generated Code [0.7252027234425334]
Large language models (LLMs) have significantly improved the ability to perform tasks in the field of code generation.
There is still a gap between LLMs being capable coders and being top-tier software engineers.
I propose a communication-centered process that uses an LLM-generated communicator to identify issues with high ambiguity or low confidence in problem descriptions and generated code.
arXiv Detail & Related papers (2023-08-25T17:33:05Z)
- No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT [28.68768157452352]
This study examines the quality of code generation using ChatGPT.
We leverage 728 algorithm problems in five languages (i.e., C, C++, Java, Python, and JavaScript) and 18 CWEs with 54 code scenarios for the code generation task.
Our findings uncover potential issues and limitations that arise in ChatGPT-based code generation.
arXiv Detail & Related papers (2023-08-09T10:01:09Z)
- Improving ChatGPT Prompt for Code Generation [13.303599826870705]
OpenAI's language model ChatGPT has emerged as a powerful tool for generating human-like responses to a wide range of textual inputs.
We evaluate ChatGPT's capabilities for two code generation tasks, including text-to-code and code-to-code generation.
Our results showed that by carefully designing prompts to guide ChatGPT, the generation performance can be improved substantially.
arXiv Detail & Related papers (2023-05-15T05:37:33Z)
- Competition-Level Code Generation with AlphaCode [74.87216298566942]
We introduce AlphaCode, a system for code generation that can create novel solutions to problems that require deeper reasoning.
In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3%.
arXiv Detail & Related papers (2022-02-08T23:16:31Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
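For the CodeRAG-Bench entry above, the sketch below illustrates the retrieve-then-generate pattern it evaluates: pick the most relevant reference documents for a task and prepend them to the generation prompt. The token-overlap retriever and the `query_llm` helper are simplifications assumed for illustration, not the benchmark's own retrieval pipeline.
```python
from typing import Callable, List

def retrieve(query: str, docs: List[str], top_k: int = 2) -> List[str]:
    # Toy retriever: rank documents by word overlap with the query
    # (real systems use BM25 or dense embeddings instead).
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:top_k]

def generate_with_context(task: str, docs: List[str],
                          query_llm: Callable[[str], str]) -> str:
    # Prepend the retrieved documents to the code-generation prompt.
    context = "\n\n".join(retrieve(task, docs))
    return query_llm(f"Reference material:\n{context}\n\nTask:\n{task}\n\nWrite the code.")
```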
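For the "When Do Program-of-Thoughts Work for Reasoning?" entry above, the toy sketch below shows how an abstract-syntax-tree walk can turn generated code into a structural complexity score. The node weights and the score definition are illustrative assumptions; they are not the paper's CIRS formula.
```python
import ast

def structural_complexity(code: str) -> float:
    # Toy structural-complexity score: count control-flow and structural AST
    # nodes with illustrative weights (not the CIRS definition from the paper).
    weights = {
        ast.If: 1.0, ast.For: 1.5, ast.While: 1.5, ast.Try: 1.0,
        ast.FunctionDef: 1.0, ast.comprehension: 1.0, ast.BoolOp: 0.5,
    }
    return sum(weights.get(type(node), 0.0) for node in ast.walk(ast.parse(code)))

# Example: a loop containing a conditional scores 1.5 + 1.0 = 2.5.
print(structural_complexity("for i in range(3):\n    if i % 2:\n        print(i)"))
```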
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.