Think Outside the Code: Brainstorming Boosts Large Language Models in
Code Generation
- URL: http://arxiv.org/abs/2305.10679v1
- Date: Thu, 18 May 2023 03:32:54 GMT
- Title: Think Outside the Code: Brainstorming Boosts Large Language Models in
Code Generation
- Authors: Xin-Ye Li, Jiang-Tian Xue, Zheng Xie and Ming Li
- Abstract summary: In this paper, we introduce the Brainstorm framework for code generation.
It leverages a brainstorming step that generates and selects diverse thoughts on the problem.
Brainstorm significantly enhances the ability of LLMs to solve competition-level programming problems.
- Score: 9.904734169174356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code generation aims to automatically generate source code from high-level
task specifications, which can significantly increase productivity of software
engineering. Recently, approaches based on large language models (LLMs) have
shown remarkable code generation abilities on simple tasks. However, generating
code for more complex tasks, such as competition-level problems, remains
challenging. In this paper, we introduce the Brainstorm framework for code
generation. It leverages a brainstorming step that generates and selects
diverse thoughts on the problem to facilitate algorithmic reasoning, where the
thoughts are possible blueprints for solving the problem. We demonstrate that
Brainstorm significantly enhances the ability of LLMs to solve
competition-level programming problems, resulting in a more than 50% increase
in the pass@$k$ metrics for ChatGPT on the CodeContests benchmark, achieving
state-of-the-art performance. Furthermore, our experiments conducted on
LeetCode contests show that our framework boosts the ability of ChatGPT to a
level comparable to that of human programmers.
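The abstract reports gains in pass@$k$ but does not say how the metric is computed. As a minimal sketch, assuming the widely used unbiased estimator (generate n programs per problem, count the c that pass all tests, and estimate the chance that at least one of k sampled programs is correct), the calculation might look like the following. The function names and the sample counts in the usage lines are hypothetical and are not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of programs generated for the problem
    c: number of those programs that pass all tests
    k: sampling budget being evaluated (1 <= k <= n)

    Returns 1 - C(n - c, k) / C(n, k): the probability that a random
    size-k subset of the n programs contains at least one correct program.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def relative_gain(baseline: float, improved: float) -> float:
    """Relative increase of `improved` over `baseline` (0.5 means +50%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical counts for a single problem, for illustration only.
    base = pass_at_k(n=100, c=4, k=10)
    boosted = pass_at_k(n=100, c=7, k=10)
    print(f"baseline pass@10 = {base:.3f}")
    print(f"boosted  pass@10 = {boosted:.3f}")
    print(f"relative gain    = {relative_gain(base, boosted):.1%}")
```

A benchmark-level score is usually the mean of this per-problem estimate over all problems, so a "more than 50% increase" corresponds to a relative gain above 0.5 on that mean.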
Related papers
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
- Evaluation of the Programming Skills of Large Language Models [0.16385815610837165]
Large Language Models (LLM) have revolutionized the efficiency and speed with which tasks are completed.
This paper critically examines the output quality of two leading LLMs, OpenAI's ChatGPT and Google's Gemini AI, by comparing the quality of programming code generated in both their free versions.
arXiv Detail & Related papers (2024-05-23T10:04:36Z)
- CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation [60.799992690487336]
We propose Syntax Graph Retrieval Augmented Code Generation (CodeGRAG) to enhance the performance of LLMs in single-round code generation tasks.
CodeGRAG significantly improves the code generation ability of LLMs and can even offer performance gain for cross-lingual code generation.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective [85.48043537327258]
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
Results indicate that MANGO significantly improves the code pass rate based on the strong baselines.
The robustness of the logical comment decoding strategy is notably higher than the Chain-of-thoughts prompting.
arXiv Detail & Related papers (2024-04-11T08:30:46Z)
- CoLadder: Supporting Programmers with Hierarchical Code Generation in Multi-Level Abstraction [16.325032481071997]
CoLadder is a system that supports programmers by facilitating hierarchical task decomposition, direct code segment manipulation, and result evaluation.
A user study with 12 experienced programmers showed that CoLadder is effective in helping programmers externalize their problem-solving intentions flexibly.
arXiv Detail & Related papers (2023-10-12T20:07:01Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- Large Language Models Should Ask Clarifying Questions to Increase Confidence in Generated Code [0.7252027234425334]
Large language models (LLMs) have significantly improved the ability to perform tasks in the field of code generation.
There is still a gap between LLMs being capable coders and being top-tier software engineers.
I propose a communication-centered process that uses an LLM-generated communicator to identify issues with high ambiguity or low confidence in problem descriptions and generated code.
arXiv Detail & Related papers (2023-08-25T17:33:05Z)
- No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT [28.68768157452352]
This study examines the quality of code generation using ChatGPT.
We leverage 728 algorithm problems in five languages (i.e., C, C++, Java, Python, and JavaScript) and 18 CWEs with 54 code scenarios for the code generation task.
Our findings uncover potential issues and limitations that arise in the ChatGPT-based code generation.
arXiv Detail & Related papers (2023-08-09T10:01:09Z)
- Improving ChatGPT Prompt for Code Generation [13.303599826870705]
OpenAI's language model ChatGPT has emerged as a powerful tool for generating human-like responses to a wide range of textual inputs.
We evaluate ChatGPT's capabilities for two code generation tasks, including text-to-code and code-to-code generation.
Our results showed that by carefully designing prompts to guide ChatGPT, the generation performance can be improved substantially.
arXiv Detail & Related papers (2023-05-15T05:37:33Z)
- Competition-Level Code Generation with AlphaCode [74.87216298566942]
We introduce AlphaCode, a system for code generation that can create novel solutions to problems that require deeper reasoning.
In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3%.
arXiv Detail & Related papers (2022-02-08T23:16:31Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)