SelfEvolve: A Code Evolution Framework via Large Language Models
- URL: http://arxiv.org/abs/2306.02907v1
- Date: Mon, 5 Jun 2023 14:12:46 GMT
- Title: SelfEvolve: A Code Evolution Framework via Large Language Models
- Authors: Shuyang Jiang, Yuhao Wang, Yu Wang
- Abstract summary: Large language models (LLMs) have already revolutionized code generation, after being pretrained on publicly available code data.
We propose a novel two-step pipeline, called SelfEvolve, that leverages LLMs as both knowledge providers and self-reflective programmers.
We evaluate SelfEvolve on three code generation datasets, including DS-1000 for data science code, HumanEval for software engineering code, and TransCoder for C++-to-Python translation.
- Score: 5.6607714367826105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have already revolutionized code generation after being pretrained on publicly available code data. However, while various methods have been proposed to augment LLMs with retrieved knowledge and enhance the quality of code generation, the performance of these retrieval-based methods is limited by the strength of the retrievers used. In addition, while LLMs show great emergent ability, they still struggle to produce correct code in one turn. To address these challenges, we propose a novel two-step pipeline, called SelfEvolve, that leverages LLMs as both knowledge providers and self-reflective programmers. Unlike retrieval-based methods, SelfEvolve obtains the knowledge from the input prompt and generates intermediate code based on the generated knowledge. After that, SelfEvolve asks the LLM to act as an expert programmer and debug the generated code. This is achieved by feeding the interpreter's error message back to the LLM, without requiring special test cases for correctness verification. We evaluate SelfEvolve on three code generation datasets, including DS-1000 for data science code, HumanEval for software engineering code, and TransCoder for C++-to-Python translation. Our empirical experiments show that SelfEvolve outperforms strong baselines by a significant margin on all datasets. We also conduct exhaustive analytical experiments to validate the effectiveness of the two stages of SelfEvolve, and find that both are superior to other prompting-based methods. Further scalability analysis demonstrates that SelfEvolve can be adapted to more advanced models, such as GPT-4, and brings consistent performance improvements.
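The abstract describes the pipeline concretely enough to sketch its control flow. Below is a minimal, illustrative Python sketch based only on that description: `call_llm`, the prompt wording, and the debug-round budget are hypothetical placeholders, not the paper's actual implementation. The point is the shape of the loop, knowledge generation, then code generation, then interpreter-error-driven self-debugging.

```python
# Illustrative sketch of a SelfEvolve-style two-step pipeline (assumptions noted below).
from typing import Optional
import subprocess
import sys
import tempfile


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an actual LLM API call (not part of the paper)."""
    raise NotImplementedError


def run_code(code: str) -> Optional[str]:
    """Run candidate code with the Python interpreter; return stderr if it fails, else None."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=30)
    return None if proc.returncode == 0 else proc.stderr


def self_evolve(problem: str, max_debug_rounds: int = 3) -> str:
    # Step 1: the LLM acts as a knowledge provider, distilling relevant facts and APIs
    # from the input prompt instead of relying on an external retriever.
    knowledge = call_llm(
        f"List the library APIs, facts, or algorithms needed to solve:\n{problem}"
    )
    # Intermediate code is generated conditioned on the self-generated knowledge.
    code = call_llm(
        f"Problem:\n{problem}\n\nRelevant knowledge:\n{knowledge}\n\nWrite Python code that solves the problem."
    )
    # Step 2: the LLM acts as a self-reflective programmer, debugging against the
    # interpreter's error message rather than hand-written test cases.
    for _ in range(max_debug_rounds):
        error = run_code(code)
        if error is None:
            break
        code = call_llm(
            f"This code failed with the interpreter error below.\n\nCode:\n{code}\n\n"
            f"Error:\n{error}\n\nReturn a corrected version of the code."
        )
    return code
```

Because the feedback signal is the interpreter's error message rather than test-case results, the loop needs no curated tests, which matches the abstract's description of the debugging step.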
Related papers
- Crystal: Illuminating LLM Abilities on Language and Code [58.5467653736537]
We propose a pretraining strategy to enhance the integration of natural language and coding capabilities.
The resulting model, Crystal, demonstrates remarkable capabilities in both domains.
arXiv Detail & Related papers (2024-11-06T10:28:46Z)
- An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation [1.335664823620186]
Large Language Models (LLMs) have recently advanced many applications on software engineering tasks.
CoT-SelfEvolve iteratively and automatically refines code through a self-correcting process.
arXiv Detail & Related papers (2024-08-28T09:19:09Z)
- Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models [54.14602121129874]
We introduce AutoIF, the first scalable and reliable method for automatically generating instruction-following training data.
AutoIF transforms the validation of instruction-following data quality into code verification.
arXiv Detail & Related papers (2024-06-19T13:29:53Z)
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks from their control flow and data flow to bridge the gap between programming languages and natural language.
Experiments and ablations on four datasets spanning C++ and Python validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- How Far Have We Gone in Binary Code Understanding Using Large Language Models [51.527805834378974]
We propose a benchmark to evaluate the effectiveness of Large Language Models (LLMs) in binary code understanding.
Our evaluations reveal that existing LLMs can understand binary code to a certain extent, thereby improving the efficiency of binary code analysis.
arXiv Detail & Related papers (2024-04-15T14:44:08Z)
- Perplexed: Understanding When Large Language Models are Confused [3.4208414448496027]
This paper introduces perplexed, a library for exploring where a language model is perplexed.
We conducted a case study on Large Language Models (LLMs) for code generation, using codetokenizer, an additional tool we built to help analyze code models.
We found that the code LLMs we studied performed worst on coding structures where the code was not syntactically correct.
arXiv Detail & Related papers (2024-04-09T22:03:39Z)
- Grounding Data Science Code Generation with Input-Output Specifications [32.07033683677839]
Large language models (LLMs) have recently demonstrated a remarkable ability to generate code from natural language (NL) prompts.
However, LLMs can have difficulty aligning their outputs with both the NL prompt and the I/O specification.
We propose GIFT4Code, a novel approach for instruction fine-tuning of LLMs with respect to I/O specifications.
arXiv Detail & Related papers (2024-02-12T21:32:49Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO optimizes the model only by masking the unexecuted code segments, providing Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches on the corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- Test-Case-Driven Programming Understanding in Large Language Models for Better Code Generation [15.166827643436346]
muFiX is a novel prompting technique for improving the code generation performance of large language models (LLMs).
It first exploits test case analysis to obtain specification understanding and enables a self-improvement process.
muFiX further refines the specification understanding to reduce the gap between the provided understanding and the actual understanding.
arXiv Detail & Related papers (2023-09-28T02:58:07Z)
- LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv Detail & Related papers (2023-05-17T15:53:31Z)
- CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation [36.47905744758698]
We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed by developer-assigned identifiers.
Our model employs a unified framework to seamlessly support both code understanding and generation tasks and allows for multi-task learning.
arXiv Detail & Related papers (2021-09-02T12:21:06Z)