Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
- URL: http://arxiv.org/abs/2310.02304v2
- Date: Fri, 1 Mar 2024 17:11:21 GMT
- Title: Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
- Authors: Eric Zelikman, Eliana Lorch, Lester Mackey, Adam Tauman Kalai
- Abstract summary: We use a language-model-infused scaffolding program to improve itself.
A variety of self-improvement strategies are proposed by the language model.
The work demonstrates that a modern language model, GPT-4, is capable of writing code that can call itself to improve itself.
- Score: 25.474639218436916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several recent advances in AI systems (e.g., Tree-of-Thoughts and
Program-Aided Language Models) solve problems by providing a "scaffolding"
program that structures multiple calls to language models to generate better
outputs. A scaffolding program is written in a programming language such as
Python. In this work, we use a language-model-infused scaffolding program to
improve itself. We start with a seed "improver" that improves an input program
according to a given utility function by querying a language model several
times and returning the best solution. We then run this seed improver to
improve itself. Across a small set of downstream tasks, the resulting improved
improver generates programs with significantly better performance than its seed
improver. A variety of self-improvement strategies are proposed by the language
model, including beam search, genetic algorithms, and simulated annealing.
Since the language models themselves are not altered, this is not full
recursive self-improvement. Nonetheless, it demonstrates that a modern language
model, GPT-4 in our experiments, is capable of writing code that can call
itself to improve itself. We consider concerns around the development of
self-improving technologies and evaluate the frequency with which the generated
code bypasses a sandbox.
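The seed improver described in the abstract (query a language model several times, keep the best solution under a utility function) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code. The language model is replaced by a hypothetical deterministic stub (`make_toy_lm`) that perturbs an integer "program". The two downstream `TASKS` are invented for illustration. The names `seed_improver`, `meta_utility`, `ONE_SHOT`, and `BEST_OF_FOUR` are likewise assumptions. `meta_utility` scores an improver, given as source code, by the average utility its outputs reach on the downstream tasks. Assuming that is the quantity used when the improver is run on its own source, self-improvement amounts to calling `seed_improver` on the improver's code with `meta_utility` as the utility and a model that actually writes code, which the toy stub cannot do.

```python
import random
from statistics import mean
from typing import Callable, List, Tuple

Utility = Callable[[str], float]
LanguageModel = Callable[[str], str]  # hypothetical interface: prompt in, one completion out


def seed_improver(initial_solution: str, utility: Utility,
                  language_model: LanguageModel, n_candidates: int = 4) -> str:
    """Query the language model several times and return the highest-utility solution."""
    prompt = f"Improve the following solution:\n{initial_solution}"
    candidates = [language_model(prompt) for _ in range(n_candidates)]
    # Sketch choice: keep the input as a fallback so the result never scores lower.
    return max(candidates + [initial_solution], key=utility)


def make_toy_lm(seed: int = 0) -> LanguageModel:
    """Hypothetical stand-in for GPT-4: perturbs the integer on the prompt's last line."""
    rng = random.Random(seed)

    def toy_lm(prompt: str) -> str:
        current = int(prompt.rsplit("\n", 1)[-1])
        return str(current + rng.randint(-3, 3))

    return toy_lm


# Hypothetical downstream tasks: (initial solution, utility); each utility peaks at a target.
TASKS: List[Tuple[str, Utility]] = [
    ("0", lambda s: -abs(int(s) - 5)),
    ("10", lambda s: -abs(int(s) - 2)),
]

# Two hand-written improvers kept as source strings, since STOP feeds code to the improver.
ONE_SHOT = '''
def improve(initial_solution, utility, language_model):
    candidate = language_model("Improve:\\n" + initial_solution)
    return max([candidate, initial_solution], key=utility)
'''

BEST_OF_FOUR = '''
def improve(initial_solution, utility, language_model):
    prompt = "Improve:\\n" + initial_solution
    candidates = [language_model(prompt) for _ in range(4)]
    return max(candidates + [initial_solution], key=utility)
'''


def meta_utility(improver_source: str, seed: int = 0) -> float:
    """Average downstream utility reached by an improver given as source code."""
    namespace: dict = {}
    exec(improver_source, namespace)  # no sandbox here; the paper studies sandbox bypasses
    improve = namespace["improve"]
    scores = []
    for initial, utility in TASKS:
        scores.append(utility(improve(initial, utility, make_toy_lm(seed))))
    return mean(scores)


print(meta_utility(ONE_SHOT), meta_utility(BEST_OF_FOUR))
# Self-improvement step (not run: it needs a model that writes improver code):
# better_source = seed_improver(BEST_OF_FOUR, meta_utility, code_writing_lm)
```

Because both improvers see the same seeded stub, the best-of-four improver's candidate set contains the one-shot improver's, so its meta-utility is never lower here. A real language model gives no such guarantee.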
Related papers
- A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
Date: 2024-01-15T22:36:31Z
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
However, static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework that quantifies static errors in Python code completions by leveraging Abstract Syntax Trees.
Date: 2023-06-05T19:23:34Z
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
Date: 2023-04-11T10:43:43Z
- Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black Magic? [5.714553194279462]
We investigate various input parameters of two language models and study whether varying these parameters significantly affects the quality of the generated programs.
Our results showed that varying the input parameters can significantly improve the performance of language models.
Date: 2022-10-26T13:28:14Z
- Language Models Can Teach Themselves to Program Better [4.627023679353507]
Recent Language Models (LMs) achieve breakthrough performance in code generation when trained on human-authored problems.
We show that it is possible for an LM to synthesize programming problems and solutions, which are filtered for correctness by a Python interpreter.
The LM's performance is then seen to improve when it is fine-tuned on its own synthetic problems and verified solutions.
Date: 2022-07-29T06:43:28Z
- Natural Language to Code Translation with Execution [82.52142893010563]
We introduce execution-result-based minimum Bayes risk decoding (MBR-exec) for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
Date: 2022-04-25T06:06:08Z
- Searching for More Efficient Dynamic Programs [61.79535031840558]
We describe a set of program transformations, a simple metric for assessing the efficiency of a transformed program, and a search procedure to improve this metric.
We show that in practice, automated search can find substantial improvements to the initial program.
Date: 2021-09-14T20:52:55Z
- AVATAR: A Parallel Corpus for Java-Python Program Translation [77.86173793901139]
Program translation refers to migrating source code from one language to another.
We present AVATAR, a collection of 9,515 programming problems and their solutions written in two popular languages, Java and Python.
Date: 2021-08-26T05:44:20Z
- Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches [0.0]
This paper compares different deep learning architectures to create and use language models based on programming code.
We discuss the strengths and weaknesses of each approach and the gaps we find in evaluating the language models or applying them in a real programming context.
Date: 2020-09-16T15:17:04Z
This list is automatically generated from the titles and abstracts of the papers on this site.
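The minimum Bayes risk decoding over execution results mentioned in the Natural Language to Code Translation with Execution entry can be illustrated with a small sketch. Assumptions: the candidates are already-sampled Python callables rather than generated source text, the test inputs are given, and the helper names `execution_signature` and `mbr_exec_select` are hypothetical. The 0/1 "any result differs" loss is one simple choice of risk and may not match the paper's exact formulation.

```python
from typing import Callable, List, Sequence, Tuple

Program = Callable[[int], object]


def execution_signature(program: Program, inputs: Sequence[int]) -> Tuple[str, ...]:
    """Run a candidate on every test input; record results (or the error type) as strings."""
    results = []
    for x in inputs:
        try:
            results.append(repr(program(x)))
        except Exception as exc:  # sketch choice: a crash counts as its own "result"
            results.append(f"error:{type(exc).__name__}")
    return tuple(results)


def mbr_exec_select(candidates: List[Program], inputs: Sequence[int]) -> Program:
    """Pick the candidate whose execution results disagree with the fewest other candidates."""
    signatures = [execution_signature(p, inputs) for p in candidates]

    def risk(i: int) -> int:
        # 0/1 loss: another candidate counts against i if any execution result differs.
        return sum(1 for j, sig in enumerate(signatures) if j != i and sig != signatures[i])

    return candidates[min(range(len(candidates)), key=risk)]


samples = [lambda x: x * 2, lambda x: x + x, lambda x: x ** 2]
chosen = mbr_exec_select(samples, inputs=[1, 2, 3])
print(chosen(5))  # → 10 (the two agreeing "doubling" samples outvote x ** 2)
```

No reference solution is consulted: agreement among sampled programs stands in for correctness, which is what makes this usable when only test inputs, not outputs, are available.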
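The idea in the A Static Evaluation of Code Completion entry, finding errors in generated Python without running it by walking the Abstract Syntax Tree, can be illustrated with the standard `ast` module. This is a deliberately simplified, hypothetical check (`undefined_names` is not from the paper). It flags names that are read but never bound anywhere in the snippet. It ignores scoping rules, `global`/`nonlocal`, and star imports, so it is far cruder than a real linter.

```python
import ast
import builtins
from typing import Set


def undefined_names(source: str) -> Set[str]:
    """Return names that are loaded but never bound anywhere in `source` (scope-insensitive)."""
    tree = ast.parse(source)  # raises SyntaxError for unparsable completions
    bound: Set[str] = set(dir(builtins))
    loaded: Set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            elif isinstance(node.ctx, ast.Store):
                bound.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.ExceptHandler) and node.name:
            bound.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import os.path` binds `os`; `import numpy as np` binds `np`.
                bound.add((alias.asname or alias.name).split(".")[0])
    return loaded - bound


completion = "def area(r):\n    return pi * r ** 2\n"
print(undefined_names(completion))  # → {'pi'}
```

Counting such findings over many completions gives a static error rate without executing any generated code.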
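The interpreter-based filtering in the Language Models Can Teach Themselves to Program Better entry can be sketched as follows. The sketch assumes puzzles in the Python Programming Puzzles style: a puzzle is a function `f` that returns `True` for a correct answer, and a candidate solution is a function `g` whose output is passed to `f`. The function name `verified_pairs` is hypothetical. Unlike a real pipeline, this sketch has no sandbox or timeout, so it should only be run on trusted strings.

```python
from typing import List, Tuple


def verified_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only (puzzle, solution) source pairs where f(g()) evaluates to True."""
    kept = []
    for puzzle_src, solution_src in pairs:
        namespace: dict = {}
        try:
            exec(puzzle_src, namespace)    # defines the puzzle f
            exec(solution_src, namespace)  # defines the candidate solution g
            if namespace["f"](namespace["g"]()) is True:
                kept.append((puzzle_src, solution_src))
        except Exception:
            # Crashing or malformed candidates are simply discarded.
            continue
    return kept


synthetic = [
    ("def f(x): return x * x == 49", "def g(): return 7"),         # correct
    ("def f(x): return x * x == 49", "def g(): return 6"),         # wrong answer
    ("def f(s): return s.startswith('ab')", "def g(): return 1"),  # raises AttributeError
]
print(len(verified_pairs(synthetic)))  # → 1
```

Only the surviving pairs would be kept as the verified synthetic data the entry describes fine-tuning on.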