Related papers: Language Models Can Teach Themselves to Program Better

Language Models Can Teach Themselves to Program Better

URL: http://arxiv.org/abs/2207.14502v4
Date: Wed, 12 Apr 2023 16:35:27 GMT
Title: Language Models Can Teach Themselves to Program Better
Authors: Patrick Haluptzok, Matthew Bowers, Adam Tauman Kalai
Abstract summary: Recent Language Models (LMs) achieve breakthrough performance in code generation when trained on human-authored problems. We show that it is possible for an LM to synthesize programming problems and solutions, which are filtered for correctness by a Python interpreter. The LM's performance is then seen to improve when it is fine-tuned on its own synthetic problems and verified solutions.
Score: 4.627023679353507
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent Language Models (LMs) achieve breakthrough performance in code generation when trained on human-authored problems, even solving some competitive-programming problems. Self-play has proven useful in games such as Go, and thus it is natural to ask whether LMs can generate their own instructive programming problems to improve their performance. We show that it is possible for an LM to synthesize programming problems and solutions, which are filtered for correctness by a Python interpreter. The LM's performance is then seen to improve when it is fine-tuned on its own synthetic problems and verified solutions; thus the model 'improves itself' using the Python interpreter. Problems are specified formally as programming puzzles [Schuster et al., 2021], a code-based problem format where solutions can easily be verified for correctness by execution. In experiments on publicly-available LMs, test accuracy more than doubles. This work demonstrates the potential for code LMs, with an interpreter, to generate instructive problems and improve their own performance.

Related papers

Self-Questioning Language Models [51.75087358141567]
We propose an asymmetric self-play framework where a proposer is given the topic and generates a question for a solver.<n>Both the proposer and solver are trained via reinforcement learning.<n>We study this asymmetric self-play framework on three benchmarks: three-digit multiplication, algebra problems from the OMEGA benchmark, and programming problems from Codeforces.
arXiv Detail & Related papers (2025-08-05T17:51:33Z)
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval [7.33924106492889]
Existing code generation benchmarks are designed to study Large Language Models' end-to-end performance. We construct PseudoEval, a multilingual code generation benchmark that provides a solution written in pseudocode as input. Our study indicates that problem-solving capability may transfer across programming languages, while language-coding needs more language-specific effort.
arXiv Detail & Related papers (2025-02-26T14:08:17Z)
Effective LLM-Driven Code Generation with Pythoness [0.0]
Pythoness is an embedded domain-specific language for code generation using large language models (LLMs) In Pythoness, developers operate at the level of behavioral specifications when writing functions, classes, or an entire program. We show that Pythoness can successfully leverage a combination of tests and code generation to yield higher quality code than specifications alone.
arXiv Detail & Related papers (2025-01-03T23:14:46Z)
Learning to Reason via Program Generation, Emulation, and Search [33.11955431589091]
Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities. Not all reasoning tasks are easily expressible as code, e.g. tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding. We propose Code Generation and Emulated EXecution (CoGEX) to extend an LM's program synthesis skills to such tasks.
arXiv Detail & Related papers (2024-05-25T19:40:50Z)
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back. Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair. This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
The Consensus Game: Language Model Generation via Equilibrium Search [73.51411916625032]
We introduce a new, a training-free, game-theoretic procedure for language model decoding. Our approach casts language model decoding as a regularized imperfect-information sequential signaling game. Applying EQUILIBRIUM-RANKING to LLaMA-7B outperforms the much larger LLaMA-65B and PaLM-540B models.
arXiv Detail & Related papers (2023-10-13T14:27:21Z)
Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation [23.31928097405939]
We use a language-model-infused scaffolding program to improve itself. A variety of self-improvement strategies are proposed by the language model. It demonstrates that a modern language model, GPT-4, is capable of writing code that can call itself to improve itself.
arXiv Detail & Related papers (2023-10-03T17:59:32Z)
LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback. Our focus is the code generation task, where the model produces code based on natural language instructions. LETI iteratively fine-tunes the model, using the objective LM, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv Detail & Related papers (2023-05-17T15:53:31Z)
Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation. We propose Self- Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results. Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results. LEVER consistently improves over the base code LLMs(4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PaL) to understand natural language problems. PaL offloads the solution step to a programmatic runtime such as a Python interpreter. We set new state-of-the-art results in all 12 benchmarks.
arXiv Detail & Related papers (2022-11-18T18:56:13Z)
Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black Magic? [5.714553194279462]
We investigate the various input parameters of two language models, and conduct a study to understand if variations of these input parameters can have a significant impact on the quality of the generated programs. Our results showed that varying the input parameters can significantly improve the performance of language models.
arXiv Detail & Related papers (2022-10-26T13:28:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.