Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black
Magic?
- URL: http://arxiv.org/abs/2210.14699v1
- Date: Wed, 26 Oct 2022 13:28:14 GMT
- Title: Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black
Magic?
- Authors: Jean-Baptiste Döderlein, Mathieu Acher, Djamel Eddine Khelladi,
Benoit Combemale
- Abstract summary: We investigate the various input parameters of two language models, and conduct a study to understand if variations of these input parameters can have a significant impact on the quality of the generated programs.
Our results showed that varying the input parameters can significantly improve the performance of language models.
- Score: 5.714553194279462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models are promising solutions for tackling increasingly complex
problems. In software engineering, they recently attracted attention in code
assistants, with programs automatically written in a given programming language
from a programming task description in natural language. They have the
potential to save time and effort when writing code. However, these systems are
currently poorly understood, preventing them from being used optimally. In this
paper, we investigate the various input parameters of two language models, and
conduct a study to understand if variations of these input parameters (e.g.
programming task description and the surrounding context, creativity of the
language model, number of generated solutions) can have a significant impact on
the quality of the generated programs. We design specific operators for varying
input parameters and apply them over two code assistants (Copilot and Codex)
and two benchmarks representing algorithmic problems (HumanEval and LeetCode).
Our results showed that varying the input parameters can significantly improve
the performance of language models. However, there is a tight dependency when
varying the temperature, the prompt and the number of generated solutions,
making it potentially hard for developers to properly control the parameters to
obtain an optimal result. This work opens opportunities to propose (automated)
strategies for improving performance.
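As a rough illustration of the kind of sweep the study describes, the sketch below varies the sampling temperature and the number of generated solutions and scores each setting with the standard pass@k estimator. The `generate` and `passes_tests` callables are hypothetical placeholders for a code assistant API and a benchmark test harness, not the paper's actual tooling.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), assuming k <= n:
    n samples were drawn, c of them pass the tests, k is the usage budget."""
    if n - c < k:
        return 1.0
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))

def temperature_sweep(prompt, tests, generate, passes_tests,
                      temperatures=(0.0, 0.2, 0.6, 0.8), n=20, k=10):
    """Illustrative sweep: `generate(prompt, temperature, n)` stands in for a
    code assistant API and `passes_tests(code, tests)` for a benchmark test
    harness; both are hypothetical placeholders supplied by the caller."""
    results = {}
    for t in temperatures:
        samples = generate(prompt, temperature=t, n=n)
        correct = sum(passes_tests(code, tests) for code in samples)
        results[t] = pass_at_k(n, correct, k)
    return results
```

The estimator is the usual one used with HumanEval-style benchmarks; the sweep simply reports how pass@k changes as the temperature varies, which is the dependency the abstract highlights.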
Related papers
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks from their control flow and data flow to bridge the gap between programming languages and natural language.
Experiments and ablations on four datasets covering both C++ and Python validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
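The paper's meta-graph construction is not reproduced here; as a rough, purely illustrative sketch of one ingredient, a data-flow view of a Python code block can be read off the AST by recording which variables feed each assignment:

```python
import ast

def dataflow_edges(source: str):
    """Rough data-flow sketch: for each assignment, add an edge from every
    variable read on the right-hand side to the variable being defined."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            reads = [n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)]
            for target in node.targets:
                if isinstance(target, ast.Name):
                    edges += [(r, target.id) for r in reads]
    return edges

print(dataflow_edges("a = 1\nb = a + 2\nc = a * b"))
# [('a', 'b'), ('a', 'c'), ('b', 'c')]
```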
arXiv Detail & Related papers (2024-05-03T02:48:55Z) - Linguacodus: A Synergistic Framework for Transformative Code Generation in Machine Learning Pipelines [0.0]
We introduce a dynamic pipeline that transforms natural language task descriptions into code through high-level data-shaping instructions.
This paper details the fine-tuning process, and sheds light on how natural language descriptions can be translated into functional code.
We propose an algorithm capable of transforming a natural description of an ML task into code with minimal human interaction.
arXiv Detail & Related papers (2024-03-18T08:58:47Z) - Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation [23.31928097405939]
A language-model-infused scaffolding program is used to improve itself.
The language model proposes a variety of self-improvement strategies.
This demonstrates that a modern language model, GPT-4, is capable of writing code that can call itself to improve itself.
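As a loose, hypothetical sketch of such a scaffold (not the paper's actual STOP implementation), an improver can ask a language model for candidate rewrites of a program and keep the one scoring highest under a utility function; the improver's own source can then be fed back in as the program to improve. `ask_lm` and `utility` are placeholders supplied by the caller.

```python
def improve(program: str, utility, ask_lm, n_candidates: int = 4) -> str:
    """Hypothetical improver: query a language model for candidate rewrites of
    `program` and return the rewrite with the best utility score.
    `ask_lm(prompt) -> str` and `utility(program) -> float` are placeholders."""
    prompt = "Improve the following program. Return only code.\n\n" + program
    candidates = [program] + [ask_lm(prompt) for _ in range(n_candidates)]
    return max(candidates, key=utility)

# Recursive self-improvement, in spirit: the improver's own source code can be
# passed as `program`, so the language model rewrites the improvement strategy.
```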
arXiv Detail & Related papers (2023-10-03T17:59:32Z) - L2CEval: Evaluating Language-to-Code Generation Capabilities of Large
Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z) - Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM, then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z) - Language Models Can Teach Themselves to Program Better [4.627023679353507]
Recent Language Models (LMs) achieve breakthrough performance in code generation when trained on human-authored problems.
We show that it is possible for an LM to synthesize programming problems and solutions, which are filtered for correctness by a Python interpreter.
The LM's performance is then seen to improve when it is fine-tuned on its own synthetic problems and verified solutions.
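A minimal sketch of this interpreter-based correctness filter, assuming the model has emitted a candidate solution and assert-style tests as Python source strings (names and the example pair are illustrative; real use would add sandboxing and timeouts):

```python
def keep_if_correct(solution_src: str, tests_src: str) -> bool:
    """Execute a model-written solution and its model-written tests;
    keep the pair only if everything runs without raising."""
    env: dict = {}
    try:
        exec(solution_src, env)  # defines the candidate function(s)
        exec(tests_src, env)     # assert-style checks
        return True
    except Exception:
        return False

# Example: a synthetic problem/solution pair that survives the filter.
print(keep_if_correct("def add(a, b):\n    return a + b",
                      "assert add(2, 3) == 5"))
```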
arXiv Detail & Related papers (2022-07-29T06:43:28Z) - Automatic Generation of Programming Exercises and Code Explanations with
Large Language Models [4.947560475228859]
OpenAI Codex is a recent large language model from the GPT-3 family for translating code into natural language.
We explore the natural language generation capabilities of Codex in two different phases of the life of a programming exercise.
We find the majority of this automatically generated content to be both novel and sensible, and in many cases ready to use as is.
arXiv Detail & Related papers (2022-06-03T11:00:43Z) - Natural Language to Code Translation with Execution [82.52142893010563]
We propose execution result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
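A simplified sketch in the spirit of execution-based selection (not the paper's exact MBR objective): run each sampled program on a few inputs and return the program whose execution behaviour agrees with the most other candidates. `run` is a placeholder for a sandboxed executor.

```python
from collections import Counter

def select_by_execution(candidates, run, inputs):
    """Execution-based selection sketch: `run(program, x)` is a placeholder
    sandboxed executor. The program that behaves like most other samples wins."""
    signatures = [tuple(run(prog, x) for x in inputs) for prog in candidates]
    votes = Counter(signatures)
    best = max(range(len(candidates)), key=lambda i: votes[signatures[i]])
    return candidates[best]
```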
arXiv Detail & Related papers (2022-04-25T06:06:08Z) - A Conversational Paradigm for Program Synthesis [110.94409515865867]
We propose a conversational program synthesis approach via large language models.
We train a family of large language models, called CodeGen, on natural language and programming language data.
Our findings show the emergence of conversational capabilities and the effectiveness of the proposed conversational program synthesis paradigm.
arXiv Detail & Related papers (2022-03-25T06:55:15Z) - AVATAR: A Parallel Corpus for Java-Python Program Translation [77.86173793901139]
Program translation refers to migrating source code from one language to another.
We present AVATAR, a collection of 9,515 programming problems and their solutions written in two popular languages, Java and Python.
arXiv Detail & Related papers (2021-08-26T05:44:20Z)