Interactive Code Generation via Test-Driven User-Intent Formalization
- URL: http://arxiv.org/abs/2208.05950v2
- Date: Wed, 4 Oct 2023 01:53:15 GMT
- Title: Interactive Code Generation via Test-Driven User-Intent Formalization
- Authors: Shuvendu K. Lahiri and Sarah Fakhoury and Aaditya Naik and Georgios
Sakkas and Saikat Chakraborty and Madanlal Musuvathi and Piali Choudhury and
Curtis von Veh and Jeevana Priya Inala and Chenglong Wang and Jianfeng Gao
- Abstract summary: Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
- Score: 60.90035204567797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have shown great potential in automating
significant aspects of coding by producing natural code from informal natural
language (NL) intent. However, when interacting with LLMs, users have no
guarantees that the code suggestions produced correctly satisfy the intent they
provided. In fact, it is hard to define a notion of correctness since natural
language can be ambiguous and lacks a formal semantics.
In this paper, we propose the workflow of interactive test-driven code
generation, which leverages lightweight user feedback to (a) formalize the
user intent using generated tests that can be useful for debugging, and (b)
produce an improved set of code suggestions by pruning and ranking candidate
code suggestions. We describe a language-agnostic abstract algorithm and a
concrete implementation, TiCoder. We perform an automated evaluation of TiCoder
on the MBPP and HumanEval code generation benchmarks. Our results with the
OpenAI Codex LLM are promising: our best algorithm improves pass@1 code
generation accuracy (in absolute percentage points) by between 22.49% and
37.71% on MBPP and between 24.79% and 53.98% on HumanEval, using between 1
and 5 simulated user queries.
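As a rough illustration of the pruning and ranking step, the sketch below simulates the interaction on candidate Python functions: a hypothetical `user_approves` callback stands in for lightweight user feedback on generated tests, and candidates inconsistent with approved tests are discarded. Names and protocol are illustrative assumptions, not TiCoder's exact implementation.

```python
# Minimal sketch of a TiCoder-style pruning/ranking loop (illustrative only).
from typing import Callable, List, Tuple

def passes(code: str, test: str) -> bool:
    """Return True if executing `code` followed by `test` raises no error."""
    env: dict = {}
    try:
        exec(code, env)   # define the candidate function
        exec(test, env)   # run the assertion-style test
        return True
    except Exception:
        return False

def interactive_prune_and_rank(
    candidates: List[str],
    generated_tests: List[str],
    user_approves: Callable[[str], bool],
    max_queries: int = 5,
) -> List[Tuple[str, int]]:
    """Show up to `max_queries` generated tests to the user; keep only
    candidates consistent with approved tests and rank by tests passed."""
    approved: List[str] = []
    for test in generated_tests[:max_queries]:
        if user_approves(test):                     # lightweight user feedback
            approved.append(test)
            candidates = [c for c in candidates if passes(c, test)]  # prune
    scored = [(c, sum(passes(c, t) for t in approved)) for c in candidates]
    return sorted(scored, key=lambda x: -x[1])      # rank surviving candidates

# Example with two candidate implementations of `add`, one buggy.
cands = ["def add(a, b):\n    return a + b",
         "def add(a, b):\n    return a - b"]
tests = ["assert add(2, 3) == 5"]
print(interactive_prune_and_rank(cands, tests, user_approves=lambda t: True))
```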
Related papers
- LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation [13.800675921118348]
We propose a novel interactive workflow TiCoder for guided intent clarification.
We present an empirical evaluation of the effectiveness of the workflow to improve code generation accuracy.
We observe an average absolute improvement of 45.97% in the pass@1 code generation accuracy for both datasets and across all LLMs within 5 user interactions.
arXiv Detail & Related papers (2024-04-15T19:16:32Z) - Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective [85.48043537327258]
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
Results indicate that MANGO significantly improves the code pass rate based on the strong baselines.
The robustness of the logical comment decoding strategy is notably higher than that of Chain-of-Thought prompting.
arXiv Detail & Related papers (2024-04-11T08:30:46Z) - JumpCoder: Go Beyond Autoregressive Coder via Online Modification [18.9350072969148]
We introduce JumpCoder, a novel model-agnostic framework that enables human-like online modification and non-sequential generation to augment code LLMs.
The key idea behind JumpCoder is to insert new code into the currently generated code when necessary during generation, which is achieved through an auxiliary infilling model.
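A minimal, hedged sketch of this idea follows; the `generate_next_line`, `propose_infill`, and `should_insert` callables are placeholders for the code LLM, the auxiliary infilling model, and a judging step, not JumpCoder's actual interfaces.

```python
# Toy illustration of non-sequential generation: while decoding, optionally
# jump back and insert code earlier in the already-generated output.
from typing import Callable, List

def generate_with_online_insertion(
    generate_next_line: Callable[[List[str]], str],
    propose_infill: Callable[[List[str], int], str],
    should_insert: Callable[[List[str]], int],   # returns insertion index or -1
    max_lines: int = 20,
) -> str:
    lines: List[str] = []
    for _ in range(max_lines):
        nxt = generate_next_line(lines)
        if nxt == "":                      # model signals completion
            break
        lines.append(nxt)
        pos = should_insert(lines)         # e.g., an undefined helper was used
        if pos >= 0:
            # jump back and insert code proposed by the infilling model
            lines.insert(pos, propose_infill(lines, pos))
    return "\n".join(lines)
```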
arXiv Detail & Related papers (2024-01-15T18:04:29Z) - Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for
Code Generation [22.219645213202178]
This paper proposes the "Semantic Chain-of-Thought" approach, named SeCoT, to introduce semantic information of code.
We show that SeCoT achieves state-of-the-art performance, greatly improving the potential of large models for code generation.
arXiv Detail & Related papers (2023-10-16T05:09:58Z) - CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model [58.127534002232096]
This paper introduces CodeFuse-13B, an open-sourced pre-trained code LLM.
It is specifically designed for code-related tasks with both English and Chinese prompts.
CodeFuse achieves its effectiveness by utilizing a high quality pre-training dataset.
arXiv Detail & Related papers (2023-10-10T02:38:44Z) - Prompting with Pseudo-Code Instructions [12.166296720125187]
We create a dataset of pseudo-code prompts for 132 different tasks spanning classification, QA and generative language tasks.
Using these prompts along with their counterparts in natural language, we study their performance on two LLM families - BLOOM and CodeGen.
Our experiments show that using pseudo-code instructions leads to better results, with an average increase (absolute) of 7-16 points in F1 scores for classification tasks and an improvement (relative) of 12-38% in aggregate ROUGE-L scores across all tasks.
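For illustration only, a natural-language prompt and a pseudo-code prompt for the same toy classification task might look as follows; the templates are assumptions, not the dataset's exact prompts.

```python
# Illustrative contrast between a natural-language prompt and a pseudo-code
# prompt for the same task (sketch of the idea, not the paper's templates).

nl_prompt = (
    "Classify the sentiment of the following review as positive or negative.\n"
    "Review: {review}\nSentiment:"
)

pseudocode_prompt = (
    "def classify_sentiment(review: str) -> str:\n"
    '    """Return "positive" or "negative" for the given review."""\n'
    "    # review = {review}\n"
    "    return"
)

# Either string would be formatted with a concrete review and sent to the LLM;
# the study compares the two prompting styles on BLOOM and CodeGen models.
print(pseudocode_prompt.format(review="The plot was gripping."))
```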
arXiv Detail & Related papers (2023-05-19T16:25:01Z) - LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
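A minimal sketch of how such a training example could be assembled, assuming execution tracebacks as the source of textual feedback and made-up field markers (both are assumptions, not the paper's format):

```python
# Assemble one LETI-style fine-tuning sequence: instruction, the model's own
# program, and textual feedback concatenated into a single training string.
import traceback

def build_training_example(instruction: str, generated_program: str) -> str:
    env: dict = {}
    try:
        exec(generated_program, env)          # execute the LM-generated program
        feedback = "Feedback: the program ran without errors."
    except Exception:
        feedback = "Feedback:\n" + traceback.format_exc(limit=1)
    # concatenate instruction, program, and textual feedback for fine-tuning
    return f"Instruction: {instruction}\nProgram:\n{generated_program}\n{feedback}"

print(build_training_example("Print the sum of 2 and 3.", "print(2 + 3)"))
```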
arXiv Detail & Related papers (2023-05-17T15:53:31Z) - Natural Language to Code Translation with Execution [82.52142893010563]
We propose execution result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
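The selection rule can be sketched as follows: execute every candidate on shared inputs and keep the candidate whose outputs agree with the most other candidates. The `solve` convention and the simple agreement count below are simplifications of the paper's formulation.

```python
# Sketch of execution-result-based minimum Bayes risk selection.
from typing import Any, List, Sequence

def run(program: str, test_input: Any) -> Any:
    """Execute a candidate defining `solve(x)` and return its output (or None)."""
    env: dict = {}
    try:
        exec(program, env)
        return env["solve"](test_input)
    except Exception:
        return None

def mbr_select(candidates: List[str], test_inputs: Sequence[Any]) -> str:
    # execution signature: tuple of outputs on the shared inputs
    sigs = [tuple(run(c, x) for x in test_inputs) for c in candidates]
    # score each candidate by how many other candidates share its signature
    scores = [sum(s == t for t in sigs) for s in sigs]
    return candidates[scores.index(max(scores))]

cands = [
    "def solve(x):\n    return x * 2",
    "def solve(x):\n    return x + x",   # agrees with the first on all inputs
    "def solve(x):\n    return x ** 2",  # disagrees on most inputs
]
print(mbr_select(cands, test_inputs=[1, 2, 3]))
```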
arXiv Detail & Related papers (2022-04-25T06:06:08Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
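A rough sketch of the retrieve-then-complete flow, using a toy token-overlap retriever and a placeholder `complete` callable instead of ReACC's trained retriever and generator:

```python
# Retrieve a similar snippet from a code database and prepend it to the
# unfinished code before asking a completion model to finish it.
from typing import Callable, List

def lexical_score(query: str, snippet: str) -> float:
    q, s = set(query.split()), set(snippet.split())
    return len(q & s) / (len(q | s) or 1)        # Jaccard overlap on tokens

def retrieve(query: str, database: List[str]) -> str:
    return max(database, key=lambda snip: lexical_score(query, snip))

def retrieval_augmented_complete(
    unfinished_code: str, database: List[str], complete: Callable[[str], str]
) -> str:
    context = retrieve(unfinished_code, database)
    # retrieved similar code is provided as extra context for the generator
    return complete(f"# retrieved context\n{context}\n\n{unfinished_code}")
```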
arXiv Detail & Related papers (2022-03-15T08:25:08Z) - Automatic Code Generation using Pre-Trained Language Models [0.0]
We propose an end-to-end machine learning model for code generation in the Python language built on top of pre-trained language models.
We demonstrate that a fine-tuned model can perform well in code generation tasks, achieving a BLEU score of 0.22, an improvement of 46% over a reasonable sequence-to-sequence baseline.
arXiv Detail & Related papers (2021-02-21T07:21:26Z)