Natural Language-guided Programming
- URL: http://arxiv.org/abs/2108.05198v1
- Date: Wed, 11 Aug 2021 13:06:33 GMT
- Title: Natural Language-guided Programming
- Authors: Geert Heyman, Rafael Huysegems, Pascal Justen, Tom Van Cutsem
- Abstract summary: We put forward a vision based on a new breed of developer tools that have the potential to largely automate this process.
Key idea is to adapt code autocompletion tools such that they take into account not only the developer's already-written code but also the intent of the task the developer is trying to achieve next.
We call this practice of enriching the code with natural language intent to facilitate its completion natural language-guided programming.
- Score: 1.3955252961896318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In today's software world with its cornucopia of reusable software libraries,
when a programmer is faced with a programming task that they suspect can be
completed through the use of a library, they often look for code examples using
a search engine and then manually adapt found examples to their specific
context of use. We put forward a vision based on a new breed of developer tools
that have the potential to largely automate this process. The key idea is to
adapt code autocompletion tools such that they take into account not only the
developer's already-written code but also the intent of the task the developer
is trying to achieve next, formulated in plain natural language. We call this
practice of enriching the code with natural language intent to facilitate its
completion natural language-guided programming.
To show that this idea is feasible we design, implement and benchmark a tool
that solves this problem in the context of a specific domain (data science) and
a specific programming language (Python). Central to the tool is the use of
language models trained on a large corpus of documented code. Our initial
experiments confirm the feasibility of the idea but also make it clear that we
have only scratched the surface of what may become possible in the future. We
end the paper with a comprehensive research agenda to stimulate additional
research in the budding area of natural language-guided programming.
Related papers
- Assured Automatic Programming via Large Language Models [8.006578501857447]
We aim to discover the programmer intent while generating code which conforms to the intent and a proof of this conformity.
Our objective is to achieve consistency between the program, the specification, and the test by refining our understanding of the user intent.
We demonstrate how the unambiguous intent discovered through our approach increases the percentage of verifiable auto-generated programs.
arXiv Detail & Related papers (2024-10-24T07:29:15Z) - CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds the graphical view of code blocks based on the control flow and data flow of them to fill the gap between programming languages and natural language.
Various experiments and ablations are done on four datasets including both the C++ and python languages to validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z) - Automatic Generation of Python Programs Using Context-Free Grammars [0.1227734309612871]
TinyPy Generator is a tool that generates random Python programs using a context-free grammar.
Our system uses custom production rules to generate code with different levels of complexity.
TinyPy Generator is useful in the field of machine learning, where it can generate substantial amounts of Python code for training Python language models.
arXiv Detail & Related papers (2024-03-11T08:25:52Z) - GenCodeSearchNet: A Benchmark Test Suite for Evaluating Generalization
in Programming Language Understanding [5.9535699822923]
We propose a new benchmark dataset called GenCodeSearchNet (GeCS) to evaluate the programming language understanding capabilities of language models.
As part of the full dataset, we introduce a new, manually curated subset StatCodeSearch that focuses on R, a popular but so far underrepresented programming language.
For evaluation and comparison, we collect several baseline results using fine-tuned BERT-style models and GPT-style large language models.
arXiv Detail & Related papers (2023-11-16T09:35:00Z) - Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning [84.12154024070024]
We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks.
Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge.
A Python interpreter then executes the generated code and prints the output.
arXiv Detail & Related papers (2023-09-19T17:54:21Z) - Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA containing pairs of natural language descriptions and code with created synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z) - What is it like to program with artificial intelligence? [10.343988028594612]
Large language models can generate code to solve a variety of problems expressed in natural language.
This technology has already been commercialised in at least one widely-used programming editor extension: GitHub Copilot.
We explore how programming with large language models (LLM-assisted programming) is similar to, and differs from, prior conceptualisations of programmer assistance.
arXiv Detail & Related papers (2022-08-12T10:48:46Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z) - Using Document Similarity Methods to create Parallel Datasets for Code
Translation [60.36392618065203]
Translating source code from one programming language to another is a critical, time-consuming task.
We propose to use document similarity methods to create noisy parallel datasets of code.
We show that these models perform comparably to models trained on ground truth for reasonable levels of noise.
arXiv Detail & Related papers (2021-10-11T17:07:58Z) - Lyra: A Benchmark for Turducken-Style Code Generation [15.810088578588028]
In software development, one programming language is often embedded in another.
This paper defines a new code generation task: given a natural language comment, this task aims to generate a program in a base language with an embedded language.
To our knowledge, this is the first turducken-style code generation task.
arXiv Detail & Related papers (2021-08-27T07:22:55Z) - AVATAR: A Parallel Corpus for Java-Python Program Translation [77.86173793901139]
Program translation refers to migrating source code from one language to another.
We present AVATAR, a collection of 9,515 programming problems and their solutions written in two popular languages, Java and Python.
arXiv Detail & Related papers (2021-08-26T05:44:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.