Python Code Generation by Asking Clarification Questions
- URL: http://arxiv.org/abs/2212.09885v2
- Date: Fri, 26 May 2023 16:03:08 GMT
- Title: Python Code Generation by Asking Clarification Questions
- Authors: Haau-Sing Li, Mohsen Mesgar, André F. T. Martins, Iryna Gurevych
- Abstract summary: In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce CodeClarQA, a new dataset containing pairs of natural language descriptions and code, together with synthetic clarification questions and answers.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code generation from text requires understanding the user's intent from a
natural language description and generating an executable code snippet that
satisfies this intent. While recent pretrained language models demonstrate
remarkable performance for this task, these models fail when the given natural
language description is under-specified. In this work, we introduce a novel and
more realistic setup for this task. We hypothesize that the under-specification
of a natural language description can be resolved by asking clarification
questions. Therefore, we collect and introduce CodeClarQA, a new dataset containing pairs of natural language descriptions and code, together with synthetic clarification questions and answers. Our empirical evaluation of pretrained language models on code generation shows that clarifications lead to more precisely generated code, reflected in substantial improvements in model performance across all evaluation metrics.
Alongside this, our task and dataset introduce new challenges to the community,
including when and what clarification questions should be asked. Our code and
dataset are available on GitHub.
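To make this setup concrete, here is a minimal Python sketch of how such an example might be represented and turned into a generation prompt, with and without clarifications. The record fields and the prompt format are illustrative assumptions, not the actual CodeClarQA schema.

```python
# A minimal sketch of the task setup described in the abstract. The record
# fields and the prompt format are illustrative assumptions, not the actual
# CodeClarQA schema.
from dataclasses import dataclass


@dataclass
class CodeClarQAExample:
    description: str                       # possibly under-specified NL description
    clarifications: list[tuple[str, str]]  # synthetic (question, answer) pairs
    code: str                              # reference Python snippet


def build_prompt(example: CodeClarQAExample, use_clarifications: bool = True) -> str:
    """Assemble a code-generation prompt, optionally appending the
    clarification dialogue that resolves the under-specification."""
    parts = [f"# Task: {example.description}"]
    if use_clarifications:
        for question, answer in example.clarifications:
            parts.append(f"# Q: {question}\n# A: {answer}")
    parts.append("# Python code:")
    return "\n".join(parts)


example = CodeClarQAExample(
    description="Sort the records.",
    clarifications=[
        ("Which key should the records be sorted by?", "The 'date' field."),
        ("In ascending or descending order?", "Descending."),
    ],
    code="records.sort(key=lambda r: r['date'], reverse=True)",
)

# Comparing the two prompts shows how clarifications pin down the user's
# intent before a model is asked to generate code.
print(build_prompt(example, use_clarifications=False))
print(build_prompt(example, use_clarifications=True))
```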
Related papers
- NoviCode: Generating Programs from Natural Language Utterances by Novices
We present NoviCode, a novel NL Programming task which takes as input an API and a natural language description by a novice non-programmer.
We show that NoviCode is indeed a challenging task in the code synthesis domain, and that generating complex code from non-technical instructions goes beyond the current Text-to-Code paradigm.
arXiv Detail & Related papers (2024-07-15T11:26:03Z)
- Multi-lingual Evaluation of Code Generation Models
We present new benchmarks for evaluating code generation models: MBXP, Multilingual HumanEval, and MathQA-X.
These datasets cover over 10 programming languages.
We are able to assess the performance of code generation models in a multi-lingual fashion.
arXiv Detail & Related papers (2022-10-26T17:17:06Z)
- Benchmarking Language Models for Code Syntax Understanding
Pre-trained language models have demonstrated impressive performance in both natural language processing and program understanding.
In this work, we perform the first thorough benchmarking of the state-of-the-art pre-trained models for identifying the syntactic structures of programs.
Our findings point out key limitations of existing pre-training methods for programming languages, and suggest the importance of modeling code syntactic structures.
arXiv Detail & Related papers (2022-10-26T04:47:18Z)
- Explaining Patterns in Data with Language Models via Interpretable Autoprompting
We introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explaining the data.
iPrompt can yield meaningful insights by accurately recovering ground-truth dataset descriptions.
Experiments with an fMRI dataset show the potential for iPrompt to aid in scientific discovery.
arXiv Detail & Related papers (2022-10-04T18:32:14Z)
- MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages
We benchmark code generation from natural language commands extending beyond English.
We annotated a total of 896 NL-code pairs in three languages: Spanish, Japanese, and Russian.
While the difficulties vary across these three languages, all systems lag significantly behind their English counterparts.
arXiv Detail & Related papers (2022-03-16T04:21:50Z)
- Can Machines Read Coding Manuals Yet? -- A Benchmark for Building Better Language Models for Code Understanding
We derive a set of benchmarks that assess code understanding based on tasks such as predicting the best answer to a question in a forum post.
We evaluate the performance of current state-of-the-art language models on these tasks and show that fine-tuning yields a significant improvement on each task.
arXiv Detail & Related papers (2021-09-15T17:42:44Z)
- BERT2Code: Can Pretrained Language Models be Leveraged for Code Search?
We show that our model learns the inherent relationship between the two embedding spaces, and we probe the scope for improvement. This analysis shows that the quality of the code embedding model is the bottleneck for our model's performance.
arXiv Detail & Related papers (2021-04-16T10:28:27Z)