GuardRails: Automated Suggestions for Clarifying Ambiguous Purpose Statements
- URL: http://arxiv.org/abs/2312.08189v2
- Date: Fri, 3 May 2024 07:39:46 GMT
- Title: GuardRails: Automated Suggestions for Clarifying Ambiguous Purpose Statements
- Authors: Mrigank Pawagi, Viraj Kumar
- Abstract summary: Before implementing a function, programmers are encouraged to write a purpose statement, i.e., a short, natural-language explanation of what the function computes.
A purpose statement may be ambiguous, i.e., it may fail to specify the intended behaviour when two or more inequivalent computations are plausible on certain inputs.
We propose a novel heuristic that suggests such inputs using Large Language Models (LLMs).
We provide an open-source implementation of our heuristic as an extension to Visual Studio Code for the Python programming language.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Before implementing a function, programmers are encouraged to write a purpose statement, i.e., a short, natural-language explanation of what the function computes. A purpose statement may be ambiguous, i.e., it may fail to specify the intended behaviour when two or more inequivalent computations are plausible on certain inputs. Our paper makes four contributions. First, we propose a novel heuristic that suggests such inputs using Large Language Models (LLMs). Using these suggestions, the programmer may choose to clarify the purpose statement (e.g., by providing a functional example that specifies the intended behaviour on such an input). Second, to assess the quality of inputs suggested by our heuristic, and to facilitate future research, we create an open dataset of purpose statements with known ambiguities. Third, we compare our heuristic against GitHub Copilot's Chat feature, which can suggest similar inputs when prompted to generate unit tests. Fourth, we provide an open-source implementation of our heuristic as an extension to Visual Studio Code for the Python programming language, where purpose statements and functional examples are specified as docstrings and doctests respectively. We believe that this tool will be particularly helpful to novice programmers and instructors.
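To make the setting concrete, here is a small illustration of our own (not drawn from the paper's dataset): the purpose statement below is ambiguous between two inequivalent computations, and a doctest on a disambiguating input clarifies it.

```python
def middle(values):
    """Return the middle of the values.

    On [4, 1, 3], "middle" could plausibly mean the median (3) or the
    element at the middle index (1); the doctest below clarifies the
    purpose statement by pinning down the intended behaviour (median).

    >>> middle([4, 1, 3])
    3
    """
    return sorted(values)[len(values) // 2]


if __name__ == "__main__":
    import doctest
    doctest.testmod()  # checks the clarifying functional example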
Related papers
- Chat-like Asserts Prediction with the Support of Large Language Model
We introduce a Chat-like execution-based Asserts Prediction tool for generating meaningful assert statements for Python projects.
The tool uses persona, Chain-of-Thought, and one-shot learning techniques in its prompt design, and conducts multiple rounds of communication with the LLM and a Python interpreter.
Our evaluation demonstrates that the tool achieves 64.7% accuracy for single assert statement generation and 62% for overall assert statement generation.
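A minimal sketch of such a chat-like loop, assuming a hypothetical query_llm stand-in for a real LLM API (this is our illustration, not the paper's implementation):

```python
def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real LLM client here")


def generate_assert(focal_code: str, max_rounds: int = 3) -> str | None:
    prompt = (
        "You are an expert Python test engineer.\n"   # persona
        f"Code under test:\n{focal_code}\n"
        "Reply with a single Python assert statement that tests it.\n"
    )
    for _ in range(max_rounds):
        candidate = query_llm(prompt)
        try:
            exec(focal_code + "\n" + candidate, {})   # run code + assert
            return candidate                          # the assert held
        except Exception as err:                      # interpreter feedback
            prompt += f"\nYour assert raised {err!r}; please revise it.\n"
    return None
```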
arXiv Detail & Related papers (2024-07-31T08:27:03Z)
- Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning
We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks.
Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge.
A Python interpreter then executes the generated code and prints the output.
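A toy example of the kind of program such a prompt might elicit (constructed here for illustration, not taken from the paper):

```python
# Structured knowledge is stored as natural-language text, a function
# reasons over it, and the interpreter prints the final answer.

knowledge = {
    "Paris": "Paris is the capital of France.",
    "Berlin": "Berlin is the capital of Germany.",
}


def capital_of(country: str) -> str:
    for city, fact in knowledge.items():
        if country in fact and "capital" in fact:
            return city
    return "unknown"


print(capital_of("France"))  # the interpreter executes and prints: Paris
```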
arXiv Detail & Related papers (2023-09-19T17:54:21Z)
- Code Representation Pre-training with Complements from Program Executions
We propose FuzzPretrain to explore the dynamic information of programs revealed by their test cases and embed it into the feature representations of code as complements.
FuzzPretrain yielded more than 6%/9% mAP improvements on code search over its counterparts trained with only source code or ASTs, respectively.
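A hedged sketch of the general idea of pairing source code with the dynamic behaviour its test inputs reveal (illustrative only; not FuzzPretrain's actual data pipeline):

```python
source = "def double(x):\n    return x * 2"

namespace: dict = {}
exec(source, namespace)
double = namespace["double"]

test_inputs = [0, 3, -5]
executions = [(x, double(x)) for x in test_inputs]   # observed I/O pairs

pretraining_example = {"code": source, "executions": executions}
print(pretraining_example)   # executions: [(0, 0), (3, 6), (-5, -10)]
```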
arXiv Detail & Related papers (2023-09-04T01:57:22Z)
- Understanding Programs by Exploiting (Fuzzing) Test Cases
We propose to incorporate the relationship between inputs and possible outputs/behaviors into learning, for achieving a deeper semantic understanding of programs.
To obtain inputs that are representative enough to trigger the execution of most parts of the code, we resort to fuzz testing and propose fuzz tuning.
The effectiveness of the proposed method is verified on two program understanding tasks, code clone detection and code classification, and it outperforms the current state of the art by large margins.
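A minimal sketch of fuzz-style input generation, using a toy program in place of a real fuzzer and target:

```python
import random

# Draw random inputs until every behaviour of the program under study
# has been triggered, keeping one representative input per behaviour.


def classify(n: int) -> str:
    if n < 0:
        return "negative"
    if n % 2 == 0:
        return "even"
    return "odd"


random.seed(0)
representatives: dict[str, int] = {}
while len(representatives) < 3:
    x = random.randint(-100, 100)
    representatives.setdefault(classify(x), x)

print(representatives)  # one input for each of the three branches
```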
arXiv Detail & Related papers (2023-05-23T01:51:46Z)
- Python Code Generation by Asking Clarification Questions
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA, containing pairs of natural language descriptions and code together with synthetically created clarification questions and answers.
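A hedged sketch of what one such record might look like; the field names are illustrative, not the dataset's actual schema:

```python
record = {
    "description": "Sort the list of users.",
    "clarification_question": "Sort by name or by signup date?",
    "clarification_answer": "By signup date, ascending.",
    "code": "users.sort(key=lambda u: u['signup_date'])",
}
```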
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
- Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black Magic?
We investigate the various input parameters of two language models and study whether varying these parameters can have a significant impact on the quality of the generated programs.
Our results showed that varying the input parameters can significantly improve the performance of language models.
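A minimal sketch of such a parameter study; generate and passes_tests are hypothetical stand-ins, not a real Copilot/Codex API:

```python
def generate(prompt: str, temperature: float) -> str:
    raise NotImplementedError("call a code-generation model here")


def passes_tests(program: str) -> bool:
    raise NotImplementedError("run the reference test suite here")


def pass_rate(prompt: str, temperature: float, samples: int = 20) -> float:
    hits = sum(passes_tests(generate(prompt, temperature)) for _ in range(samples))
    return hits / samples


# With real implementations plugged in, one would compare settings like:
# for t in (0.0, 0.2, 0.8):
#     print(t, pass_rate("def fibonacci(n):", t))
```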
arXiv Detail & Related papers (2022-10-26T13:28:14Z)
- Interactive Code Generation via Test-Driven User-Intent Formalization
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation, TiCoder.
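One hedged reading of the test-driven idea (our illustration, not TiCoder's code): run candidate programs on a probe input, let the user approve the intended output, and prune candidates that disagree.

```python
def ascending(xs):
    return sorted(xs)


def descending(xs):
    return sorted(xs, reverse=True)


candidates = [ascending, descending]
probe = [2, 1, 3]
outputs = [c(probe) for c in candidates]     # [[1, 2, 3], [3, 2, 1]]

intended = [1, 2, 3]                         # the user approves this output
candidates = [c for c, out in zip(candidates, outputs) if out == intended]
print([c.__name__ for c in candidates])      # ['ascending']
```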
arXiv Detail & Related papers (2022-08-11T17:41:08Z)
- Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL
APEL is a framework in which non-programmers select among candidate programs generated by a seed semantic parser (e.g., Codex).
For each utterance, APEL actively searches for a simple input on which the candidate programs tend to produce different outputs.
It asks the non-programmers only to choose the appropriate output, thus allowing us to infer which program is correct.
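A minimal sketch of this active search for a disagreement-revealing input (illustrative; APEL itself targets SQL programs):

```python
from itertools import product


def cand_a(x, y):
    return x + y


def cand_b(x, y):
    return x * y


def find_disagreement():
    for x, y in product(range(4), repeat=2):      # simplest inputs first
        if cand_a(x, y) != cand_b(x, y):
            return (x, y), cand_a(x, y), cand_b(x, y)


print(find_disagreement())   # ((0, 1), 1, 0): show both outputs to the user
```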
arXiv Detail & Related papers (2022-05-25T00:35:12Z)
- Natural Language-guided Programming
We put forward a vision based on a new breed of developer tools that have the potential to largely automate this process.
The key idea is to adapt code autocompletion tools such that they take into account not only the developer's already-written code but also the intent of the task the developer is trying to achieve next.
We call this practice of enriching the code with natural language intent to facilitate its completion "natural language-guided programming".
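What the practice might look like in code (our illustration): a natural-language comment states the intent, and a completion tool fills in the body that follows.

```python
import csv


# intent: read a CSV file and return its rows as dictionaries
def load_rows(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```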
arXiv Detail & Related papers (2021-08-11T13:06:33Z)
- Latent Programmer: Discrete Latent Codes for Program Synthesis
In many sequence learning tasks, such as program synthesis and document summarization, a key problem is searching over a large space of possible output sequences.
We propose to learn representations of the outputs that are specifically meant for search: rich enough to specify the desired output but compact enough to make search more efficient.
We introduce the Latent Programmer, a program synthesis method that first predicts a discrete latent code from input/output examples, and then generates the program in the target language.
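A hedged two-stage sketch in the spirit of this summary, with stubs in place of the paper's trained models:

```python
def predict_latent(examples: list[tuple[str, str]]) -> list[int]:
    raise NotImplementedError("stage 1: encode I/O examples to discrete tokens")


def decode_program(latent: list[int]) -> str:
    raise NotImplementedError("stage 2: decode latent code to a program")


def synthesize(examples: list[tuple[str, str]]) -> str:
    latent = predict_latent(examples)   # compact, searchable plan of the output
    return decode_program(latent)       # generation guided by that plan
```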
arXiv Detail & Related papers (2020-12-01T10:11:35Z)
- IReEn: Reverse-Engineering of Black-Box Functions via Iterative Neural Program Synthesis
We investigate the problem of revealing the functionality of a black-box agent.
We do not rely on privileged information on the black box, but rather investigate the problem under a weaker assumption of having only access to inputs and outputs of the program.
Our results show that the proposed approach outperforms the state-of-the-art on this challenge by finding an approximately functionally equivalent program in 78% of cases.
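A minimal sketch of the underlying black-box setting, using brute-force search over a tiny hypothesis space where the paper uses iterative neural program synthesis:

```python
def black_box(x):                      # stands in for the opaque agent
    return 3 * x + 1


observations = [(x, black_box(x)) for x in range(5)]   # inputs/outputs only

hypotheses = [(a, b) for a in range(6) for b in range(6)]   # programs a*x + b
consistent = [(a, b) for a, b in hypotheses
              if all(a * x + b == y for x, y in observations)]
print(consistent)   # [(3, 1)]: a functionally equivalent program
```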
arXiv Detail & Related papers (2020-06-18T17:50:48Z)