Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions?
- URL: http://arxiv.org/abs/2310.01831v2
- Date: Mon, 15 Apr 2024 23:08:37 GMT
- Title: Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions?
- Authors: Madeline Endres, Sarah Fakhoury, Saikat Chakraborty, Shuvendu K. Lahiri
- Abstract summary: Large Language Models (LLMs) have the potential to facilitate the translation of natural language intent to specifications that match programmer intent.
In this paper, we describe nl2postcond, the problem of leveraging LLMs to transform informal natural language into formal method postconditions, expressed as program assertions.
- Score: 17.03841665553565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Informal natural language that describes code functionality, such as code comments or function documentation, may contain substantial information about a program's intent. However, there is typically no guarantee that a program's implementation and natural language documentation are aligned. In the case of a conflict, leveraging information in code-adjacent natural language has the potential to enhance fault localization, debugging, and code trustworthiness. In practice, however, this information is often underutilized due to the inherent ambiguity of natural language, which makes natural language intent challenging to check programmatically. The emergent abilities of Large Language Models (LLMs) have the potential to facilitate the translation of natural language intent to programmatically checkable assertions. However, it is unclear if LLMs can correctly translate informal natural language specifications into formal specifications that match programmer intent. Additionally, it is unclear if such translation could be useful in practice. In this paper, we describe nl2postcond, the problem of leveraging LLMs for transforming informal natural language to formal method postconditions, expressed as program assertions. We introduce and validate metrics to measure and compare different nl2postcond approaches, using the correctness and discriminative power of generated postconditions. We then use qualitative and quantitative methods to assess the quality of nl2postcond postconditions, finding that they are generally correct and able to discriminate incorrect code. Finally, we find that nl2postcond via LLMs has the potential to be helpful in practice; nl2postcond-generated postconditions were able to catch 64 real-world historical bugs from Defects4J.
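To make the nl2postcond problem and its two evaluation metrics concrete, here is a minimal hypothetical sketch in Python; it is not the paper's implementation or benchmark. The function, the buggy variant, the tests, and the specific postcondition below are illustrative assumptions: a candidate postcondition, expressed as an assertion over the input and the return value, is checked for correctness against a reference implementation and for discriminative power against a buggy variant.

```python
# Hypothetical illustration of nl2postcond-style evaluation: a postcondition
# derived from the docstring is (1) correct if it holds on the reference
# implementation and (2) discriminating if it fails on a buggy variant.

def remove_duplicates(xs):
    """Return a list with duplicates removed, preserving first-occurrence order."""
    seen, out = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

def remove_duplicates_buggy(xs):
    """Buggy variant: removes duplicates but loses the original order."""
    return sorted(set(xs))

def postcondition(xs, result):
    """A candidate postcondition an LLM might generate from the docstring."""
    return (set(result) == set(xs)                        # same elements
            and len(result) == len(set(result))           # no duplicates
            and result == sorted(result, key=xs.index))   # first-occurrence order

if __name__ == "__main__":
    tests = [[1, 2, 2, 3], [3, 1, 3, 2, 1], []]
    # Correctness: the postcondition holds on the reference implementation.
    assert all(postcondition(t, remove_duplicates(t)) for t in tests)
    # Discriminative power: it rejects at least one output of the buggy variant.
    assert not all(postcondition(t, remove_duplicates_buggy(t)) for t in tests)
    print("postcondition is correct and discriminates the buggy variant")
```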
Related papers
- Assured Automatic Programming via Large Language Models [8.006578501857447]
We aim to discover the programmer's intent while generating code that conforms to that intent, along with a proof of this conformity.
Our objective is to achieve consistency between the program, the specification, and the test by refining our understanding of the user intent.
We demonstrate how the unambiguous intent discovered through our approach increases the percentage of verifiable auto-generated programs.
arXiv Detail & Related papers (2024-10-24T07:29:15Z)
- Towards Large Language Model Aided Program Refinement [10.089955747110444]
Program refinement involves correctness-preserving transformations from formal high-level specification statements into executable programs.
Large language models (LLMs) enable automatic code generation from informal natural language specifications.
We propose LLM4PR, a tool that combines formal program refinement techniques with informal LLM-based methods.
arXiv Detail & Related papers (2024-06-26T04:29:27Z)
- Evaluating LLM-driven User-Intent Formalization for Verification-Aware Languages [6.0608817611709735]
We propose a metric for evaluating the quality of specifications for verification-aware languages.
We show that our metric agrees closely with human labels on a dataset of Dafny specifications for the popular MBPP code-generation benchmark.
We also outline formal verification challenges that need to be addressed to apply the technique more widely.
arXiv Detail & Related papers (2024-06-14T06:52:08Z)
- Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages [21.18996339478024]
We introduce synthetic programming elicitation and compilation (SPEAC).
SPEAC produces syntactically correct programs more frequently and without sacrificing semantic correctness.
We empirically evaluate the performance of SPEAC in a case study for the UCLID5 formal verification language.
arXiv Detail & Related papers (2024-06-05T22:16:19Z)
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks based on their control flow and data flow to bridge the gap between programming languages and natural language.
Various experiments and ablations on four datasets, covering both C++ and Python, validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering [52.86931192259096]
Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases.
Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
arXiv Detail & Related papers (2024-01-11T09:27:50Z)
- Language Models as Inductive Reasoners [125.99461874008703]
We propose a new paradigm (task) for inductive reasoning, which is to induce natural language rules from natural language facts.
We create a dataset termed DEER containing 1.2k rule-fact pairs for the task, where rules and facts are written in natural language.
We provide the first and comprehensive analysis of how well pretrained language models can induce natural language rules from natural language facts.
arXiv Detail & Related papers (2022-12-21T11:12:14Z)
- Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA, containing pairs of natural language descriptions and code together with synthetically created clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon, referential opacity, add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation, TiCoder; a minimal sketch of this style of workflow appears after this list.
arXiv Detail & Related papers (2022-08-11T17:41:08Z)
- Code Comment Inconsistency Detection with BERT and Longformer [9.378041196272878]
Comments, or natural language descriptions of source code, are standard practice among software developers.
When the code is modified without an accompanying correction to the comment, an inconsistency between the comment and code can arise.
We propose two models to detect such inconsistencies in a natural language inference (NLI) context.
arXiv Detail & Related papers (2022-07-29T02:43:51Z)
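To make the interactive, test-driven intent formalization described in the TiCoder entry above concrete, here is a minimal hypothetical sketch in Python. It is not TiCoder's actual algorithm or API; the candidate generators and the user-approval step are placeholders standing in for LLM calls and real user interaction.

```python
# Hypothetical sketch of a test-driven intent-formalization loop:
# propose code and test candidates from NL intent, ask the user which tests
# match their intent, then rank code candidates by the approved tests.

def generate_code_candidates(nl_intent):
    """Stand-in for an LLM proposing several implementations from NL intent."""
    return [
        lambda xs: max(xs),        # candidate 1
        lambda xs: sorted(xs)[0],  # candidate 2 (actually returns the minimum)
    ]

def generate_test_candidates(nl_intent):
    """Stand-in for an LLM proposing candidate input/output tests."""
    return [([3, 1, 2], 3), ([3, 1, 2], 1)]

def user_approves(test):
    """Stand-in for asking the user which tests reflect their intent."""
    inp, expected = test
    return expected == max(inp)  # the (hypothetical) user wants the maximum

def formalize_and_rank(nl_intent):
    codes = generate_code_candidates(nl_intent)
    tests = [t for t in generate_test_candidates(nl_intent) if user_approves(t)]
    # Rank code candidates by how many approved tests they pass.
    def score(fn):
        return sum(1 for inp, expected in tests if fn(inp) == expected)
    return sorted(codes, key=score, reverse=True), tests

if __name__ == "__main__":
    ranked, approved = formalize_and_rank("return the largest element of a list")
    print("approved tests:", approved)
    print("best candidate on [5, 4, 9]:", ranked[0]([5, 4, 9]))
```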