Towards Large Language Model Aided Program Refinement
- URL: http://arxiv.org/abs/2406.18616v1
- Date: Wed, 26 Jun 2024 04:29:27 GMT
- Title: Towards Large Language Model Aided Program Refinement
- Authors: Yufan Cai, Zhe Hou, Xiaokun Luan, David Miguel Sanan Baena, Yun Lin, Jun Sun, Jin Song Dong
- Abstract summary: Program refinement involves correctness-preserving transformations from formal high-level specification statements into executable programs.
Large language models (LLMs) enable automatic code generation from informal natural language specifications.
We propose LLM4PR, a tool that combines formal program refinement techniques with informal LLM-based methods.
- Score: 10.089955747110444
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Program refinement involves correctness-preserving transformations from formal high-level specification statements into executable programs. Traditional verification tool support for program refinement is highly interactive and lacks automation. On the other hand, the emergence of large language models (LLMs) enables automatic code generation from informal natural language specifications. However, code generated by LLMs is often unreliable, and the procedure from specification to code is an opaque, uncontrolled black box. We propose LLM4PR, a tool that combines formal program refinement techniques with informal LLM-based methods to (1) transform the specification into preconditions and postconditions, (2) automatically build prompts based on the refinement calculus, (3) interact with the LLM to generate code, and finally (4) verify that the generated code satisfies the conditions of the refinement calculus, thus guaranteeing the correctness of the code. We have implemented our tool using GPT4, Coq, and CoqHammer, and evaluated it on the HumanEval and EvalPlus datasets.
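The abstract describes a four-step generate-and-verify loop. Below is a minimal Python sketch of that loop, for illustration only; the helper names, the prompt wording, and the stubbed LLM and prover calls are assumptions, not the paper's actual LLM4PR interface (which builds on GPT4, Coq, and CoqHammer).

```python
# Illustrative sketch of the four steps described in the abstract.
# All names here are hypothetical; the real tool targets GPT4, Coq, and CoqHammer.
from dataclasses import dataclass


@dataclass
class Spec:
    """A specification lifted from natural language into pre/postconditions."""
    precondition: str   # e.g. "n >= 0"
    postcondition: str  # e.g. "result = n * (n + 1) / 2"


def extract_conditions(nl_spec: str) -> Spec:
    """Step 1: ask the LLM to turn the informal spec into pre/postconditions."""
    return Spec(precondition="True", postcondition=nl_spec)  # placeholder


def build_refinement_prompt(spec: Spec) -> str:
    """Step 2: frame the task as a refinement obligation in the prompt."""
    return (
        "Refine the following specification statement into executable code.\n"
        f"Assume the precondition: {spec.precondition}\n"
        f"Establish the postcondition: {spec.postcondition}\n"
        "Each refinement step must preserve correctness."
    )


def generate_code(prompt: str) -> str:
    """Step 3: query the code-generation LLM (stubbed here)."""
    raise NotImplementedError("call the LLM here")


def verify_refinement(spec: Spec, code: str) -> bool:
    """Step 4: discharge the refinement conditions with a prover (stubbed here).

    Only code whose proof obligations {pre} code {post} can be discharged
    is accepted as a correct refinement of the specification.
    """
    raise NotImplementedError("emit proof obligations and run the prover here")


def llm4pr(nl_spec: str) -> str:
    """End-to-end loop: regenerate until the verifier accepts the code."""
    spec = extract_conditions(nl_spec)
    prompt = build_refinement_prompt(spec)
    while True:
        code = generate_code(prompt)
        if verify_refinement(spec, code):
            return code
```

The point of step 4 is that the LLM remains a black box, but its output is accepted only after the refinement conditions have been formally checked.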
Related papers
- Crystal: Illuminating LLM Abilities on Language and Code [58.5467653736537]
We propose a pretraining strategy to enhance the integration of natural language and coding capabilities.
The resulting model, Crystal, demonstrates remarkable capabilities in both domains.
arXiv Detail & Related papers (2024-11-06T10:28:46Z)
- Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models [54.14602121129874]
We introduce AutoIF, the first scalable and reliable method for automatically generating instruction-following training data.
AutoIF transforms the validation of instruction-following data quality into code verification.
arXiv Detail & Related papers (2024-06-19T13:29:53Z)
- Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages [21.18996339478024]
We introduce synthetic programming elicitation and compilation (SPEAC).
SPEAC produces syntactically correct programs more frequently and without sacrificing semantic correctness.
We empirically evaluate the performance of SPEAC in a case study for the UCLID5 formal verification language.
arXiv Detail & Related papers (2024-06-05T22:16:19Z)
- SpecTra: Enhancing the Code Translation Ability of Language Models by Generating Multi-Modal Specifications [17.60108067953814]
Large language models (LLMs) are increasingly being used for the task of automated code translation.
We propose SpecTra, a multi-stage approach that uses a novel self-consistency filter to first generate high-quality specifications.
arXiv Detail & Related papers (2024-05-28T20:48:30Z)
- CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [56.019447113206006]
Large Language Models (LLMs) have achieved remarkable progress in code generation.
CodeIP is a novel multi-bit watermarking technique that embeds additional information to preserve provenance details.
Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP.
arXiv Detail & Related papers (2024-04-24T04:25:04Z)
- Grounding Data Science Code Generation with Input-Output Specifications [32.07033683677839]
Large language models (LLMs) have recently demonstrated a remarkable ability to generate code from natural language prompts.
LLMs can have difficulty aligning their outputs with both the NL prompt and the I/O specification.
We propose GIFT4Code, a novel approach for the instruction fine-tuning of LLMs with respect to I/O specifications.
arXiv Detail & Related papers (2024-02-12T21:32:49Z)
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
- Function-constrained Program Synthesis [12.55507214959886]
Large language models (LLMs) can generate code in real-time by drawing on all code available in a development environment.
Current systems lack effective recovery methods, forcing users to iteratively re-prompt the model with modified prompts until a sufficient solution is reached.
Our method constrains code generation to an explicit function set and enables recovery from failed attempts through automatically generated sub-functions.
arXiv Detail & Related papers (2023-11-27T02:55:34Z)
- Test-Case-Driven Programming Understanding in Large Language Models for Better Code Generation [15.166827643436346]
muFiX is a novel prompting technique to improve the code generation performance of large language models (LLMs).
It first exploits test case analysis to obtain specification understanding and enables a self-improvement process.
muFiX then fixes the specification understanding so as to reduce the gap between the provided understanding and the actual understanding.
arXiv Detail & Related papers (2023-09-28T02:58:07Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
- LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
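The LEVER entry above rests on a simple mechanism: execute each sampled program and let a learned verifier score the (input, program, result) triple. The sketch below is an illustrative Python rendering of that reranking idea, with the verifier left as a hypothetical stand-in rather than LEVER's actual model.

```python
# Illustrative execution-guided reranking in the spirit of LEVER.
# The verifier passed in is a hypothetical stand-in for a learned model.
from typing import Callable, List, Tuple


def run_program(program: str) -> str:
    """Execute a candidate program and capture its result or error."""
    env: dict = {}
    try:
        exec(program, env)  # a real system would sandbox this
        return repr(env.get("result"))
    except Exception as exc:  # execution errors are strong negative evidence
        return f"error: {exc}"


def rerank(
    nl_input: str,
    candidates: List[str],
    verifier: Callable[[str, str, str], float],
) -> List[Tuple[str, float]]:
    """Score each (input, program, execution result) triple and sort by score."""
    scored = []
    for program in candidates:
        result = run_program(program)
        scored.append((program, verifier(nl_input, program, result)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy usage: a verifier that simply rejects programs that crash.
    toy_verifier = lambda nl, prog, res: 0.0 if res.startswith("error") else 1.0
    samples = ["result = 1 + 2 + 3", "result = 1 / 0"]
    print(rerank("compute the sum of 1..3", samples, toy_verifier))
```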