VeCoGen: Automating Generation of Formally Verified C Code with Large Language Models
- URL: http://arxiv.org/abs/2411.19275v1
- Date: Thu, 28 Nov 2024 17:12:21 GMT
- Title: VeCoGen: Automating Generation of Formally Verified C Code with Large Language Models
- Authors: Merlijn Sevenhuijsen, Khashayar Etemadi, Mattias Nyberg
- Abstract summary: VeCoGen is a novel tool that combines Large Language Models (LLMs) with formal verification to automate the generation of formally verified C programs.
VeCoGen takes a formal specification in the ANSI/ISO C Specification Language (ACSL), a natural language specification, and a set of test cases, and attempts to generate a program that satisfies them.
- Score: 1.6249267147413524
- Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in generating code, yet they often produce programs with flaws or deviations from intended behavior, limiting their suitability for safety-critical applications. To address this limitation, this paper introduces VeCoGen, a novel tool that combines LLMs with formal verification to automate the generation of formally verified C programs. VeCoGen takes a formal specification in the ANSI/ISO C Specification Language (ACSL), a natural language specification, and a set of test cases, and attempts to generate a program that satisfies them. The program-generation process consists of two steps. First, VeCoGen generates an initial set of candidate programs. Second, it iteratively improves on previously generated candidates. If a candidate program is verified against the formal specification, it is guaranteed to be correct. We evaluate VeCoGen on 15 problems from Codeforces competitions and find that it solves 13 of them. This work shows the potential of combining LLMs with formal verification to automate program generation.
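To make the setup concrete, the following is a minimal sketch of the kind of ACSL-annotated C program such a pipeline targets. The function and its contract are illustrative, not taken from the paper; Frama-C's WP plugin is the standard tool for discharging ACSL contracts of this sort.

```c
/* Illustrative example (not from the paper): an ACSL function
   contract for max. The postconditions state that the result
   bounds both inputs and equals one of them; `assigns \nothing`
   states that the function has no side effects. A deductive
   verifier such as Frama-C/WP can prove the body meets this. */
/*@ ensures \result >= a && \result >= b;
    ensures \result == a || \result == b;
    assigns \nothing;
*/
int max(int a, int b) {
    return (a >= b) ? a : b;
}
```

Once a candidate passes this kind of verification, correctness with respect to the contract is guaranteed, which is what distinguishes the approach from test-based filtering alone.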
Related papers
- AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement [25.80131224070207]
We aim to use formal verification to provide mathematical guarantees that the generated code is correct.
However, generating formally verified code with LLMs is hindered by the scarcity of training data and the complexity of formal proofs.
We introduce AlphaVerus, a self-improving framework that bootstraps formally verified code generation.
arXiv Detail & Related papers (2024-12-09T03:22:35Z) - Galápagos: Automated N-Version Programming with LLMs [10.573037638807024]
We propose the automated generation of program variants using large language models.
We design, develop and evaluate Galápagos: a tool for generating program variants.
We evaluate Galápagos by creating N-Version components of real-world C code.
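As a reminder of the runtime idea behind N-version programming, below is a minimal, hypothetical 2-out-of-3 voter in C; the variant functions and `vote3` are illustrative names only, not part of Galápagos, which operates at generation time rather than shipping a voter.

```c
#include <stdio.h>

/* Hypothetical stand-ins for three independently generated
   variants of the same specification (here: squaring an int). */
static int variant_a(int x) { return x * x; }
static int variant_b(int x) { return x * x; }
static int variant_c(int x) { return x * x; }

/* 2-out-of-3 majority vote: accept any value two variants agree on;
   otherwise flag the disagreement to the caller. */
static int vote3(int x, int *ok) {
    int a = variant_a(x), b = variant_b(x), c = variant_c(x);
    *ok = 1;
    if (a == b || a == c) return a;
    if (b == c) return b;
    *ok = 0; /* no majority: caller must handle the failure */
    return 0;
}

int main(void) {
    int ok;
    int r = vote3(7, &ok);
    printf("vote3(7) = %d (ok=%d)\n", r, ok);
    return 0;
}
```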
arXiv Detail & Related papers (2024-08-18T16:44:01Z) - NoviCode: Generating Programs from Natural Language Utterances by Novices [59.71218039095155]
We present NoviCode, a novel NL Programming task which takes as input an API and a natural language description by a novice non-programmer.
We show that NoviCode is indeed a challenging task in the code synthesis domain, and that generating complex code from non-technical instructions goes beyond the current Text-to-Code paradigm.
arXiv Detail & Related papers (2024-07-15T11:26:03Z) - Towards Large Language Model Aided Program Refinement [10.089955747110444]
Program refinement involves correctness-preserving transformations from formal high-level specification statements into executable programs.
Large language models (LLMs) enable automatic code generations from informal natural language specifications.
We propose LLM4PR, a tool that combines formal program refinement techniques with informal LLM-based methods.
arXiv Detail & Related papers (2024-06-26T04:29:27Z) - Learning to Reason via Program Generation, Emulation, and Search [33.11955431589091]
Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities.
Not all reasoning tasks are easily expressible as code, e.g. tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding.
We propose Code Generation and Emulated EXecution (CoGEX) to extend an LM's program synthesis skills to such tasks.
arXiv Detail & Related papers (2024-05-25T19:40:50Z) - LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z) - CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version) [12.790826917588575]
Cryptography has been an exception to this reliance on compilers: many performance-critical routines have been written directly in assembly.
We present CryptOpt, the first compilation pipeline that specializes high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce.
On the formal-verification side, we connect to the Fiat Cryptography framework (which translates functional programs into C-like IR code) and extend it with a new formally verified program-equivalence checker.
arXiv Detail & Related papers (2022-11-19T11:07:39Z) - NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual Question Answering [52.10214317661547]
Current numerical reasoning methods autoregressively decode program sequences.
The accuracy of program generation drops sharply as decoding unfolds, because errors propagate across steps.
In this paper, we propose a non-autoregressive program generation framework.
arXiv Detail & Related papers (2022-11-07T11:25:21Z) - Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z) - Natural Language to Code Translation with Execution [82.52142893010563]
We introduce execution result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
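Schematically, and in notation of our own rather than the paper's, execution-based minimum Bayes risk decoding selects from a sampled candidate set $P$ the program whose execution result disagrees least with the others, given some loss $\ell$ over execution results:

$$\hat{p} = \operatorname*{arg\,min}_{p \in P} \sum_{p' \in P} \ell\big(\mathrm{exec}(p),\, \mathrm{exec}(p')\big)$$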
arXiv Detail & Related papers (2022-04-25T06:06:08Z) - Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, ranging from simple one-line solutions to substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)