VeCoGen: Automating Generation of Formally Verified C Code with Large Language Models
- URL: http://arxiv.org/abs/2411.19275v2
- Date: Wed, 12 Mar 2025 09:40:22 GMT
- Title: VeCoGen: Automating Generation of Formally Verified C Code with Large Language Models
- Authors: Merlijn Sevenhuijsen, Khashayar Etemadi, Mattias Nyberg,
- Abstract summary: VECOGEN is a novel tool that combines large language models with formal verification to automate the generation of formally verified C programs.<n>VECOGEN takes a formal specification in/ISO C Specification Language, a natural language specification, and a set of test cases to attempt to generate a verified program.
- Score: 1.6249267147413524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models have demonstrated impressive capabilities in generating code, yet they often produce programs with flaws or deviations from intended behavior, limiting their suitability for safety-critical applications. To address this limitation, this paper introduces VECOGEN, a novel tool that combines large language models with formal verification to automate the generation of formally verified C programs. VECOGEN takes a formal specification in ANSI/ISO C Specification Language, a natural language specification, and a set of test cases to attempt to generate a verified program. This program-generation process consists of two steps. First, VECOGEN generates an initial set of candidate programs. Secondly, the tool iteratively improves on previously generated candidates. If a candidate program meets the formal specification, then we are sure the program.is correct. We evaluate VECOGEN on 15 problems presented in Codeforces competitions. On these problems, VECOGEN solves 13 problems. This work shows the potential of combining large language models with formal verification to automate program generation.
Related papers
- Type-Constrained Code Generation with Language Models [51.03439021895432]
Large language models (LLMs) produce uncompilable output because their next-token inference procedure does not model formal aspects of code.
We introduce a type-constrained decoding approach that leverages type systems to guide code generation.
Our approach reduces compilation errors by more than half and increases functional correctness in code synthesis, translation, and repair tasks.
arXiv Detail & Related papers (2025-04-12T15:03:00Z) - C*: Unifying Programming and Verification in C [15.531046191957529]
C* is a proof-integrated language design for C programming.
It enables real-time verification by allowing programmers to embed proof-code blocks alongside implementation code.
C* unifies implementation and proof code development by using C as the common language.
arXiv Detail & Related papers (2025-04-03T03:22:22Z) - LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages [0.1906498126334485]
This paper evaluates the capabilities of the Llama 2-70B model in automating scientific applications written in programming languages.
We assess the model's capacity to generate code, documentation, and unit tests, as well as its ability to translate existing code between programming languages.
Our results indicate that while Llama 2-70B frequently generates syntactically correct and functional code for simpler numerical tasks, it encounters substantial difficulties with more complex, parallelized, or distributed computations.
arXiv Detail & Related papers (2025-03-24T23:46:14Z) - Galápagos: Automated N-Version Programming with LLMs [10.573037638807024]
Gal'apagos is a tool for generating program variants using large language models.
We show that Gal'apagos can produce program variants that are proven to be functionally equivalent.
We demonstrate that the variants produced by Gal'apagos can protect C code against real miscompilation bugs.
arXiv Detail & Related papers (2024-08-18T16:44:01Z) - NoviCode: Generating Programs from Natural Language Utterances by Novices [59.71218039095155]
We present NoviCode, a novel NL Programming task which takes as input an API and a natural language description by a novice non-programmer.
We show that NoviCode is indeed a challenging task in the code synthesis domain, and that generating complex code from non-technical instructions goes beyond the current Text-to-Code paradigm.
arXiv Detail & Related papers (2024-07-15T11:26:03Z) - Towards Large Language Model Aided Program Refinement [10.089955747110444]
Program refinement involves correctness-preserving transformations from formal high-level specification statements into executable programs.
Large language models (LLMs) enable automatic code generations from informal natural language specifications.
We propose LLM4PR, a tool that combines formal program refinement techniques with informal LLM-based methods.
arXiv Detail & Related papers (2024-06-26T04:29:27Z) - Learning to Reason via Program Generation, Emulation, and Search [33.11955431589091]
Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities.
Not all reasoning tasks are easily expressible as code, e.g. tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding.
We propose Code Generation and Emulated EXecution (CoGEX) to extend an LM's program synthesis skills to such tasks.
arXiv Detail & Related papers (2024-05-25T19:40:50Z) - Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA containing pairs of natural language descriptions and code with created synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z) - CryptOpt: Verified Compilation with Randomized Program Search for
Cryptographic Primitives (full version) [12.790826917588575]
cryptography has been an exception, where many performance-critical routines have been written directly in assembly.
We present CryptOpt, the first compilation pipeline that specializes high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce.
On the formal-verification side, we connect to the FiatOpt framework (which translates functional programs into C-like IR code) and extend it with a new formally verified program-equivalence checker.
arXiv Detail & Related papers (2022-11-19T11:07:39Z) - Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black
Magic? [5.714553194279462]
We investigate the various input parameters of two language models, and conduct a study to understand if variations of these input parameters can have a significant impact on the quality of the generated programs.
Our results showed that varying the input parameters can significantly improve the performance of language models.
arXiv Detail & Related papers (2022-10-26T13:28:14Z) - Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z) - Natural Language to Code Translation with Execution [82.52142893010563]
Execution result--minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
arXiv Detail & Related papers (2022-04-25T06:06:08Z) - A Conversational Paradigm for Program Synthesis [110.94409515865867]
We propose a conversational program synthesis approach via large language models.
We train a family of large language models, called CodeGen, on natural language and programming language data.
Our findings show the emergence of conversational capabilities and the effectiveness of the proposed conversational program synthesis paradigm.
arXiv Detail & Related papers (2022-03-25T06:55:15Z) - AVATAR: A Parallel Corpus for Java-Python Program Translation [77.86173793901139]
Program translation refers to migrating source code from one language to another.
We present AVATAR, a collection of 9,515 programming problems and their solutions written in two popular languages, Java and Python.
arXiv Detail & Related papers (2021-08-26T05:44:20Z) - Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.