PyVeritas: On Verifying Python via LLM-Based Transpilation and Bounded Model Checking for C
- URL: http://arxiv.org/abs/2508.08171v1
- Date: Mon, 11 Aug 2025 16:49:07 GMT
- Title: PyVeritas: On Verifying Python via LLM-Based Transpilation and Bounded Model Checking for C
- Authors: Pedro Orvalho, Marta Kwiatkowska,
- Abstract summary: Python has become the dominant language for general-purpose programming, yet it lacks robust tools for formal verification.<n>We propose PyVeritas, a novel framework that leverages Large Language Models (LLMs) for high-level transpilation from Python to C.
- Score: 18.52519530244078
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Python has become the dominant language for general-purpose programming, yet it lacks robust tools for formal verification. In contrast, programmers working in languages such as C benefit from mature model checkers, for example CBMC, which enable exhaustive symbolic reasoning and fault localisation. The inherent complexity of Python, coupled with the verbosity and low-level nature of existing transpilers (e.g., Cython), have historically limited the applicability of formal verification to Python programs. In this paper, we propose PyVeritas, a novel framework that leverages Large Language Models (LLMs) for high-level transpilation from Python to C, followed by bounded model checking and MaxSAT-based fault localisation in the generated C code. PyVeritas enables verification and bug localisation for Python code using existing model checking tools for C. Our empirical evaluation on two Python benchmarks demonstrates that LLM-based transpilation can achieve a high degree of accuracy, up to 80--90% for some LLMs, enabling effective development environment that supports assertion-based verification and interpretable fault diagnosis for small yet non-trivial Python programs.
Related papers
- A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions [28.01600045250939]
We propose a programming language for code actions called Quasar.<n>LLMs can write code in a subset of Python, which is automatically transpiled to Quasar.<n>LLMs with Quasar actions retain strong performance, while reducing execution time when possible by 42%.
arXiv Detail & Related papers (2025-06-13T20:11:22Z) - Effective LLM-Driven Code Generation with Pythoness [0.0]
Pythoness is an embedded domain-specific language for code generation using large language models (LLMs)<n>In Pythoness, developers operate at the level of behavioral specifications when writing functions, classes, or an entire program.<n>We show that Pythoness can successfully leverage a combination of tests and code generation to yield higher quality code than specifications alone.
arXiv Detail & Related papers (2025-01-03T23:14:46Z) - PyPulse: A Python Library for Biosignal Imputation [58.35269251730328]
We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings.<n>PyPulse's framework provides a modular and extendable framework with high ease-of-use for a broad userbase, including non-machine-learning bioresearchers.<n>We released PyPulse under the MIT License on Github and PyPI.
arXiv Detail & Related papers (2024-12-09T11:00:55Z) - CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.1875460416205]
The CRUXEVAL-X code reasoning benchmark contains 19 programming languages.<n>It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total.<n>Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
arXiv Detail & Related papers (2024-08-23T11:43:00Z) - ESBMC-Python: A Bounded Model Checker for Python Programs [8.980206368890013]
This paper introduces a tool for verifying Python programs.
It transforms an input program into an abstract syntax tree to infer and add type information.
It converts this description into formulae evaluated with satisfiability modulo theories solvers.
arXiv Detail & Related papers (2024-07-03T19:38:14Z) - Python is Not Always the Best Choice: Embracing Multilingual Program of Thoughts [51.49688654641581]
We propose a task and model agnostic approach called MultiPoT, which harnesses strength and diversity from various languages.
Experimental results reveal that it significantly outperforms Python Self-Consistency.
In particular, MultiPoT achieves more than 4.6% improvement on average on ChatGPT (gpt-3.5-turbo-0701)
arXiv Detail & Related papers (2024-02-16T13:48:06Z) - Transactional Python for Durable Machine Learning: Vision, Challenges,
and Feasibility [5.669983975369642]
Python applications may lose important data, such as trained models and extracted features, due to machine failures or human errors.
This paper presents our vision of transactional Python that provides DART without any code modifications to user programs or the Python kernel.
Our evaluation of a proof-of-concept implementation with public PyTorch and scikit-learn applications shows that DART can be offered with overheads ranging 1.5%--15.6%.
arXiv Detail & Related papers (2023-05-15T16:27:09Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self- Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z) - AVATAR: A Parallel Corpus for Java-Python Program Translation [77.86173793901139]
Program translation refers to migrating source code from one language to another.
We present AVATAR, a collection of 9,515 programming problems and their solutions written in two popular languages, Java and Python.
arXiv Detail & Related papers (2021-08-26T05:44:20Z) - Using Python for Model Inference in Deep Learning [0.6027358520885614]
We show how it is possible to meet performance and packaging constraints while performing inference in Python.
We present a way of using multiple Python interpreters within a single process to achieve scalable inference.
arXiv Detail & Related papers (2021-04-01T04:48:52Z) - OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a more friendly environment and a faster prototyping workspace than the C language.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.