Dissect-and-Restore: AI-based Code Verification with Transient Refactoring
- URL: http://arxiv.org/abs/2510.25406v2
- Date: Thu, 30 Oct 2025 10:03:34 GMT
- Title: Dissect-and-Restore: AI-based Code Verification with Transient Refactoring
- Authors: Changjie Wang, Mariano Scazzariello, Anoud Alshnakat, Roberto Guanciale, Dejan Kostić, Marco Chiesa
- Abstract summary: We present Prometheus, a novel AI-assisted system that facilitates automated code verification with current AI capabilities. Prometheus guides the proof search through structured decomposition of complex lemmas into smaller, verifiable sub-lemmas. This approach successfully verifies 86% of tasks in our curated dataset, compared to 68% for the baseline.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Formal verification is increasingly recognized as a critical foundation for building reliable software systems. However, the need for specialized expertise to write precise specifications, navigate complex proof obligations, and learn annotations often makes verification an order of magnitude more expensive than implementation. While modern AI systems can recognize patterns in mathematical proofs and interpret natural language, effectively integrating them into the formal verification process remains an open challenge. We present Prometheus, a novel AI-assisted system that facilitates automated code verification with current AI capabilities in conjunction with modular software engineering principles (e.g., modular refactoring). Our approach begins by decomposing complex program logic, such as nested loops, into smaller, verifiable components. Once verified, these components are recomposed to construct a proof of the original program. This decomposition-recomposition workflow is non-trivial. Prometheus addresses this by guiding the proof search through structured decomposition of complex lemmas into smaller, verifiable sub-lemmas. When automated tools are insufficient, users can provide lightweight natural language guidance to steer the proof process effectively. Our evaluation demonstrates that transiently applying modular restructuring to the code substantially improves the AI's effectiveness in verifying individual components. This approach successfully verifies 86% of tasks in our curated dataset, compared to 68% for the baseline. Gains are more pronounced with increasing specification complexity, improving from 30% to 69%, and when integrating proof outlines for complex programs, from 25% to 87%.
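The decomposition-recomposition workflow described above can be sketched in miniature. This is an illustrative sketch only, not Prometheus's actual toolchain or target verification language: plain Python asserts stand in for a formal verifier's pre/postconditions, and the function names (`row_sum`, `matrix_sum`) are invented for the example. The point is the structure: the inner loop of a nested computation is factored out into a helper whose contract can be checked in isolation, and the outer proof then depends only on that contract, not on the helper's body.

```python
def row_sum(row: list) -> int:
    """Verifiable sub-unit: the extracted inner loop.
    Contract (postcondition): result == sum(row)."""
    total = 0
    for x in row:
        total += x
    # Stand-in for a formally proved postcondition of the sub-unit.
    assert total == sum(row)
    return total

def matrix_sum(matrix: list) -> int:
    """Recomposed outer proof: verified against row_sum's
    contract alone, never against its implementation."""
    total = 0
    for row in matrix:
        # Only the helper's postcondition is assumed here.
        total += row_sum(row)
    # Stand-in for the original program's overall postcondition.
    assert total == sum(sum(r) for r in matrix)
    return total
```

Once each helper's contract is discharged, the original (unrefactored) program inherits the proof; the refactoring is transient in the sense that it exists only to make each verification obligation small enough for the AI to handle.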
Related papers
- Accelerate Speculative Decoding with Sparse Computation in Verification [49.74839681322316]
Speculative decoding accelerates autoregressive language model inference by verifying multiple draft tokens in parallel. Existing sparsification methods are designed primarily for standard token-by-token autoregressive decoding. We propose a sparse verification framework that jointly sparsifies attention, FFN, and MoE components during the verification stage to reduce the dominant computation cost.
arXiv Detail & Related papers (2025-12-26T07:53:41Z) - BRIDGE: Building Representations In Domain Guided Program Verification [67.36686119518441]
BRIDGE decomposes verification into three interconnected domains: Code, Specifications, and Proofs. We show that this approach substantially improves both accuracy and efficiency beyond standard error feedback methods.
arXiv Detail & Related papers (2025-11-26T06:39:19Z) - Agentic Program Verification [14.684859166069012]
We present AutoRocq, a first Large Language Model agent for conducting program verification. Unlike past works, which rely on extensive training of LLMs on proof examples, our agent learns on the fly and improves the proof via an iterative refinement loop. In this way, proof construction involves autonomous collaboration between the proof agent and the theorem prover.
arXiv Detail & Related papers (2025-11-21T15:51:48Z) - Detecting the Root Cause Code Lines in Bug-Fixing Commits by Heterogeneous Graph Learning [1.5213722322518697]
Automated defect prediction tools can proactively identify software changes prone to defects within software projects. Existing work in heterogeneous and complex software projects continues to face challenges, such as struggling with heterogeneous commit structures and ignoring cross-line dependencies in code changes. We propose an approach called RC_Detector, which consists of three main components: the bug-fixing graph construction component, the code semantic aggregation component, and the cross-line semantic retention component.
arXiv Detail & Related papers (2025-05-02T05:39:50Z) - APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries [5.227446378450704]
APE-Bench I is the first realistic benchmark built from real-world commit histories of Mathlib4. Eleanstic is a scalable parallel verification infrastructure optimized for proof checking across multiple versions of Mathlib.
arXiv Detail & Related papers (2025-04-27T05:04:02Z) - From Scientific Texts to Verifiable Code: Automating the Process with Transformers [2.536225150399618]
Transformers can read research papers that propose algorithms with formal proofs and translate these proofs into verifiable code. We argue that this approach can significantly reduce the barrier to formal verification.
arXiv Detail & Related papers (2025-01-09T14:03:35Z) - Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
Because LLMs are trained primarily on correct solutions, their ability to detect and learn from errors is limited.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z) - Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design [63.24275274981911]
Compound AI Systems consisting of many language model inference calls are increasingly employed.
In this work, we construct systems, which we call Networks of Networks (NoNs) organized around the distinction between generating a proposed answer and verifying its correctness.
We introduce a verifier-based judge NoN with K generators, an instantiation of "best-of-K" or "judge-based" compound AI systems.
arXiv Detail & Related papers (2024-07-23T20:40:37Z) - Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning [89.89857766491475]
We propose a curriculum-based logic-aware instruction tuning framework, named LACT. Specifically, we augment arbitrary first-order logical queries via binary tree decomposition. Experiments across widely used datasets demonstrate that LACT achieves substantial improvements (an average +5.5% MRR gain) over advanced methods, establishing a new state-of-the-art.
arXiv Detail & Related papers (2024-05-02T18:12:08Z) - When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose the complexity-impacted reasoning score (CIRS) to measure the correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.