Related papers: Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification

URL: http://arxiv.org/abs/2404.00762v2
Date: Tue, 2 Apr 2024 05:44:02 GMT
Title: Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification
Authors: Cheng Wen, Jialun Cao, Jie Su, Zhiwu Xu, Shengchao Qin, Mengda He, Haokun Li, Shing-Chi Cheung, Cong Tian
Abstract summary: AutoSpec is an automated approach to synthesize specifications for automated program verification. It overcomes the shortcomings of existing work in specification versatility, synthesizing satisfiable and adequate specifications for full proof. It can be successfully applied to verify the programs in a real-world X509-parser project.
Score: 15.686651364655958
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Formal verification provides a rigorous and systematic approach to ensure the correctness and reliability of software systems. Yet, constructing specifications for the full proof relies on domain expertise and non-trivial manpower. In view of such needs, an automated approach for specification synthesis is desired. Existing automated approaches, however, are limited in their versatility: they either focus only on synthesizing loop invariants for numerical programs or are tailored to specific types of programs or invariants. Programs involving multiple complicated data types (e.g., arrays, pointers) and code structures (e.g., nested loops, function calls) are often beyond their capabilities. To help bridge this gap, we present AutoSpec, an automated approach to synthesize specifications for automated program verification. It overcomes the shortcomings of existing work in specification versatility, synthesizing satisfiable and adequate specifications for full proof. It is driven by static analysis and program verification, and is empowered by large language models (LLMs). AutoSpec addresses the practical challenges in three ways: (1) AutoSpec is driven by static analysis and program verification, with LLMs serving as generators of candidate specifications, (2) programs are decomposed to direct the attention of LLMs, and (3) candidate specifications are validated in each round to avoid error accumulation during the interaction with LLMs. In this way, AutoSpec can incrementally and iteratively generate satisfiable and adequate specifications. The evaluation shows its effectiveness and usefulness, as it outperforms existing works by successfully verifying 79% of programs through automatic specification synthesis, a significant improvement of 1.592x. It can also be successfully applied to verify the programs in a real-world X509-parser project.
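
Illustrative sketch (not from the paper): the three points in the abstract, decomposition, candidate generation, and per-round validation, can be read as a generate-and-validate loop. The Python below is a minimal sketch under that reading only. The names synthesize, propose, holds, and proves are hypothetical, and the toy stubs stand in for the LLM and the verifier; nothing here reflects AutoSpec's actual interfaces.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-ins: a real system would call an LLM and a program verifier.
Propose = Callable[[str, List[str]], List[str]]  # (unit, accepted specs) -> candidates
Check = Callable[[str, List[str]], bool]         # (unit, specs) -> verdict


def synthesize(units: Sequence[str], propose: Propose, holds: Check,
               proves: Check, max_rounds: int = 3) -> Dict[str, List[str]]:
    """Collect validated specifications for each program unit, one unit at a time."""
    accepted: Dict[str, List[str]] = {}
    for unit in units:  # decomposition: focus on one loop or function at a time
        specs: List[str] = []
        for _ in range(max_rounds):
            if proves(unit, specs):  # stop once the unit's proof goes through
                break
            for cand in propose(unit, specs):
                # validate every candidate before keeping it, so a wrong
                # specification cannot carry over into later rounds
                if cand not in specs and holds(unit, specs + [cand]):
                    specs.append(cand)
        accepted[unit] = specs
    return accepted


# Toy stubs (assumptions for illustration only).
def toy_propose(unit: str, specs: List[str]) -> List[str]:
    return ["i >= 0", "i > 100", "i <= n"]


def toy_holds(unit: str, specs: List[str]) -> bool:
    return "i > 100" not in specs  # pretend the verifier rejects this candidate


def toy_proves(unit: str, specs: List[str]) -> bool:
    return {"i >= 0", "i <= n"} <= set(specs)


result = synthesize(["loop@L3"], toy_propose, toy_holds, toy_proves)
print(result)
```

In this toy run the rejected candidate never enters the accepted set, which is the point of validating in every round rather than once at the end.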

Related papers

Self-Steering Language Models [113.96916935955842]
DisCIPL is a method for "self-steering" language models. DisCIPL uses a Planner model to generate a task-specific inference program. Our work opens up a design space of highly-parallelized Monte Carlo inference strategies.
arXiv Detail & Related papers (2025-04-09T17:54:22Z)
Extracting Formal Specifications from Documents Using LLMs for Automated Testing [11.129512305353055]
The main approach to defining formal specifications is through manual analysis of software documents. System updates further increase the human labor cost of maintaining a corresponding formal specification. Recent advances in Large Language Models have demonstrated promising capabilities in natural language understanding.
arXiv Detail & Related papers (2025-04-02T01:58:11Z)
EquiBench: Benchmarking Code Reasoning Capabilities of Large Language Models via Equivalence Checking [54.354203142828084]
We present the task of equivalence checking as a new way to evaluate the code reasoning abilities of large language models. We introduce EquiBench, a dataset of 2400 program pairs spanning four programming languages and six equivalence categories. Our evaluation of 17 state-of-the-art LLMs shows that OpenAI o3-mini achieves the highest overall accuracy of 78.0%.
arXiv Detail & Related papers (2025-02-18T02:54:25Z)
Enhancing Automated Loop Invariant Generation for Complex Programs with Large Language Models [2.243213786359577]
ACInv is an Automated Complex program loop Invariant generation tool. It combines static analysis with Large Language Models (LLMs) to generate the proper loop invariants. We conducted experiments on ACInv, which showed that ACInv outperformed previous tools on data sets with data structures.
arXiv Detail & Related papers (2024-12-13T10:36:18Z)
Evaluating LLM-driven User-Intent Formalization for Verification-Aware Languages [6.0608817611709735]
We propose a metric for evaluating the quality of specifications for verification-aware languages. We show that our metric agrees closely on a human-labeled dataset of Dafny specifications for the popular MBPP code-generation benchmark. We also outline formal verification challenges that need to be addressed to apply the technique more widely.
arXiv Detail & Related papers (2024-06-14T06:52:08Z)
SpecGen: Automated Generation of Formal Program Specifications via Large Language Models [20.36964281778921]
SpecGen is a novel technique for formal program specification generation based on Large Language Models. We evaluate SpecGen on two datasets, including the SV-COMP 279 benchmark and a manually constructed dataset.
arXiv Detail & Related papers (2024-01-16T20:13:50Z)
Selene: Pioneering Automated Proof in Software Verification [62.09555413263788]
We introduce Selene, which is the first project-level automated proof benchmark constructed based on the real-world industrial-level operating system microkernel, seL4. Our experimental results with advanced large language models (LLMs), such as GPT-3.5-turbo and GPT-4, highlight the capabilities of LLMs in the domain of automated proof generation.
arXiv Detail & Related papers (2024-01-15T13:08:38Z)
Fine-Tuning Language Models Using Formal Methods Feedback [53.24085794087253]
We present a fully automated approach to fine-tune pre-trained language models for applications in autonomous systems. The method synthesizes automaton-based controllers from pre-trained models guided by natural language task descriptions. The results indicate an improvement in percentage of specifications satisfied by the controller from 60% to 90%.
arXiv Detail & Related papers (2023-10-27T16:24:24Z)
Lemur: Integrating Large Language Models in Automated Program Verification [10.221822902660458]
We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification. We instantiate the calculus as a sound automated verification procedure and demonstrate practical improvements on a set of synthetic and competition benchmarks.
arXiv Detail & Related papers (2023-10-07T16:44:53Z)
Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts. Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness. Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z)
A General Framework for Verification and Control of Dynamical Models via Certificate Synthesis [54.959571890098786]
We provide a framework to encode system specifications and define corresponding certificates. We present an automated approach to formally synthesise controllers and certificates. Our approach contributes to the broad field of safe learning for control, exploiting the flexibility of neural networks.
arXiv Detail & Related papers (2023-09-12T09:37:26Z)
Automatic Program Instrumentation for Automatic Verification (Extended Technical Report) [0.0]
In deductive verification and software model checking, dealing with certain specification language constructs can be problematic. We propose instrumentation as a unifying verification paradigm that subsumes various existing ad-hoc approaches. We implement our approach in the MonoCera tool, which is tailored to the verification of programs with aggregation.
arXiv Detail & Related papers (2023-05-26T14:55:35Z)
SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs). We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer. We evaluate SatLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z)
Synthesis of Mathematical programs from Natural Language Specifications [0.0]
Decision problems that are encountered in various business domains can be modeled as mathematical programs, i.e. optimization problems. The process of conducting such modeling often requires the involvement of experts trained in operations research and advanced algorithms. In this work we evaluate the efficacy of employing CodeT5 with data augmentation and post-processing of beams. We observe that with these enhancements CodeT5 base gives an execution accuracy of 0.73 which is significantly better than zero-shot execution accuracy of 0.41 by ChatGPT and 0.36 by Codex.
arXiv Detail & Related papers (2023-03-30T06:10:00Z)
Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs). The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
