Lemur: Integrating Large Language Models in Automated Program Verification
- URL: http://arxiv.org/abs/2310.04870v5
- Date: Thu, 25 Apr 2024 01:16:21 GMT
- Title: Lemur: Integrating Large Language Models in Automated Program Verification
- Authors: Haoze Wu, Clark Barrett, Nina Narodytska
- Abstract summary: We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification.
We instantiate the calculus as a sound automated verification procedure and demonstrate practical improvements on a set of synthetic and competition benchmarks.
- Score: 10.221822902660458
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The demonstrated code-understanding capability of LLMs raises the question of whether they can be used for automated program verification, a task that demands high-level abstract reasoning about program properties that is challenging for verification tools. We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification. We formally describe this methodology as a set of transition rules and prove its soundness. We instantiate the calculus as a sound automated verification procedure and demonstrate practical improvements on a set of synthetic and competition benchmarks.
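To make the division of labor concrete, here is a minimal sketch of a propose-and-check loop in the spirit of the paper's methodology. It is not the paper's calculus: `toy_oracle` and `toy_checker` are hypothetical stand-ins for an LLM invariant oracle and an SMT-based reasoner.

```python
from typing import Callable, Iterable, Optional

def verify(program: str,
           prop: str,
           propose_invariants: Callable[[str, str], Iterable[str]],
           sound_check: Callable[[str, str, str], bool],
           max_rounds: int = 5) -> Optional[str]:
    """Return an invariant with which the reasoner proves `prop`, or None."""
    for _ in range(max_rounds):
        for candidate in propose_invariants(program, prop):
            # Soundness rests on the reasoner alone: an LLM proposal is
            # only trusted once it has been formally checked.
            if sound_check(program, prop, candidate):
                return candidate
    return None

# Stubs so the sketch runs end to end.
def toy_oracle(program, prop):
    yield "x >= 0"                    # plausible but too weak
    yield "x >= 0 and x <= n"         # strong enough to close the proof

def toy_checker(program, prop, invariant):
    return invariant == "x >= 0 and x <= n"

print(verify("while x < n: x += 1", "x == n on exit", toy_oracle, toy_checker))
```

The design point the sketch preserves is that soundness rests entirely on the automated reasoner; an LLM suggestion contributes to the proof only after it has been formally discharged.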
Related papers
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
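A minimal sketch of the CoT/PoT cross-check, with stub functions standing in for real model samples (the paper's sampling and verification are considerably richer):

```python
from typing import Optional

def cot_solve(question: str) -> str:
    # Stand-in for a chain-of-thought sample ending in a final answer.
    return "Half of 12 is 6, and 6 plus 3 is 9. Answer: 9"

def pot_solve(question: str) -> str:
    # Stand-in for a program-of-thought sample: executable Python.
    return "result = 12 // 2 + 3"

def extract_answer(cot: str) -> str:
    return cot.rsplit("Answer:", 1)[-1].strip()

def run_program(code: str) -> str:
    env: dict = {}
    exec(code, {}, env)   # fine for a toy sketch; sandbox real model output
    return str(env["result"])

def verified_answer(question: str) -> Optional[str]:
    cot, pot = cot_solve(question), pot_solve(question)
    answer = extract_answer(cot)
    # Keep the CoT answer only when the executed program agrees with it.
    return answer if run_program(pot) == answer else None

print(verified_answer("What is half of 12, plus 3?"))   # -> 9
```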
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models [54.14602121129874]
We introduce AutoIF, the first scalable and reliable method for automatically generating instruction-following training data.
AutoIF transforms the validation of instruction-following data quality into code verification.
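A minimal sketch of that transformation, assuming a hand-written checker in place of AutoIF's LLM-generated ones; the data and checker are illustrative:

```python
def check_all_lowercase(response: str) -> bool:
    # Verifies the instruction "answer in lowercase only".
    return response == response.lower()

samples = [
    ("Answer in lowercase only: name a planet.", "mars", check_all_lowercase),
    ("Answer in lowercase only: name a planet.", "Mars", check_all_lowercase),
]

training_data = [
    (instruction, response)
    for instruction, response, checker in samples
    if checker(response)        # code execution replaces manual review
]

print(training_data)            # keeps only the "mars" pair
```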
arXiv Detail & Related papers (2024-06-19T13:29:53Z)
- Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification [15.686651364655958]
AutoSpec is an automated approach to synthesize specifications for automated program verification.
It overcomes the limitations of existing work in specification versatility, synthesizing satisfiable and adequate specifications for a full proof.
It can be successfully applied to verify the programs in a real-world X509-parser project.
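A minimal sketch of verifier-pruned specification synthesis in this spirit; `propose_annotations` and `verifier_accepts` are hypothetical stand-ins for the LLM and a deductive verifier (in the Frama-C/WP style):

```python
from typing import Dict, List

def propose_annotations(point: str) -> List[str]:
    # Stand-in for the LLM; real candidates would be e.g. ACSL clauses.
    return {"loop@sum": ["i <= n", "s == i*(i-1)/2"],
            "post@sum": ["\\result == n*(n-1)/2"]}.get(point, [])

def verifier_accepts(point: str, annotation: str) -> bool:
    # Stand-in for a deductive verifier's verdict.
    return annotation != "i <= n"   # pretend this candidate is too weak

def synthesize(points: List[str]) -> Dict[str, List[str]]:
    spec: Dict[str, List[str]] = {}
    for point in points:
        kept = [a for a in propose_annotations(point) if verifier_accepts(point, a)]
        if kept:
            spec[point] = kept      # prune candidates the verifier rejects
    return spec

print(synthesize(["loop@sum", "post@sum"]))
```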
arXiv Detail & Related papers (2024-03-31T18:15:49Z)
- LLM can Achieve Self-Regulation via Hyperparameter Aware Generation [88.69052513433603]
Large Language Models (LLMs) employ diverse decoding strategies to control the generated text.
Are LLMs aware of the existence of these decoding strategies and capable of regulating themselves?
We propose a novel text generation paradigm termed Hyperparameter Aware Generation (HAG).
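A minimal sketch of the two-pass idea, with a stub model choosing its own decoding settings; the stub and the JSON hand-off are illustrative, not the paper's protocol:

```python
import json

def model_suggest_hyperparams(prompt: str) -> dict:
    # Stand-in for pass 1: the LLM inspects the task and self-regulates.
    if "brainstorm" in prompt:
        return {"temperature": 1.0, "top_p": 0.95}
    return {"temperature": 0.2, "top_p": 0.5}   # low temperature for precise tasks

def model_generate(prompt: str, temperature: float, top_p: float) -> str:
    # Stand-in for pass 2: decoding under the self-chosen settings.
    return f"[decoded with T={temperature}, top_p={top_p}] answer to: {prompt}"

def hag_generate(prompt: str) -> str:
    hp = model_suggest_hyperparams(prompt)
    print("self-chosen hyperparameters:", json.dumps(hp))
    return model_generate(prompt, **hp)

print(hag_generate("What is 2+2?"))
print(hag_generate("brainstorm startup names"))
```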
arXiv Detail & Related papers (2024-02-17T11:18:22Z)
- Counting Reward Automata: Sample Efficient Reinforcement Learning Through the Exploitation of Reward Function Structure [13.231546105751015]
We present counting reward automata, a finite state machine variant capable of modelling any reward function expressible as a formal language.
We prove that an agent equipped with such an abstract machine can solve a larger set of tasks than agents using current approaches.
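A minimal sketch of such a machine: a "collect k keys, then open the door" automaton whose transitions test and update a counter and emit rewards. The instance is illustrative, not the paper's formal construction:

```python
class CountingRewardAutomaton:
    def __init__(self, keys_needed: int = 3):
        self.state = "collect"
        self.count = 0
        self.keys_needed = keys_needed

    def step(self, event: str) -> float:
        """Consume one environment event, return the reward it triggers."""
        if self.state == "collect" and event == "key":
            self.count += 1                        # counter update
            if self.count >= self.keys_needed:     # counter guard
                self.state = "ready"
            return 0.0
        if self.state == "ready" and event == "door":
            self.state = "done"
            return 1.0                             # task-completion reward
        return 0.0

cra = CountingRewardAutomaton()
events = ["key", "door", "key", "key", "door"]
print([cra.step(e) for e in events])   # -> [0.0, 0.0, 0.0, 0.0, 1.0]
```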
arXiv Detail & Related papers (2023-12-18T17:20:38Z)
- TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
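A minimal sketch of scoring along two of those axes (tool selection as set F1, parameter prediction as exact match) on toy data; the benchmark's real metrics and schemas are richer:

```python
def tool_f1(pred: set, gold: set) -> float:
    if not pred or not gold:
        return 0.0
    precision = len(pred & gold) / len(pred)
    recall = len(pred & gold) / len(gold)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold_plan = {"search_flights": {"dest": "Tokyo"}, "book_hotel": {"city": "Tokyo"}}
pred_plan = {"search_flights": {"dest": "Tokyo"}, "check_weather": {"city": "Tokyo"}}

print("tool F1:", tool_f1(set(pred_plan), set(gold_plan)))       # -> 0.5
shared = set(pred_plan) & set(gold_plan)
param_acc = sum(pred_plan[t] == gold_plan[t] for t in shared) / max(len(shared), 1)
print("parameter accuracy on shared tools:", param_acc)          # -> 1.0
```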
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
- CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
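A minimal sketch of a self-evaluation-style score, averaging an LLM judge's yes/no answers to probe questions about the output. The probes and the keyword-matching stub judge are illustrative; CLOMO's SES is defined over its specific logical relations:

```python
def llm_judge(output: str, probe: str) -> bool:
    # Stand-in for an LLM answering a yes/no question about `output`.
    return probe.split()[-1].rstrip("?") in output

def self_evaluation_score(output: str, probes: list[str]) -> float:
    return sum(llm_judge(output, p) for p in probes) / len(probes)

output = "Since the premise now denies rain, the conclusion about wet streets no longer follows."
probes = ["Does the text mention rain?", "Does the text mention streets?",
          "Does the text mention umbrellas?"]
print(self_evaluation_score(output, probes))   # -> 2/3
```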
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- Automatic Program Instrumentation for Automatic Verification (Extended Technical Report) [0.0]
In deductive verification and software model checking, dealing with certain specification language constructs can be problematic.
We propose instrumentation as a unifying verification paradigm that subsumes various existing ad-hoc approaches.
We implement our approach in the MonoCera tool, which is tailored to the verification of programs with aggregation.
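A minimal sketch of the instrumentation idea, rendered in Python for brevity: an aggregation operator the checker cannot handle directly (here, `sum`) is compiled into a ghost accumulator plus per-iteration assertions it can reason about. MonoCera instruments real specification constructs, not Python builtins:

```python
def mean_original(xs):
    return sum(xs) / len(xs)             # `sum` is opaque to the checker

def mean_instrumented(xs):
    ghost_sum = 0                        # ghost variable tracking the aggregate
    for i, x in enumerate(xs):
        ghost_sum += x
        # Inductive fact a verifier can check at every iteration:
        assert ghost_sum == sum(xs[: i + 1])
    return ghost_sum / len(xs)

xs = [3, 1, 4, 1, 5]
assert mean_original(xs) == mean_instrumented(xs)
print(mean_instrumented(xs))             # -> 2.8
```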
arXiv Detail & Related papers (2023-05-26T14:55:35Z)
- AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts [46.03503882865222]
AutoPrompt is an automated method to create prompts for a diverse set of tasks based on a gradient-guided search.
We show that masked language models (MLMs) have an inherent capability to perform sentiment analysis and natural language inference without additional parameters or finetuning.
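A minimal sketch of the gradient-guided token swap at the heart of this style of search (a HotFlip-like first-order approximation); the tiny hand-made vectors stand in for a real model's embeddings and gradient:

```python
vocab = {
    "amazing":  [0.9, 0.1],
    "terrible": [-0.8, 0.2],
    "the":      [0.0, 0.0],
}

def best_replacement(grad_at_position):
    # First-order estimate of the loss change when swapping in token w:
    # delta_loss ~ embedding(w) . grad, so pick the most negative dot product.
    def estimated_loss_change(item):
        emb = item[1]
        return sum(e * g for e, g in zip(emb, grad_at_position))
    return min(vocab.items(), key=estimated_loss_change)[0]

# Pretend backprop through the masked-LM loss gave us this gradient:
grad = [1.0, -0.5]
print(best_replacement(grad))   # -> "terrible"
```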
arXiv Detail & Related papers (2020-10-29T22:54:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.