Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification
- URL: http://arxiv.org/abs/2412.14063v3
- Date: Tue, 28 Jan 2025 19:56:36 GMT
- Title: Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification
- Authors: Kyle Thompson, Nuno Saavedra, Pedro Carrott, Kevin Fisher, Alex Sanchez-Stern, Yuriy Brun, João F. Ferreira, Sorin Lerner, Emily First,
- Abstract summary: We present Rango, a fully automated synthesis proof tool for Coq.
Rango identifies relevant premises and also similar proofs from the current project and uses them during synthesis.
Our evaluation shows that Rango adding relevant to its context leads to a 47% increase in the number of theorems proven.
- Score: 11.115201117143929
- License:
- Abstract: Formal verification using proof assistants, such as Coq, enables the creation of high-quality software. However, the verification process requires significant expertise and manual effort to write proofs. Recent work has explored automating proof synthesis using machine learning and large language models (LLMs). This work has shown that identifying relevant premises, such as lemmas and definitions, can aid synthesis. We present Rango, a fully automated proof synthesis tool for Coq that automatically identifies relevant premises and also similar proofs from the current project and uses them during synthesis. Rango uses retrieval augmentation at every step of the proof to automatically determine which proofs and premises to include in the context of its fine-tuned LLM. In this way, Rango adapts to the project and to the evolving state of the proof. We create a new dataset, CoqStoq, of 2,226 open-source Coq projects and 196,929 theorems from GitHub, which includes both training data and a curated evaluation benchmark of well-maintained projects. On this benchmark, Rango synthesizes proofs for 32.0% of the theorems, which is 29% more theorems than the prior state-of-the-art tool Tactician. Our evaluation also shows that Rango adding relevant proofs to its context leads to a 47% increase in the number of theorems proven.
Related papers
- Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis [50.020850767257095]
We propose ProofAug, a novel theorem proving method that enjoys superior sample efficiency.
Our method is validated on the miniF2F-test benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant.
arXiv Detail & Related papers (2025-01-30T12:37:06Z) - Cobblestone: Iterative Automation for Formal Verification [11.445689801392657]
Formal verification using proof assistants, such as Coq, is an effective way of improving software quality, but it is expensive.
Recent research has used machine learning to automatically synthesize proofs, reducing verification effort, but these tools are able to prove only a fraction of the desired software properties.
We introduce Cobblestone, a new proof-synthesis approach that improves on the state of the art by taking advantage of partial progress in proof synthesis attempts.
arXiv Detail & Related papers (2024-10-25T19:25:00Z) - Automated Proof Generation for Rust Code via Self-Evolution [69.25795662658356]
We introduce SAFE, a novel framework that overcomes the lack of human-written proof to enable automated proof generation of Rust code.
We demonstrate superior efficiency and precision compared to GPT-4o.
This advancement leads to a significant improvement in performance, achieving a 70.50% accuracy rate in a benchmark crafted by human experts.
arXiv Detail & Related papers (2024-10-21T08:15:45Z) - Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation [71.32761934724867]
This work proposes Alchemy, a framework for data synthesis that constructs formal theorems through symbolic mutation.
For each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite or apply to it.
As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M.
arXiv Detail & Related papers (2024-10-21T08:04:21Z) - QEDCartographer: Automating Formal Verification Using Reward-Free Reinforcement Learning [8.116854714039452]
QEDCartographer is an automated proof-synthesis tool that combines supervised and reinforcement learning.
We evaluate QEDCartographer using the CoqGym benchmark of 68.5K theorems from 124 open-source Coq projects.
Our work demonstrates that reinforcement learning is a fruitful research direction for improving proof-synthesis tools' search mechanisms.
arXiv Detail & Related papers (2024-08-17T16:06:14Z) - CoqPyt: Proof Navigation in Python in the Era of LLMs [5.029445580644576]
This paper presents CoqPyt, a Python tool for interacting with the Coq proof assistant.
CoqPyt improves on other Coq-related tools by providing novel features, such as the extraction of rich premise data.
arXiv Detail & Related papers (2024-05-07T12:50:28Z) - Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming [8.34623776815378]
We curate a dataset of 600K lines of open-source F* programs and proofs.
This dataset includes software used in production systems ranging from Windows and Linux to Python and Firefox.
We investigate the use of AI to synthesize programs and their proofs in F*, with promising results.
arXiv Detail & Related papers (2024-05-03T00:14:33Z) - Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code [0.0]
The Coq proof assistant stands out for its rigorous approach to verifying mathematical assertions and software correctness.
Despite the advances in artificial intelligence and machine learning, the specialized nature of Coq syntax and semantics poses unique challenges for Large Language Models (LLMs)
This dataset, derived from a collection of over 10,000 Coq source files, encompasses a wide array of propositions, proofs, and definitions.
arXiv Detail & Related papers (2024-03-19T10:53:40Z) - MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data [85.50740598523818]
MUSTARD is a framework that masters uniform synthesis of theorem and proof data of high quality and diversity.
We present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points.
We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data.
arXiv Detail & Related papers (2024-02-14T05:57:58Z) - Zero-shot Fact Verification by Claim Generation [85.27523983027471]
We develop QACG, a framework for training a robust fact verification model.
We use automatically generated claims that can be supported, refuted, or unverifiable from evidence from Wikipedia.
In a zero-shot scenario, QACG improves a RoBERTa model's F1 from 50% to 77%, equivalent in performance to 2K+ manually-curated examples.
arXiv Detail & Related papers (2021-05-31T03:13:52Z) - PRover: Proof Generation for Interpretable Reasoning over Rules [81.40404921232192]
We propose a transformer-based model that answers binary questions over rule-bases and generates the corresponding proofs.
Our model learns to predict nodes and edges corresponding to proof graphs in an efficient constrained training paradigm.
We conduct experiments on synthetic, hand-authored, and human-paraphrased rule-bases to show promising results for QA and proof generation.
arXiv Detail & Related papers (2020-10-06T15:47:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.