When Agda met Vampire
- URL: http://arxiv.org/abs/2602.18844v1
- Date: Sat, 21 Feb 2026 14:19:56 GMT
- Title: When Agda met Vampire
- Authors: Artjoms Šinkarovs, Michael Rawson
- Abstract summary: We aim to integrate proof assistants with automated theorem provers (ATPs) in a simple way. Most ATPs operate in classical first-order logic, whereas these proof assistants are grounded in constructive dependent type theory. We produce a prototype system in which Agda forwards proof obligations to the ATP Vampire, then transforms the resulting classical proof into a constructive proof term that Agda can type-check.
- Score: 3.373200015661363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dependently-typed proof assistants furnish expressive foundations for mechanised mathematics and verified software. However, automation for these systems has been either modest in scope or complex in implementation. We aim to improve the situation by integrating proof assistants with automated theorem provers (ATPs) in a simple way, while preserving the correctness guarantees of the former. A central difficulty arises from the fact that most ATPs operate in classical first-order logic, whereas these proof assistants are grounded in constructive dependent type theory. We identify an expressive fragment of both languages -- essentially equational Horn -- that admits sound, straightforward translations in both directions. The approach produces a prototype system for Agda forwarding proof obligations to the ATP Vampire, then transforming the resulting classical proof into a constructive proof term that Agda can type-check. The prototype automatically derives proofs concerning the properties of a complex field equipped with roots of unity, which took professional Agda developers two full days to complete. The required engineering effort is modest, and we anticipate that the methodology will extend readily to other ATPs and proof assistants.
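The abstract's pipeline hinges on serialising equational Horn obligations into a form Vampire accepts, i.e. TPTP first-order formulas. As a rough illustration only, the sketch below renders a hypothetical term/clause representation into TPTP FOF syntax; the data encoding and helper names are assumptions for exposition, not the paper's actual implementation.

```python
# Hedged sketch: rendering an equational Horn clause as a TPTP FOF axiom,
# the input syntax Vampire accepts. The ('f', [args])-tuple term encoding
# and the function names are illustrative assumptions, not the paper's code.

def term_to_tptp(t):
    """Render a term: a variable is a plain uppercase string like 'X',
    an application is a pair (head, [arguments])."""
    if isinstance(t, str):
        return t  # TPTP variables are uppercase identifiers
    head, args = t
    if not args:
        return head
    return f"{head}({','.join(term_to_tptp(a) for a in args)})"

def horn_to_tptp(name, premises, conclusion, variables):
    """Equational Horn clause: (s1 = t1 & ... & sn = tn) => (s = t).
    Each premise and the conclusion is an (lhs, rhs) pair of terms."""
    eq = lambda p: f"{term_to_tptp(p[0])} = {term_to_tptp(p[1])}"
    head = eq(conclusion)
    formula = f"({' & '.join(eq(p) for p in premises)}) => ({head})" \
        if premises else head
    quant = f"![{','.join(variables)}]: " if variables else ""
    return f"fof({name}, axiom, {quant}({formula}))."

# Example: associativity of plus as an unconditional equation.
assoc = horn_to_tptp(
    "plus_assoc",
    [],
    (("plus", [("plus", ["X", "Y"]), "Z"]),
     ("plus", ["X", ("plus", ["Y", "Z"])])),
    ["X", "Y", "Z"],
)
print(assoc)
# → fof(plus_assoc, axiom, ![X,Y,Z]: (plus(plus(X,Y),Z) = plus(X,plus(Y,Z)))).
```

The reverse direction, reconstructing a constructive Agda proof term from Vampire's classical refutation, is the harder half and is what restricting to the equational Horn fragment makes tractable, since equational reasoning there is constructively valid.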
Related papers
- Gödel's Poetry [0.0]
We introduce a new approach to computer theorem proving, one that employs specialized language models for Lean4 proof generation. We achieve a 90.4% pass rate on miniF2F without decomposition. A key technical contribution lies in our extension of the Kimina Lean Server with abstract syntax tree (AST) parsing capabilities.
arXiv Detail & Related papers (2025-12-16T10:00:01Z) - Agentic Program Verification [14.684859166069012]
We present a first Large Language Model agent, AutoRocq, for conducting program verification. Unlike past works, which rely on extensive training of LLMs on proof examples, our agent learns on the fly and improves the proof via an iterative refinement loop. In this way, proof construction involves autonomous collaboration between the proof agent and the theorem prover.
arXiv Detail & Related papers (2025-11-21T15:51:48Z) - Alita-G: Self-Evolving Generative Agent for Agent Generation [54.49365835457433]
We present ALITA-G, a framework that transforms a general-purpose agent into a domain expert. In this framework, a generalist agent executes a curated suite of target-domain tasks. It attains strong gains while reducing computation costs.
arXiv Detail & Related papers (2025-10-27T17:59:14Z) - Autoformalizer with Tool Feedback [52.334957386319864]
Autoformalization addresses the scarcity of data for Automated Theorem Proving (ATP) by translating mathematical problems from natural language into formal statements. Existing formalizers still struggle to consistently generate statements that meet both syntactic validity and semantic consistency. We propose the Autoformalizer with Tool Feedback (ATF), a novel approach that incorporates syntactic and consistency information as tools into the formalization process.
arXiv Detail & Related papers (2025-10-08T10:25:12Z) - ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis [50.020850767257095]
We propose ProofAug, a procedure that equips LLMs with automation methods at various granularities. Our method is validated on the miniF2F benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant. We also implement a Lean 4 version of ProofAug that can improve the pass@1 performance of Kimina-Prover-Preview-Distill-1.5B from 44.3% to 50.4%.
arXiv Detail & Related papers (2025-01-30T12:37:06Z) - MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data [85.50740598523818]
MUSTARD is a framework that masters uniform synthesis of theorem and proof data of high quality and diversity.
We present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points.
We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data.
arXiv Detail & Related papers (2024-02-14T05:57:58Z) - Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL).
This paper presents a general framework for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z) - TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models [68.65075559137608]
We propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas.
We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system.
We develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability.
arXiv Detail & Related papers (2023-10-16T08:42:39Z) - Baldur: Whole-Proof Generation and Repair with Large Language Models [8.100054850290507]
We use large language models, trained on natural language text and code and fine-tuned on proofs, to generate whole proofs for theorems at once.
We combine this proof generation model with a fine-tuned repair model to repair generated proofs, further increasing proving power.
We implement our method in a prototype, Baldur, and evaluate it on a benchmark of 6,336 Isabelle/HOL theorems and their proofs.
arXiv Detail & Related papers (2023-03-08T22:00:15Z) - Neural Unification for Logic Reasoning over Natural Language [0.28675177318965034]
Automated Theorem Proving deals with the development of computer programs able to show that some conjectures (queries) are a logical consequence of a set of axioms (facts and rules).
Recent approaches have proposed transformer-based architectures for deriving conjectures given axioms expressed in natural language (English).
In this work we propose a new architecture, namely the Neural Unifier, which achieves state-of-the-art results in terms of generalisation.
arXiv Detail & Related papers (2021-09-17T10:48:39Z) - Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models.
We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.