Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics
- URL: http://arxiv.org/abs/2510.12787v2
- Date: Thu, 16 Oct 2025 14:10:07 GMT
- Title: Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics
- Authors: Marco Del Tredici, Jacob McCarran, Benjamin Breen, Javier Aspuru Mijares, Weichen Winston Yin, Jacob M. Taylor, Frank H. L. Koppens, Dirk Englund,
- Abstract summary: Ax-Prover is a multi-agent system for automated theorem proving in Lean.<n>It can solve problems across diverse scientific domains and operate either autonomously or collaboratively with human experts.<n>We benchmark our approach against frontier LLMs and specialized prover models on two public math benchmarks and on two Lean benchmarks we introduce in the fields of abstract algebra and quantum theory.
- Score: 1.2978846076301875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Ax-Prover, a multi-agent system for automated theorem proving in Lean that can solve problems across diverse scientific domains and operate either autonomously or collaboratively with human experts. To achieve this, Ax-Prover approaches scientific problem solving through formal proof generation, a process that demands both creative reasoning and strict syntactic rigor. Ax-Prover meets this challenge by equipping Large Language Models (LLMs), which provide knowledge and reasoning, with Lean tools via the Model Context Protocol (MCP), which ensure formal correctness. To evaluate its performance as an autonomous prover, we benchmark our approach against frontier LLMs and specialized prover models on two public math benchmarks and on two Lean benchmarks we introduce in the fields of abstract algebra and quantum theory. On public datasets, Ax-Prover is competitive with state-of-the-art provers, while it largely outperforms them on the new benchmarks. This shows that, unlike specialized systems that struggle to generalize, our tool-based agentic theorem prover approach offers a generalizable methodology for formal verification across diverse scientific domains. Furthermore, we demonstrate Ax-Prover's assistant capabilities in a practical use case, showing how it enabled an expert mathematician to formalize the proof of a complex cryptography theorem.
Related papers
- Accelerating Scientific Research with Gemini: Case Studies and Common Techniques [105.15622072347811]
Large language models (LLMs) have opened new avenues for accelerating scientific research.<n>We present a collection of case studies demonstrating how researchers have successfully collaborated with advanced AI models.
arXiv Detail & Related papers (2026-02-03T18:56:17Z) - Advancing Mathematical Research via Human-AI Interactive Theorem Proving [16.40852561664514]
We propose a human-in-the-loop workflow for interactive theorem proving and discovery with LLMs.<n>Human experts retain control over problem formulation and admissible assumptions, while the model searches for proofs or contradictions.<n>We instantiate this workflow in a case study on the connection between manifold optimization and Grover's quantum search algorithm.
arXiv Detail & Related papers (2025-12-10T09:16:27Z) - Solving Formal Math Problems by Decomposition and Iterative Reflection [30.54275542622631]
textbfDelta Prover orchestrates the interaction between a general-purpose LLM and the Lean 4 proof environment.<n>bftextDelta Prover achieves a state-of-the-art 95.9% success rate on the miniF2F-test benchmark.
arXiv Detail & Related papers (2025-07-21T03:56:35Z) - Prover Agent: An Agent-Based Framework for Formal Mathematical Proofs [11.87831709160905]
We present Prover Agent, a novel AI agent for automated theorem proving.<n>It integrates large language models (LLMs) with a formal proof assistant, Lean.<n>It achieves an 88.1% success rate on the MiniF2F benchmark.
arXiv Detail & Related papers (2025-06-24T18:01:52Z) - From Axioms to Algorithms: Mechanized Proofs of the vNM Utility Theorem [0.0]
We implement the classical axioms of preference-completeness, transitivity, continuity, and independence.<n>Our formalization captures the mathematical structure of preference relations over lotteries.<n>This formalization provides a rigorous foundation for applications in economic modeling, AI alignment, and management decision systems.
arXiv Detail & Related papers (2025-06-08T10:09:54Z) - DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning [67.93945726549289]
DeepTheorem is a comprehensive informal theorem-proving framework exploiting natural language to enhance mathematical reasoning.<n>DeepTheorem includes a large-scale benchmark dataset consisting of 121K high-quality IMO-level informal theorems and proofs.<n>We devise a novel reinforcement learning strategy (RL-Zero) explicitly tailored to informal theorem proving, leveraging the verified theorem variants to incentivize robust mathematical inference.
arXiv Detail & Related papers (2025-05-29T17:59:39Z) - One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs [75.95179490687018]
Leveraging mathematical Large Language Models for proof generation is a fundamental topic in LLMs research.<n>We argue that the ability of current LLMs to prove statements largely depends on whether they have encountered the relevant proof process during training.<n>Inspired by the pedagogical method of "proof by counterexamples" commonly used in human mathematics education, our work aims to enhance LLMs' ability to conduct mathematical reasoning and proof through counterexamples.
arXiv Detail & Related papers (2025-02-12T02:01:10Z) - LeanAgent: Lifelong Learning for Formal Theorem Proving [85.39415834798385]
We present LeanAgent, a novel lifelong learning framework for formal theorem proving.<n>LeanAgent continuously generalizes to and improves on ever-expanding mathematical knowledge.<n>It generates formal proofs for 155 theorems across 23 diverse Lean repositories.
arXiv Detail & Related papers (2024-10-08T17:11:24Z) - Formal Mathematics Statement Curriculum Learning [64.45821687940946]
We show that at same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only.
We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems.
arXiv Detail & Related papers (2022-02-03T00:17:00Z) - Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models.
We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.