Related papers: Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

URL: http://arxiv.org/abs/2512.17260v1
Date: Fri, 19 Dec 2025 06:19:55 GMT
Title: Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience
Authors: Jiangjie Chen, Wenxiang Chen, Jiacheng Du, Jinyi Hu, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Wenlei Shi, Zhihong Wang, Mingxuan Wang, Chenrui Wei, Shufa Wei, Huajian Xin, Fan Yang, Weihao Gao, Zheng Yuan, Tianyang Zhan, Zeyu Zheng, Tianxi Zhou, Thomas Hanwen Zhu,
Abstract summary: We present textbfSeed-Prover 1.5, a formal theorem-proving model trained via large-scale agentic reinforcement learning.<n>Using our system, we solved textbf11 out of 12 problems from Putnam 2025 within 9 hours.
Score: 38.07918040429112
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models have recently made significant progress to generate rigorous mathematical proofs. In contrast, utilizing LLMs for theorem proving in formal languages (such as Lean) remains challenging and computationally expensive, particularly when addressing problems at the undergraduate level and beyond. In this work, we present \textbf{Seed-Prover 1.5}, a formal theorem-proving model trained via large-scale agentic reinforcement learning, alongside an efficient test-time scaling (TTS) workflow. Through extensive interactions with Lean and other tools, the model continuously accumulates experience during the RL process, substantially enhancing the capability and efficiency of formal theorem proving. Furthermore, leveraging recent advancements in natural language proving, our TTS workflow efficiently bridges the gap between natural and formal languages. Compared to state-of-the-art methods, Seed-Prover 1.5 achieves superior performance with a smaller compute budget. It solves \textbf{88\% of PutnamBench} (undergraduate-level), \textbf{80\% of Fate-H} (graduate-level), and \textbf{33\% of Fate-X} (PhD-level) problems. Notably, using our system, we solved \textbf{11 out of 12 problems} from Putnam 2025 within 9 hours. Our findings suggest that scaling learning from experience, driven by high-quality formal feedback, holds immense potential for the future of formal mathematical reasoning.

Related papers

PhysProver: Advancing Automatic Theorem Proving for Physics [17.074001092418793]
This paper presents the first approach to enhance formal theorem proving in the physics domain.<n>It is composed of theorems sampled from PhysLean and data generated by a conjecture-based formal data generation pipeline.<n>In the training pipeline, we leverage DeepSeek-Prover-V2-7B, a strong open-source mathematical theorem prover, and apply Reinforcement Learning with Verifiable Rewards (RLVR) to train our model PhysProver.
arXiv Detail & Related papers (2026-01-22T08:05:32Z)
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving [36.20164235042574]
In this work, we propose textbfSeed-Prover, a lemma-style whole-proof reasoning model.<n>To solve IMO-level contest problems, we design three test-time inference strategies that enable both deep and broad reasoning.<n>Seed-Prover proves $78.1%$ of formalized past IMO problems, saturates MiniF2F, and achieves over 50% on PutnamBench, outperforming the previous state-of-the-art by a large margin.
arXiv Detail & Related papers (2025-07-31T17:00:30Z)
ProofCompass: Enhancing Specialized Provers with LLM Guidance [6.757964026033364]
This paper introduces Proof, a novel hybrid methodology that achieves remarkable computational efficiency.<n>It strategically guides existing specialized prover methods, such as DeepSeek-Prover-v1.5-RL (DSP-v1.5) with a Large Language Model (LLM) without requiring additional model training.<n>On the miniF2F benchmark, Proof demonstrates substantial resource efficiency: it outperforms DSP-v1.5 ($54.9% rightarrow 55.3%$) while using 25 fewer attempts ($3200 rightarrow 128$)
arXiv Detail & Related papers (2025-07-18T19:28:01Z)
Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving [48.22540519786074]
Recent studies show that the informal accuracy exceeds 80% while formal success remains below 8% on benchmarks like PutnamBench.<n>We propose a novel framework that decouples high-level reasoning from low-level proof generation.<n>We evaluate our method on a challenging set of post-2000 IMO problems, a problem set on which no prior open-source prover has reported success.
arXiv Detail & Related papers (2025-07-07T22:38:49Z)
REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning [12.343823629674368]
We present REAL-Prover, a new open-source stepwise theorem prover for Lean 4.<n>Our prover notably boosts performance on solving college-level mathematics problems.<n>In experiments, our prover using only supervised fine-tune theorem achieves competitive results with a 23.7% success rate.
arXiv Detail & Related papers (2025-05-27T01:26:11Z)
MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving [30.112351299773632]
We propose a comprehensive framework for Lean4 theorem proving to solve this issue.<n>It separates the cognition tasks of general NL for whole-proof generation and error analysis for proof correction.<n>Our framework achieves a **61.07%** accuracy rate on the Lean4 version of the MiniF2F-Test dataset.
arXiv Detail & Related papers (2025-03-05T05:50:31Z)
LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction [74.79306773878955]
We introduce LeanProgress, a method that predicts the progress in the proof.<n>Our experiments show that LeanProgress achieves an overall prediction accuracy of 75.1%.
arXiv Detail & Related papers (2025-02-25T07:46:36Z)
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving [72.8626512877667]
We introduce Goedel-Prover, an open-source language model that achieves state-of-the-art (as of April 5 2025) performance in automated formal proof generation for mathematical problems.<n>First, we train LLMs to convert natural language math problems from the Numina dataset to equivalent formal statements in Lean 4.<n>Next, we develop a large dataset of formal proofs by training a series of provers.<n>Finally, we obtain the dataset Goedel-Pset-v1-solved, which contains proofs for over 800K statements from Goedel-Pset-v1
arXiv Detail & Related papers (2025-02-11T15:27:35Z)
LeanAgent: Lifelong Learning for Formal Theorem Proving [85.39415834798385]
We present LeanAgent, a novel lifelong learning framework for formal theorem proving.<n>LeanAgent continuously generalizes to and improves on ever-expanding mathematical knowledge.<n>It generates formal proofs for 155 theorems across 23 diverse Lean repositories.
arXiv Detail & Related papers (2024-10-08T17:11:24Z)
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data [85.50740598523818]
MUSTARD is a framework that masters uniform synthesis of theorem and proof data of high quality and diversity. We present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points. We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data.
arXiv Detail & Related papers (2024-02-14T05:57:58Z)
Textbooks Are All You Need II: phi-1.5 technical report [55.6940110946465]
We create a new 1.3 billion parameter model named textbfphi-1.5 with performance on natural language tasks comparable to models 5x larger. textbfphi-1.5 exhibits many of the traits of much larger Large Language Models. We open-source textbfphi-1.5 to promote further research on these urgent topics.
arXiv Detail & Related papers (2023-09-11T14:01:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.