ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations
- URL: http://arxiv.org/abs/2510.15700v1
- Date: Fri, 17 Oct 2025 14:45:30 GMT
- Title: ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations
- Authors: Alex Gu, Bartosz Piotrowski, Fabian Gloeckle, Kaiyu Yang, Aram H. Markosyan,
- Abstract summary: We introduce Proofr, the first language model trained to simplify Lean proofs without requiring additional human supervision.<n>Provr is trained via expert iteration and reinforcement learning, using Lean to verify simplifications and provide training signal.<n>Provr substantially compresses proofs generated by state-of-the-art RL-trained provers on standard benchmarks.
- Score: 14.748476989228214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural theorem proving has advanced rapidly in the past year, reaching IMO gold-medalist capabilities and producing formal proofs that span thousands of lines. Although such proofs are mechanically verified by formal systems like Lean, their excessive length renders them difficult for humans to comprehend and limits their usefulness for mathematical insight. Proof simplification is therefore a critical bottleneck. Yet, training data for this task is scarce, and existing methods -- mainly agentic scaffolding with off-the-shelf LLMs -- struggle with the extremely long proofs generated by RL-trained provers. We introduce ProofOptimizer, the first language model trained to simplify Lean proofs without requiring additional human supervision. ProofOptimizer is trained via expert iteration and reinforcement learning, using Lean to verify simplifications and provide training signal. At inference time, it operates within an iterative proof-shortening workflow, progressively reducing proof length. Experiments show that ProofOptimizer substantially compresses proofs generated by state-of-the-art RL-trained provers on standard benchmarks, reducing proof length by 87% on miniF2F, 57% on PutnamBench, and 49% on Seed-Prover's IMO 2025 proofs. Beyond conciseness, the simplified proofs check faster in Lean and further improve downstream prover performance when reused as training data for supervised finetuning.
Related papers
- Proof-RM: A Scalable and Generalizable Reward Model for Math Proof [67.53066972145183]
Large Language Models (LLMs) have demonstrated strong math reasoning abilities through Reinforcement Learning with *Verifiable Rewards* (RLVR)<n>Many advanced mathematical problems are proof-based, with no guaranteed way to determine the authenticity of a proof by simple answer matching.<n>To enable automatic verification, a Reward Model (RM) capable of reliably evaluating full proof processes is required.
arXiv Detail & Related papers (2026-02-02T17:42:53Z) - LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction [74.79306773878955]
We introduce LeanProgress, a method that predicts the progress in the proof.<n>Our experiments show that LeanProgress achieves an overall prediction accuracy of 75.1%.
arXiv Detail & Related papers (2025-02-25T07:46:36Z) - Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs [6.65877320351217]
We developed LeanNavigator, a novel method for generating a large-scale dataset of Lean theorems and proofs.<n>We generated 4.7 million theorems totaling 1 billion tokens, surpassing previous datasets by more than an order of magnitude.<n>Using this extensive dataset, we trained an AI model that outperforms the state-of-the-art ReProver model in theorem-proving tasks.
arXiv Detail & Related papers (2025-02-16T06:20:39Z) - Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation [27.60611509339328]
We argue that the optimal order for one training data sample occurs when the relevant intermediate supervision for a particular proof step is always positioned to the left of that proof step.<n>We demonstrate that training is most effective when the proof is in the intuitively sequential order.
arXiv Detail & Related papers (2024-10-30T18:00:04Z) - ImProver: Agent-Based Automated Proof Optimization [18.315243539816464]
We present ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary user-defined metrics in Lean.
We test ImProver on rewriting real-world undergraduate, competition, and research-level mathematics theorems.
arXiv Detail & Related papers (2024-10-07T05:14:18Z) - Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.<n>Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z) - LeanDojo: Theorem Proving with Retrieval-Augmented Language Models [72.54339382005732]
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean.
Existing methods are difficult to reproduce or build on, due to private code, data, and compute requirements.
This paper introduces LeanDojo: an open-source Lean toolkit consisting of toolkits, data, models.
We develop ReProver: an LLM-based prover augmented with retrieval for selecting premises from a vast math library.
arXiv Detail & Related papers (2023-06-27T17:05:32Z) - Generating Natural Language Proofs with Verifier-Guided Search [74.9614610172561]
We present a novel stepwise method NLProofS (Natural Language Proof Search)
NLProofS learns to generate relevant steps conditioning on the hypothesis.
It achieves state-of-the-art performance on EntailmentBank and RuleTaker.
arXiv Detail & Related papers (2022-05-25T02:22:30Z) - PRover: Proof Generation for Interpretable Reasoning over Rules [81.40404921232192]
We propose a transformer-based model that answers binary questions over rule-bases and generates the corresponding proofs.
Our model learns to predict nodes and edges corresponding to proof graphs in an efficient constrained training paradigm.
We conduct experiments on synthetic, hand-authored, and human-paraphrased rule-bases to show promising results for QA and proof generation.
arXiv Detail & Related papers (2020-10-06T15:47:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.