Enhancing Neural Theorem Proving through Data Augmentation and Dynamic
Sampling Method
- URL: http://arxiv.org/abs/2312.14188v2
- Date: Thu, 15 Feb 2024 13:21:44 GMT
- Title: Enhancing Neural Theorem Proving through Data Augmentation and Dynamic
Sampling Method
- Authors: Rahul Vishwakarma and Subhankar Mishra
- Abstract summary: We introduce DS-Prover, a novel dynamic sampling method for theorem proving.
We augment the training dataset by decomposing simplification and rewrite tactics with multiple premises into tactics with single premises.
We achieve a state-of-the-art performance (Pass@1) of 14.2% on the ProofNet dataset and a performance of 29.8% on MiniF2F.
- Score: 1.8130068086063336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Theorem proving is a fundamental task in mathematics. With the advent of
large language models (LLMs) and interactive theorem provers (ITPs) like Lean,
there has been growing interest in integrating LLMs and ITPs to automate
theorem proving. In this approach, the LLM generates proof steps (tactics), and
the ITP checks the applicability of the tactics at the current goal. The two
systems work together to complete the proof. In this paper, we introduce
DS-Prover, a novel dynamic sampling method for theorem proving. This method
dynamically determines the number of tactics to apply to expand the current
goal, taking into account the remaining time compared to the total allocated
time for proving a theorem. This makes the proof search process more efficient
by adjusting the balance between exploration and exploitation as time passes.
We also augment the training dataset by decomposing simplification and rewrite
tactics with multiple premises into tactics with single premises. This gives
the model more examples to learn from and helps it to predict the tactics with
premises more accurately. We perform our experiments using the Mathlib dataset
of the Lean theorem prover and report the performance on two standard datasets,
MiniF2F and ProofNet. Our methods achieve significant performance gains on both
datasets. We achieved a state-of-the-art performance (Pass@1) of 14.2% on the
ProofNet dataset and a performance of 29.8% on MiniF2F, slightly surpassing the
best-reported Pass@1 of 29.6% using Lean.
Related papers
- Psychometric-Based Evaluation for Theorem Proving with Large Language Models [14.101509169786349]
This study proposes a psychometric-based evaluation method for theorem proving with large language models (LLMs)
We propose a metric calculation method to annotate the dataset with difficulty and discrimination metrics. Specifically, we annotate each theorem in the miniF2F dataset and grade them into varying difficulty levels according to the performance of LLMs.
We design an adaptive evaluation method to dynamically select the most suitable theorems for testing based on the annotated metrics and the real-time performance of LLMs.
arXiv Detail & Related papers (2025-02-02T17:00:22Z) - STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving [33.61458249318183]
Self-play Theorem Prover (STP) takes on two roles, conjecturer and prover.
STP simultaneously takes on two roles, conjecturer and prover.
We evaluate with both Lean and Isabelle formal versifiers.
arXiv Detail & Related papers (2025-01-31T23:01:48Z) - Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis [50.020850767257095]
We propose ProofAug, a novel theorem proving method that enjoys superior sample efficiency.
Our method is validated on the miniF2F-test benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant.
arXiv Detail & Related papers (2025-01-30T12:37:06Z) - Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation [71.32761934724867]
This work proposes Alchemy, a framework for data synthesis that constructs formal theorems through symbolic mutation.
For each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite or apply to it.
As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M.
arXiv Detail & Related papers (2024-10-21T08:04:21Z) - SubgoalXL: Subgoal-based Expert Learning for Theorem Proving [37.115856591703974]
SubgoalXL is a novel approach that synergizes subgoal-based proofs with expert learning to enhance formal theorem proving.
SubgoalXL achieves a new state-of-the-art performance of 56.1% in Isabelle on the standard miniF2F dataset.
arXiv Detail & Related papers (2024-08-20T20:10:53Z) - Proving Theorems Recursively [80.42431358105482]
We propose POETRY, which proves theorems in a level-by-level manner.
Unlike previous step-by-step methods, POETRY searches for a sketch of the proof at each level.
We observe a substantial increase in the maximum proof length found by POETRY, from 10 to 26.
arXiv Detail & Related papers (2024-05-23T10:35:08Z) - Towards Large Language Models as Copilots for Theorem Proving in Lean [81.94024084598598]
We introduce Lean Copilot, a framework for running Lean inference in large language models.
We build tools for suggesting proof steps, completing intermediate proof goals, and selecting relevant premises.
Experimental results demonstrate the effectiveness of our method in assisting humans and theorem proving process.
arXiv Detail & Related papers (2024-04-18T22:54:08Z) - MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data [85.50740598523818]
MUSTARD is a framework that masters uniform synthesis of theorem and proof data of high quality and diversity.
We present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points.
We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data.
arXiv Detail & Related papers (2024-02-14T05:57:58Z) - LeanDojo: Theorem Proving with Retrieval-Augmented Language Models [72.54339382005732]
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean.
Existing methods are difficult to reproduce or build on, due to private code, data, and compute requirements.
This paper introduces LeanDojo: an open-source Lean toolkit consisting of toolkits, data, models.
We develop ReProver: an LLM-based prover augmented with retrieval for selecting premises from a vast math library.
arXiv Detail & Related papers (2023-06-27T17:05:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.