Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method
- URL: http://arxiv.org/abs/2312.14188v2
- Date: Thu, 15 Feb 2024 13:21:44 GMT
- Title: Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method
- Authors: Rahul Vishwakarma and Subhankar Mishra
- Abstract summary: We introduce DS-Prover, a novel dynamic sampling method for theorem proving.
We augment the training dataset by decomposing simplification and rewrite tactics with multiple premises into tactics with single premises.
We achieve a state-of-the-art performance (Pass@1) of 14.2% on the ProofNet dataset and a performance of 29.8% on MiniF2F.
- Score: 1.8130068086063336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Theorem proving is a fundamental task in mathematics. With the advent of
large language models (LLMs) and interactive theorem provers (ITPs) like Lean,
there has been growing interest in integrating LLMs and ITPs to automate
theorem proving. In this approach, the LLM generates proof steps (tactics), and
the ITP checks the applicability of the tactics at the current goal. The two
systems work together to complete the proof. In this paper, we introduce
DS-Prover, a novel dynamic sampling method for theorem proving. This method
dynamically determines the number of tactics to apply to expand the current
goal, taking into account the remaining time compared to the total allocated
time for proving a theorem. This makes the proof search process more efficient
by adjusting the balance between exploration and exploitation as time passes.
We also augment the training dataset by decomposing simplification and rewrite
tactics with multiple premises into tactics with single premises. This gives
the model more examples to learn from and helps it to predict the tactics with
premises more accurately. We perform our experiments using the Mathlib dataset
of the Lean theorem prover and report the performance on two standard datasets,
MiniF2F and ProofNet. Our methods achieve significant performance gains on both
datasets. We achieve a state-of-the-art performance (Pass@1) of 14.2% on the
ProofNet dataset and a performance of 29.8% on MiniF2F, slightly surpassing the
best-reported Pass@1 of 29.6% using Lean.
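How the time-aware schedule might look in code (a minimal sketch assuming a linear decay from a wide to a narrow tactic beam; the abstract does not give the paper's exact formula, and all names below are illustrative):

```python
import time

def num_tactics_to_expand(start_time: float, total_budget_s: float,
                          max_tactics: int = 16, min_tactics: int = 2) -> int:
    """Decide how many candidate tactics to try on the current goal.

    Early in the search (much time remaining) we explore widely by
    sampling many tactics; as the deadline nears we exploit by keeping
    only the top-ranked few. The linear schedule is an assumption --
    the paper's exact formula is not stated in the abstract.
    """
    elapsed = time.monotonic() - start_time
    remaining_frac = max(0.0, 1.0 - elapsed / total_budget_s)
    n = round(min_tactics + (max_tactics - min_tactics) * remaining_frac)
    return max(min_tactics, min(max_tactics, n))
```

A proof-search loop would call this before expanding each goal, so early nodes branch widely (exploration) while late nodes keep only the best-ranked tactics (exploitation).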
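The premise-decomposition augmentation can be sketched the same way. For `rw`, applying the premises one at a time reaches the same goal state as the combined call, so each single-premise step can be paired with its intermediate goal; treating `simp` identically is an augmentation heuristic rather than an exact equivalence. A hedged illustration, not the paper's code:

```python
import re

def decompose_tactic(tactic: str) -> list[str]:
    """Split a multi-premise `rw`/`simp only` call into single-premise
    calls for training-data augmentation. The naive comma split below
    would break on nested brackets, so real data needs a proper parser.
    """
    m = re.fullmatch(r"(rw|simp only) \[([^\]]+)\]( at .+)?", tactic.strip())
    if not m:
        return [tactic]  # leave anything unrecognized untouched
    head, body, loc = m.group(1), m.group(2), m.group(3) or ""
    return [f"{head} [{p.strip()}]{loc}" for p in body.split(",")]

# decompose_tactic("rw [add_comm, mul_comm] at h")
#   -> ["rw [add_comm] at h", "rw [mul_comm] at h"]
```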
Related papers
- Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation [71.32761934724867]
This work proposes Alchemy, a framework for data synthesis that constructs formal theorems through symbolic mutation.
For each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite or apply to it.
As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M.
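A rough outline of that mutation loop (hedged; `try_rewrite` stands in for a call into the proof assistant and is not the paper's released API):

```python
def mutate_theorems(candidates, invocable, try_rewrite):
    """For every candidate theorem, attempt to rewrite its statement
    with each invocable theorem; every successful rewrite yields one
    new synthetic theorem for training."""
    synthetic = []
    for thm in candidates:
        for lemma in invocable:
            mutated = try_rewrite(thm, lemma)  # None when not applicable
            if mutated is not None:
                synthetic.append(mutated)
    return synthetic
```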
arXiv Detail & Related papers (2024-10-21T08:04:21Z)
- SubgoalXL: Subgoal-based Expert Learning for Theorem Proving [37.115856591703974]
SubgoalXL is a novel approach that synergizes subgoal-based proofs with expert learning to enhance formal theorem proving.
SubgoalXL achieves a new state-of-the-art performance of 56.1% in Isabelle on the standard miniF2F dataset.
arXiv Detail & Related papers (2024-08-20T20:10:53Z)
- Proving Theorems Recursively [80.42431358105482]
We propose POETRY, which proves theorems in a level-by-level manner.
Unlike previous step-by-step methods, POETRY searches for a sketch of the proof at each level.
We observe a substantial increase in the maximum proof length found by POETRY, from 10 to 26.
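The level-by-level idea can be outlined as recursion over proof sketches (a hedged sketch; `search_sketch` and `subgoals_of` are assumed helpers, not the paper's API):

```python
def prove_recursively(goal, search_sketch, subgoals_of, depth=0, max_depth=5):
    """Find a proof sketch whose intermediate conjectures are deferred
    as placeholders, then recurse on each placeholder. Returns None on
    failure, otherwise the sketch paired with its sub-proofs."""
    if depth > max_depth:
        return None
    sketch = search_sketch(goal)  # proof outline with deferred subgoals
    if sketch is None:
        return None
    sub_proofs = []
    for sub in subgoals_of(sketch):
        proof = prove_recursively(sub, search_sketch, subgoals_of,
                                  depth + 1, max_depth)
        if proof is None:
            return None  # one deferred subgoal failed, so the sketch fails
        sub_proofs.append(proof)
    return (sketch, sub_proofs)
```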
arXiv Detail & Related papers (2024-05-23T10:35:08Z)
- Towards Large Language Models as Copilots for Theorem Proving in Lean [81.94024084598598]
We introduce Lean Copilot, a framework for running LLM inference in Lean.
We build tools for suggesting proof steps, completing intermediate proof goals, and selecting relevant premises.
Experimental results demonstrate the effectiveness of our method in assisting humans and automating the theorem proving process.
arXiv Detail & Related papers (2024-04-18T22:54:08Z)
- Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving [41.23045212775232]
We demonstrate the benefit of training models that additionally learn from failed search paths.
Facing the lack of such trial-and-error data in existing open-source theorem-proving datasets, we curate a dataset on intuitionistic propositional logic theorems.
We compare our model trained on relatively short trial-and-error information (TrialMaster) with models trained only on the correct paths and discover that the former solves more unseen theorems with fewer search trials.
arXiv Detail & Related papers (2024-04-10T23:01:45Z)
- MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data [85.50740598523818]
MUSTARD is a framework that masters uniform synthesis of theorem and proof data of high quality and diversity.
We present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points.
We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data.
arXiv Detail & Related papers (2024-02-14T05:57:58Z)
- Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals [67.64770842323966]
Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust.
Existing methods often fall short of explaining model predictions effectively or efficiently.
We propose two approaches for counterfactual (CF) approximation.
arXiv Detail & Related papers (2023-10-01T07:31:04Z)
- LeanDojo: Theorem Proving with Retrieval-Augmented Language Models [72.54339382005732]
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean.
Existing methods are difficult to reproduce or build on, due to private code, data, and compute requirements.
This paper introduces LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks.
We develop ReProver: an LLM-based prover augmented with retrieval for selecting premises from a vast math library.
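Premise selection of this kind is typically a dense nearest-neighbor search over an embedded math library. A minimal sketch, assuming some encoder has already produced the embeddings (names and signature are illustrative, not ReProver's API):

```python
import numpy as np

def retrieve_premises(goal_emb: np.ndarray, premise_embs: np.ndarray,
                      premises: list[str], k: int = 10) -> list[str]:
    """Rank every premise by cosine similarity to the current goal's
    embedding and return the top-k to condition the tactic generator."""
    goal = goal_emb / np.linalg.norm(goal_emb)
    lib = premise_embs / np.linalg.norm(premise_embs, axis=1, keepdims=True)
    scores = lib @ goal                # cosine similarity per premise
    top = np.argsort(-scores)[:k]
    return [premises[i] for i in top]
```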
arXiv Detail & Related papers (2023-06-27T17:05:32Z)
- Proof Artifact Co-training for Theorem Proving with Language Models [4.934817254755007]
PACT (Proof Artifact Co-Training) is a general methodology for extracting self-supervised data from kernel-level proof terms for co-training.
We instrument Lean with a neural theorem prover driven by a Transformer language model and show that PACT improves theorem proving success rate on a held-out suite of test theorems from 32% to 48%.
arXiv Detail & Related papers (2021-02-11T18:59:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and accepts no responsibility for any consequences of its use.