REFACTOR: Learning to Extract Theorems from Proofs
- URL: http://arxiv.org/abs/2402.17032v1
- Date: Mon, 26 Feb 2024 21:21:30 GMT
- Title: REFACTOR: Learning to Extract Theorems from Proofs
- Authors: Jin Peng Zhou, Yuhuai Wu, Qiyang Li, Roger Grosse
- Abstract summary: We show that REFACTOR can extract 19.6% of the theorems that humans would use to write the proofs.
With newly extracted theorems, we show that the existing proofs in the Metamath database can be refactored.
We also demonstrate that the prover trained on the new-theorem refactored dataset proves more test theorems and outperforms state-of-the-art baselines.
- Score: 29.44286369265644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human mathematicians are often good at recognizing modular and reusable
theorems that make complex mathematical results within reach. In this paper, we
propose a novel method called theoREm-from-prooF extrACTOR (REFACTOR) for
training neural networks to mimic this ability in formal mathematical theorem
proving. We show that, on a set of unseen proofs, REFACTOR is able to extract 19.6%
of the theorems that humans would use to write the proofs. When applying the
model to the existing Metamath library, REFACTOR extracted 16 new theorems.
With newly extracted theorems, we show that the existing proofs in the Metamath
database can be refactored. The new theorems are used very frequently after
refactoring, with an average usage of 733.5 times, and help shorten the proof
lengths. Lastly, we demonstrate that the prover trained on the new-theorem
refactored dataset proves more test theorems and outperforms state-of-the-art
baselines by frequently leveraging a diverse set of newly extracted theorems.
Code can be found at https://github.com/jinpz/refactor.
Related papers
- Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically [29.908878832382523]
This paper focuses on improving LLMs' ability to write proofs in formal languages that permit automated proof verification/evaluation.
We work in a more natural setup where the lemmas that are directly relevant to the theorem are not given to the theorem prover at test time.
We design an RL-based training algorithm that encourages the model to decompose a theorem into lemmas, prove the lemmas, and then prove the theorem by using the lemmas.
arXiv Detail & Related papers (2024-11-04T05:57:40Z)
- miniCTX: Neural Theorem Proving with (Long-)Contexts [19.51651334079961]
miniCTX tests a model's ability to prove formal mathematical theorems that depend on new context that is not seen during training.
miniCTX contains theorems sourced from real Lean projects and textbooks, each associated with a context that can span tens of thousands of tokens.
arXiv Detail & Related papers (2024-08-05T20:19:18Z)
- Proving Theorems Recursively [80.42431358105482]
We propose POETRY, which proves theorems in a level-by-level manner.
Unlike previous step-by-step methods, POETRY searches for a sketch of the proof at each level.
We observe a substantial increase in the maximum proof length found by POETRY, from 10 to 26.
arXiv Detail & Related papers (2024-05-23T10:35:08Z)
- ATG: Benchmarking Automated Theorem Generation for Generative Language Models [83.93978859348313]
Humans can develop new theorems to explore broader and more complex mathematical results.
Current generative language models (LMs) have achieved significant improvement in automatically proving theorems.
This paper proposes an Automated Theorem Generation benchmark that evaluates whether an agent can automatically generate valuable (and possibly brand new) theorems.
arXiv Detail & Related papers (2024-05-05T02:06:37Z)
- LEGO-Prover: Neural Theorem Proving with Growing Libraries [86.1191481712352]
We present LEGO-Prover, which employs a growing skill library containing verified lemmas as skills to augment the capability of LLMs used in theorem proving.
By constructing the proof modularly, LEGO-Prover enables LLMs to utilize existing skills retrieved from the library and to create new skills during the proving process.
Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47.1% to 50.4%.
arXiv Detail & Related papers (2023-10-01T12:47:59Z)
- LeanDojo: Theorem Proving with Retrieval-Augmented Language Models [72.54339382005732]
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean.
Existing methods are difficult to reproduce or build on, due to private code, data, and compute requirements.
This paper introduces LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks.
We develop ReProver: an LLM-based prover augmented with retrieval for selecting premises from a vast math library.
arXiv Detail & Related papers (2023-06-27T17:05:32Z)
- TheoremQA: A Theorem-driven Question Answering dataset [100.39878559382694]
GPT-4's capabilities to solve these problems are unparalleled, achieving an accuracy of 51% with Program-of-Thoughts Prompting.
TheoremQA is curated by domain experts containing 800 high-quality questions covering 350 theorems.
arXiv Detail & Related papers (2023-05-21T17:51:35Z)
- Towards a Mathematics Formalisation Assistant using Large Language Models [5.485439959027125]
We explore the abilities of a large language model (Codex) to help with formalisation in the Lean theorem prover.
Codex is able to formalise short mathematical statements at undergraduate level with nearly 75% accuracy on 120 theorem statements.
We show that, with a new prompting strategy, Codex can formalise natural-language proofs, with at least one out of twelve Codex completions being easy to repair into a complete proof.
arXiv Detail & Related papers (2022-11-14T16:52:32Z)
- Learning to Prove Theorems by Learning to Generate Theorems [71.46963489866596]
We learn a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover.
Experiments on real-world tasks demonstrate that synthetic data from our approach improves the theorem prover.
arXiv Detail & Related papers (2020-02-17T16:06:02Z)