Lean-STaR: Learning to Interleave Thinking and Proving
- URL: http://arxiv.org/abs/2407.10040v3
- Date: Thu, 8 Aug 2024 23:42:46 GMT
- Title: Lean-STaR: Learning to Interleave Thinking and Proving
- Authors: Haohan Lin, Zhiqing Sun, Yiming Yang, Sean Welleck,
- Abstract summary: We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.
Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
- Score: 53.923617816215774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional language model-based theorem proving assumes that by training on a sufficient amount of formal proof data, a model will learn to prove theorems. Our key observation is that a wealth of informal information that is not present in formal proofs can be useful for learning to prove theorems. For instance, humans think through steps of a proof, but this thought process is not visible in the resulting code. We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof, thereby boosting the model's theorem-proving capabilities. Lean-STaR uses retrospective ground-truth tactics to generate synthetic thoughts for training the language model. At inference time, the trained model directly generates the thoughts prior to the prediction of the tactics in each proof step. Building on the self-taught reasoner framework, we then apply expert iteration to further fine-tune the model on the correct proofs it samples and verifies using the Lean solver. Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment, significantly outperforming base models ($\boldsymbol{43.4\% \rightarrow 46.3\%,}$ Pass@64). We also analyze the impact of the augmented thoughts on various aspects of the theorem proving process, providing insights into their effectiveness.
Related papers
- Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically [29.908878832382523]
This paper focuses on improving LLMs' ability to write proofs in formal languages that permit automated proof verification/evaluation.
We work in a more natural setup where the lemmas that are directly relevant to the theorem are not given to the theorem prover at test time.
We design an RL-based training algorithm that encourages the model to decompose a theorem into lemmas, prove the lemmas, and then prove the theorem by using the lemmas.
arXiv Detail & Related papers (2024-11-04T05:57:40Z) - Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation [27.60611509339328]
We argue that the optimal order for one training data sample occurs when the relevant intermediate supervision for a particular proof step is always positioned to the left of that proof step.
We demonstrate that training is most effective when the proof is in the intuitively sequential order.
arXiv Detail & Related papers (2024-10-30T18:00:04Z) - Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving [41.23045212775232]
We demonstrate the benefit of training models that additionally learn from failed search paths.
Facing the lack of such trial-and-error data in existing open-source theorem-proving datasets, we curate a dataset on intuitionistic propositional logic theorems.
We compare our model trained on relatively short trial-and-error information (TrialMaster) with models trained only on the correct paths and discover that the former solves more unseen theorems with lower trial searches.
arXiv Detail & Related papers (2024-04-10T23:01:45Z) - LeanDojo: Theorem Proving with Retrieval-Augmented Language Models [72.54339382005732]
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean.
Existing methods are difficult to reproduce or build on, due to private code, data, and compute requirements.
This paper introduces LeanDojo: an open-source Lean toolkit consisting of toolkits, data, models.
We develop ReProver: an LLM-based prover augmented with retrieval for selecting premises from a vast math library.
arXiv Detail & Related papers (2023-06-27T17:05:32Z) - Baldur: Whole-Proof Generation and Repair with Large Language Models [8.100054850290507]
We use large language models, trained on natural language text and code and fine-tuned on proofs, to generate whole proofs for theorems at once.
We combine this proof generation model with a fine-tuned repair model to repair generated proofs, further increasing proving power.
We evaluate our method in a prototype, Baldur, and evaluate it on a benchmark of 6,336 Isabelle/HOL theorems and their proofs.
arXiv Detail & Related papers (2023-03-08T22:00:15Z) - NaturalProver: Grounded Mathematical Proof Generation with Language
Models [84.2064569475095]
Theorem proving in natural mathematical language plays a central role in mathematical advances and education.
We develop NaturalProver, a language model that generates proofs by conditioning on background references.
NaturalProver is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time.
arXiv Detail & Related papers (2022-05-25T17:01:18Z) - Logical Satisfiability of Counterfactuals for Faithful Explanations in
NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z) - Generating Natural Language Proofs with Verifier-Guided Search [74.9614610172561]
We present a novel stepwise method NLProofS (Natural Language Proof Search)
NLProofS learns to generate relevant steps conditioning on the hypothesis.
It achieves state-of-the-art performance on EntailmentBank and RuleTaker.
arXiv Detail & Related papers (2022-05-25T02:22:30Z) - Proof Artifact Co-training for Theorem Proving with Language Models [4.934817254755007]
PACT (bf Proof bf Artifact bf Co-bf Training) is a general methodology for extracting self-supervised data from kernel-level proof terms for co-training.
We instrument Lean with a neural theorem prover driven by a Transformer language model and show that PACT improves theorem proving success rate on a held-out suite of test theorems from 32% to 48%.
arXiv Detail & Related papers (2021-02-11T18:59:24Z) - Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models.
We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.