HyperTree Proof Search for Neural Theorem Proving
- URL: http://arxiv.org/abs/2205.11491v1
- Date: Mon, 23 May 2022 17:49:55 GMT
- Title: HyperTree Proof Search for Neural Theorem Proving
- Authors: Guillaume Lample, Marie-Anne Lachaux, Thibaut Lavril, Xavier Martinet,
Amaury Hayat, Gabriel Ebner, Aur\'elien Rodriguez, Timoth\'ee Lacroix
- Abstract summary: We propose an online training procedure for a transformer-based automated theorem prover.
Our model learns from previous proof searches through online training, allowing it to generalize to domains far from the training distribution.
We show that with HTPS alone, a model trained on annotated proofs manages to prove 65.4% of a held-out set of Metamath theorems, significantly outperforming the previous state of the art of 56.5% by GPT-f.
- Score: 14.677400513932852
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an online training procedure for a transformer-based automated
theorem prover. Our approach leverages a new search algorithm, HyperTree Proof
Search (HTPS), inspired by the recent success of AlphaZero. Our model learns
from previous proof searches through online training, allowing it to generalize
to domains far from the training distribution. We report detailed ablations of
our pipeline's main components by studying performance on three environments of
increasing complexity. In particular, we show that with HTPS alone, a model
trained on annotated proofs manages to prove 65.4% of a held-out set of
Metamath theorems, significantly outperforming the previous state of the art of
56.5% by GPT-f. Online training on these unproved theorems increases accuracy
to 82.6%. With a similar computational budget, we improve the state of the art
on the Lean-based miniF2F-curriculum dataset from 31% to 42% proving accuracy.
Related papers
- The Surprising Effectiveness of Test-Time Training for Abstract Reasoning [64.36534512742736]
We investigate the effectiveness of test-time training (TTT) as a mechanism for improving models' reasoning capabilities.
TTT significantly improves performance on ARC tasks, achieving up to 6x improvement in accuracy compared to base fine-tuned models.
Our findings suggest that explicit symbolic search is not the only path to improved abstract reasoning in neural language models.
arXiv Detail & Related papers (2024-11-11T18:59:45Z) - Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically [29.908878832382523]
This paper focuses on improving LLMs' ability to write proofs in formal languages that permit automated proof verification/evaluation.
We work in a more natural setup where the lemmas that are directly relevant to the theorem are not given to the theorem prover at test time.
We design an RL-based training algorithm that encourages the model to decompose a theorem into lemmas, prove the lemmas, and then prove the theorem by using the lemmas.
arXiv Detail & Related papers (2024-11-04T05:57:40Z) - Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation [71.32761934724867]
This work proposes Alchemy, a framework for data synthesis that constructs formal theorems through symbolic mutation.
For each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite or apply to it.
As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M.
arXiv Detail & Related papers (2024-10-21T08:04:21Z) - Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.
Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z) - DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data [65.5290035371111]
We introduce an approach to generate extensive Lean 4 proof data derived from high-school and undergraduate-level mathematical competition problems.
We fine-tune the DeepSeekMath 7B model on this synthetic dataset, which comprises 8 million formal statements with proofs.
Our model successfully proved 5 out of 148 problems in the Lean 4 Formalized International Mathematical Olympiad (FIMO) benchmark, while GPT-4 failed to prove any.
arXiv Detail & Related papers (2024-05-23T09:03:42Z) - LOT: Layer-wise Orthogonal Training on Improving $\ell_2$ Certified
Robustness [14.206377940235091]
Recent studies show that training deep neural networks (DNNs) with Lipschitz constraints are able to enhance adversarial robustness and other model properties such as stability.
We propose a layer-wise orthogonal training method (LOT) to effectively train 1-Lipschitz convolution layers.
We show that LOT significantly outperforms baselines regarding deterministic l2 certified robustness, and scales to deeper neural networks.
arXiv Detail & Related papers (2022-10-20T22:31:26Z) - Training a First-Order Theorem Prover from Synthetic Data [50.23600875138756]
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data.
We propose an approach that relies on training purely with synthetically generated theorems, without any human data aside from axioms.
Our neural prover outperforms the state-of-the-art E-prover on this synthetic data in both time and search steps.
arXiv Detail & Related papers (2021-03-05T17:01:34Z) - Proof Artifact Co-training for Theorem Proving with Language Models [4.934817254755007]
PACT (bf Proof bf Artifact bf Co-bf Training) is a general methodology for extracting self-supervised data from kernel-level proof terms for co-training.
We instrument Lean with a neural theorem prover driven by a Transformer language model and show that PACT improves theorem proving success rate on a held-out suite of test theorems from 32% to 48%.
arXiv Detail & Related papers (2021-02-11T18:59:24Z) - PRover: Proof Generation for Interpretable Reasoning over Rules [81.40404921232192]
We propose a transformer-based model that answers binary questions over rule-bases and generates the corresponding proofs.
Our model learns to predict nodes and edges corresponding to proof graphs in an efficient constrained training paradigm.
We conduct experiments on synthetic, hand-authored, and human-paraphrased rule-bases to show promising results for QA and proof generation.
arXiv Detail & Related papers (2020-10-06T15:47:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.