Proof Artifact Co-training for Theorem Proving with Language Models
- URL: http://arxiv.org/abs/2102.06203v1
- Date: Thu, 11 Feb 2021 18:59:24 GMT
- Title: Proof Artifact Co-training for Theorem Proving with Language Models
- Authors: Jesse Michael Han, Jason Rute, Yuhuai Wu, Edward W. Ayers, Stanislas
Polu
- Abstract summary: PACT (bf Proof bf Artifact bf Co-bf Training) is a general methodology for extracting self-supervised data from kernel-level proof terms for co-training.
We instrument Lean with a neural theorem prover driven by a Transformer language model and show that PACT improves theorem proving success rate on a held-out suite of test theorems from 32% to 48%.
- Score: 4.934817254755007
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Labeled data for imitation learning of theorem proving in large libraries of
formalized mathematics is scarce as such libraries require years of
concentrated effort by human specialists to be built. This is particularly
challenging when applying large Transformer language models to tactic
prediction, because the scaling of performance with respect to model size is
quickly disrupted in the data-scarce, easily-overfitted regime. We propose PACT
({\bf P}roof {\bf A}rtifact {\bf C}o-{\bf T}raining), a general methodology for
extracting abundant self-supervised data from kernel-level proof terms for
co-training alongside the usual tactic prediction objective. We apply this
methodology to Lean, an interactive proof assistant which hosts some of the
most sophisticated formalized mathematics to date. We instrument Lean with a
neural theorem prover driven by a Transformer language model and show that PACT
improves theorem proving success rate on a held-out suite of test theorems from
32\% to 48\%.
Related papers
- Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically [29.908878832382523]
This paper focuses on improving LLMs' ability to write proofs in formal languages that permit automated proof verification/evaluation.
We work in a more natural setup where the lemmas that are directly relevant to the theorem are not given to the theorem prover at test time.
We design an RL-based training algorithm that encourages the model to decompose a theorem into lemmas, prove the lemmas, and then prove the theorem by using the lemmas.
arXiv Detail & Related papers (2024-11-04T05:57:40Z) - Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation [71.32761934724867]
This work proposes Alchemy, a framework for data synthesis that constructs formal theorems through symbolic mutation.
For each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite or apply to it.
As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M.
arXiv Detail & Related papers (2024-10-21T08:04:21Z) - Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.
Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z) - TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative
Language Models [68.65075559137608]
We propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas.
We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system.
We develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability.
arXiv Detail & Related papers (2023-10-16T08:42:39Z) - Prototype-based Aleatoric Uncertainty Quantification for Cross-modal
Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to the Aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arisen from the inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z) - Training a First-Order Theorem Prover from Synthetic Data [50.23600875138756]
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data.
We propose an approach that relies on training purely with synthetically generated theorems, without any human data aside from axioms.
Our neural prover outperforms the state-of-the-art E-prover on this synthetic data in both time and search steps.
arXiv Detail & Related papers (2021-03-05T17:01:34Z) - Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models.
We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z) - Learning to Prove from Synthetic Theorems [41.74768503409581]
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data.
We propose an approach that relies on training with synthetic theorems, generated from a set of axioms.
We show that such theorems can be used to train an automated prover and that the learned prover transfers successfully to human-generated theorems.
arXiv Detail & Related papers (2020-06-19T17:48:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.