Proving Theorems using Incremental Learning and Hindsight Experience
Replay
- URL: http://arxiv.org/abs/2112.10664v1
- Date: Mon, 20 Dec 2021 16:40:26 GMT
- Title: Proving Theorems using Incremental Learning and Hindsight Experience
Replay
- Authors: Eser Ayg\"un, Laurent Orseau, Ankit Anand, Xavier Glorot, Vlad Firoiu,
Lei M. Zhang, Doina Precup and Shibl Mourad
- Abstract summary: We propose a general incremental learning algorithm for training domain specific provers for first-order logic without equality.
We adapt hindsight experience replay to theorem proving, so as to be able to learn even when no proof can be found.
We show that provers trained this way can match and sometimes surpass state-of-the-art traditional provers on the TPTP dataset.
- Score: 45.277067974919106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional automated theorem provers for first-order logic depend on
speed-optimized search and many handcrafted heuristics that are designed to
work best over a wide range of domains. Machine learning approaches in
literature either depend on these traditional provers to bootstrap themselves
or fall short on reaching comparable performance. In this paper, we propose a
general incremental learning algorithm for training domain specific provers for
first-order logic without equality, based only on a basic given-clause
algorithm, but using a learned clause-scoring function. Clauses are represented
as graphs and presented to transformer networks with spectral features. To
address the sparsity and the initial lack of training data as well as the lack
of a natural curriculum, we adapt hindsight experience replay to theorem
proving, so as to be able to learn even when no proof can be found. We show
that provers trained this way can match and sometimes surpass state-of-the-art
traditional provers on the TPTP dataset in terms of both quantity and quality
of the proofs.
Related papers
- Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.
Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z) - Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving [41.23045212775232]
We demonstrate the benefit of training models that additionally learn from failed search paths.
Facing the lack of such trial-and-error data in existing open-source theorem-proving datasets, we curate a dataset on intuitionistic propositional logic theorems.
We compare our model trained on relatively short trial-and-error information (TrialMaster) with models trained only on the correct paths and discover that the former solves more unseen theorems with lower trial searches.
arXiv Detail & Related papers (2024-04-10T23:01:45Z) - ReFT: Reasoning with Reinforced Fine-Tuning [9.80361828538909]
We propose a simple yet effective approach called Reinforced Fine-Tuning (ReFT) to enhance the generalizability of learning LLMs for reasoning.
ReFT first warmups the model with SFT, and then employs on-line reinforcement learning, specifically the PPO algorithm in this paper.
Experiments on GSM8K, MathQA, and SVAMP datasets show that ReFT significantly outperforms SFT.
arXiv Detail & Related papers (2024-01-17T04:43:21Z) - Learning to Guide a Saturation-Based Theorem Prover [9.228237801323042]
TRAIL is a deep learning-based approach to theorem proving that characterizes core elements of saturation-based theorem proving within a neural framework.
To the best of our knowledge, TRAIL is the first reinforcement learning-based approach to exceed the performance of a state-of-the-art traditional theorem prover.
arXiv Detail & Related papers (2021-06-07T18:35:57Z) - Training a First-Order Theorem Prover from Synthetic Data [50.23600875138756]
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data.
We propose an approach that relies on training purely with synthetically generated theorems, without any human data aside from axioms.
Our neural prover outperforms the state-of-the-art E-prover on this synthetic data in both time and search steps.
arXiv Detail & Related papers (2021-03-05T17:01:34Z) - Multi-Agent Reinforcement Learning with Temporal Logic Specifications [65.79056365594654]
We study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment.
We develop the first multi-agent reinforcement learning technique for temporal logic specifications.
We provide correctness and convergence guarantees for our main algorithm.
arXiv Detail & Related papers (2021-02-01T01:13:03Z) - Learning Reasoning Strategies in End-to-End Differentiable Proving [50.9791149533921]
Conditional Theorem Provers learn optimal rule selection strategy via gradient-based optimisation.
We show that Conditional Theorem Provers are scalable and yield state-of-the-art results on the CLUTRR dataset.
arXiv Detail & Related papers (2020-07-13T16:22:14Z) - Learning to Prove from Synthetic Theorems [41.74768503409581]
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data.
We propose an approach that relies on training with synthetic theorems, generated from a set of axioms.
We show that such theorems can be used to train an automated prover and that the learned prover transfers successfully to human-generated theorems.
arXiv Detail & Related papers (2020-06-19T17:48:09Z) - Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z) - An Experimental Study of Formula Embeddings for Automated Theorem
Proving in First-Order Logic [13.633012068936894]
We study and experimentally compare pattern-based embeddings that are applied in current systems with popular graph-based encodings.
Our experiments show that the advantages of simpler encoding schemes in terms of runtime are outdone by more complex graph-based embeddings.
arXiv Detail & Related papers (2020-02-02T16:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.