TacticZero: Learning to Prove Theorems from Scratch with Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2102.09756v1
- Date: Fri, 19 Feb 2021 06:08:39 GMT
- Authors: Minchao Wu, Michael Norrish, Christian Walder, Amir Dezfouli
- Score: 6.764610878007278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel approach to interactive theorem-proving (ITP) using deep
reinforcement learning. Unlike previous work, our framework is able to prove
theorems both end-to-end and from scratch (i.e., without relying on example
proofs from human experts). We formulate the process of ITP as a Markov
decision process (MDP) in which each state represents a set of potential
derivation paths. The agent learns to select promising derivations as well as
appropriate tactics within each derivation using deep policy gradients. This
structure allows us to introduce a novel backtracking mechanism which enables
the agent to efficiently discard (predicted) dead-end derivations and restart
the derivation from promising alternatives. Experimental results show that the
framework achieves performance comparable to that of approaches that rely on
human experts, and that it can also prove theorems it has never seen during
training. We further examine the role of each component of the framework
through ablation studies.
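The abstract's formulation — states as sets of potential derivation paths, a policy-gradient agent choosing derivations and tactics, and backtracking away from predicted dead ends — can be sketched with a toy prover. Everything below is a hypothetical stand-in (the paper works in a real ITP with deep policy networks; here a goal is just an integer, a "tactic" subtracts from it, and the policy is a softmax table), intended only to illustrate how a set of alternative fringes enables restarting from a promising derivation after a dead end:

```python
import math
import random

random.seed(0)

# Toy "prover": a goal is a non-negative integer n, and a tactic t rewrites
# n -> n - t. The goal is proved at 0; overshooting below 0 is a dead end.
# All names here are illustrative, not from the paper.
TACTICS = [1, 2, 3]

def apply_tactic(goal, tactic):
    """Return the rewritten goal, or None for a dead-end derivation."""
    new_goal = goal - tactic
    return new_goal if new_goal >= 0 else None

# Tabular softmax policy: one weight per (goal, tactic) pair. The paper uses
# deep policy gradients; a table keeps this sketch dependency-free.
weights = {}

def policy_probs(goal):
    scores = [weights.get((goal, t), 0.0) for t in TACTICS]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_tactic(goal):
    r, acc = random.random(), 0.0
    for t, p in zip(TACTICS, policy_probs(goal)):
        acc += p
        if r < acc:
            return t
    return TACTICS[-1]

def prove(goal, max_steps=30, lr=0.5):
    """One REINFORCE episode. `fringes` holds alternative partial
    derivations; a dead end discards the current derivation and the search
    restarts from a remaining alternative (backtracking)."""
    fringes = [(goal, [])]              # (current goal, (state, tactic) path)
    for _ in range(max_steps):
        if not fringes:
            return False
        g, path = fringes.pop()         # select the most recent fringe
        t = sample_tactic(g)
        new_goal = apply_tactic(g, t)
        if new_goal is None:
            continue                    # dead end: backtrack to alternatives
        path = path + [(g, t)]
        if new_goal == 0:
            # Proof found: policy-gradient update along the successful path.
            for s, a in path:
                probs = policy_probs(s)
                for i, tac in enumerate(TACTICS):
                    grad = (1.0 if tac == a else 0.0) - probs[i]
                    weights[(s, tac)] = weights.get((s, tac), 0.0) + lr * grad
            return True
        fringes.append((g, path[:-1]))  # keep the old fringe as an alternative
        fringes.append((new_goal, path))
    return False

successes = sum(prove(7) for _ in range(200))
```

The stack of fringes gives depth-first search with backtracking: a dead end silently drops the failed derivation, and the next `pop` resumes from the most recent surviving alternative, mirroring (in miniature) the restart mechanism the abstract describes.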
Related papers
- Learning Rules Explaining Interactive Theorem Proving Tactic Prediction [5.229806149125529]
We represent the problem as an Inductive Logic Programming (ILP) task.
Using the ILP representation we enriched the feature space by encoding additional, computationally expensive properties.
We use this enriched feature space to learn rules explaining when a tactic is applicable to a given proof state.
arXiv Detail & Related papers (2024-11-02T09:18:33Z)
- Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.
Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z)
- Proving Theorems Recursively [80.42431358105482]
We propose POETRY, which proves theorems in a level-by-level manner.
Unlike previous step-by-step methods, POETRY searches for a sketch of the proof at each level.
We observe a substantial increase in the maximum proof length found by POETRY, from 10 to 26.
arXiv Detail & Related papers (2024-05-23T10:35:08Z)
- A Survey on Deep Learning for Theorem Proving [16.28502772608166]
Theorem proving is a fundamental aspect of mathematics, spanning from informal reasoning in natural language to rigorous derivations in formal systems.
Deep learning, especially the emergence of large language models, has sparked a notable surge of research exploring these techniques to enhance the process of theorem proving.
arXiv Detail & Related papers (2024-04-15T17:07:55Z)
- Automated Completion of Statements and Proofs in Synthetic Geometry: an Approach based on Constraint Solving [0.0]
We propose a framework for completing incomplete conjectures and incomplete proofs.
The framework can turn a conjecture with missing assumptions into a proper theorem.
Also, the proposed framework can help in completing a proof sketch into a human-readable and machine-checkable proof.
arXiv Detail & Related papers (2024-01-22T12:49:08Z)
- Towards a General Framework for Continual Learning with Pre-training [55.88910947643436]
We present a general framework for continual learning of sequentially arrived tasks with the use of pre-training.
We decompose its objective into three hierarchical components, including within-task prediction, task-identity inference, and task-adaptive prediction.
We propose an innovative approach to explicitly optimize these components with parameter-efficient fine-tuning (PEFT) techniques and representation statistics.
arXiv Detail & Related papers (2023-10-21T02:03:38Z)
- Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z)
- Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
- Learning to Guide a Saturation-Based Theorem Prover [9.228237801323042]
TRAIL is a deep learning-based approach to theorem proving that characterizes core elements of saturation-based theorem proving within a neural framework.
To the best of our knowledge, TRAIL is the first reinforcement learning-based approach to exceed the performance of a state-of-the-art traditional theorem prover.
arXiv Detail & Related papers (2021-06-07T18:35:57Z)
- Learning to Prove from Synthetic Theorems [41.74768503409581]
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data.
We propose an approach that relies on training with synthetic theorems, generated from a set of axioms.
We show that such theorems can be used to train an automated prover and that the learned prover transfers successfully to human-generated theorems.
arXiv Detail & Related papers (2020-06-19T17:48:09Z)
- Learning to Prove Theorems by Learning to Generate Theorems [71.46963489866596]
We learn a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover.
Experiments on real-world tasks demonstrate that synthetic data from our approach improves the theorem prover.
arXiv Detail & Related papers (2020-02-17T16:06:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.