INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving
- URL: http://arxiv.org/abs/2007.02924v2
- Date: Sat, 3 Apr 2021 16:21:59 GMT
- Title: INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving
- Authors: Yuhuai Wu, Albert Qiaochu Jiang, Jimmy Ba, Roger Grosse
- Abstract summary: In this paper, we introduce an INequality Theorem proving benchmark, specifically designed to test agents' generalization ability.
INT is based on a procedure for generating theorems and proofs; this procedure's knobs allow us to measure 6 different types of generalization.
We evaluate learning-based baselines across these dimensions, then augment the same agents with Monte Carlo Tree Search (MCTS) at test time, and show that MCTS can help to prove new theorems.
- Score: 36.194330645092634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In learning-assisted theorem proving, one of the most critical challenges is
to generalize to theorems unlike those seen at training time. In this paper, we
introduce INT, an INequality Theorem proving benchmark, specifically designed
to test agents' generalization ability. INT is based on a procedure for
generating theorems and proofs; this procedure's knobs allow us to measure 6
different types of generalization, each reflecting a distinct challenge
characteristic of automated theorem proving. In addition, unlike prior
benchmarks for learning-assisted theorem proving, INT provides a lightweight
and user-friendly theorem proving environment with fast simulations, conducive
to performing learning-based and search-based research. We introduce
learning-based baselines and evaluate them across 6 dimensions of
generalization with the benchmark. We then evaluate the same agents augmented
with Monte Carlo Tree Search (MCTS) at test time, and show that MCTS can help
to prove new theorems.
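The abstract describes augmenting trained agents with Monte Carlo Tree Search at test time. The sketch below illustrates the general MCTS loop (UCB1 selection, expansion, random rollout, backpropagation) on a toy proof environment. The environment, its state encoding, and the action names here are illustrative stand-ins, not INT's actual API: a state is a tuple of numeric "obligations", `split` decomposes an obligation, and `close` discharges a trivial one.

```python
import math
import random

# Toy proof environment (illustrative only, not the INT benchmark's API).
# A state is a tuple of remaining proof obligations, each a positive integer.
ACTIONS = ("close", "split")

def step(state, action):
    """Apply an action to the last obligation; return the successor state."""
    obligations = list(state)
    if not obligations:
        return state
    goal = obligations.pop()
    if action == "close" and goal <= 1:
        pass  # trivial obligation discharged
    elif action == "split" and goal > 1:
        obligations += [goal // 2, goal - goal // 2]  # decompose the goal
    else:
        obligations.append(goal)  # action inapplicable: no progress
    return tuple(obligations)

def is_proved(state):
    return len(state) == 0

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}  # action -> Node
        self.visits = 0
        self.value = 0.0

def ucb(child, parent_visits, c=1.4):
    """UCB1 score balancing exploitation and exploration."""
    if child.visits == 0:
        return float("inf")
    return (child.value / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def rollout(state, max_depth=20):
    """Random playout; reward 1.0 if the proof closes within the budget."""
    for _ in range(max_depth):
        if is_proved(state):
            return 1.0
        state = step(state, random.choice(ACTIONS))
    return 1.0 if is_proved(state) else 0.0

def mcts(root_state, iterations=200):
    """Run MCTS from root_state; return the most-visited first action."""
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend via UCB1 while all actions are expanded.
        while len(node.children) == len(ACTIONS) and not is_proved(node.state):
            node = max(node.children.values(),
                       key=lambda ch: ucb(ch, node.visits))
        # Expansion: add one untried child.
        if not is_proved(node.state):
            action = random.choice(
                [a for a in ACTIONS if a not in node.children])
            child = Node(step(node.state, action), parent=node)
            node.children[action] = child
            node = child
        # Simulation and backpropagation.
        reward = rollout(node.state)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)
```

In a real prover the rollout policy and action priors would come from the trained agent rather than uniform random choice; the tree structure and backup rule are the same.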
Related papers
- Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation [71.32761934724867]
This work proposes Alchemy, a framework for data synthesis that constructs formal theorems through symbolic mutation.
For each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite or apply to it.
As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M.
arXiv Detail & Related papers (2024-10-21T08:04:21Z)
- Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.
Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z)
- Proving Theorems Recursively [80.42431358105482]
We propose POETRY, which proves theorems in a level-by-level manner.
Unlike previous step-by-step methods, POETRY searches for a sketch of the proof at each level.
We observe a substantial increase in the maximum proof length found by POETRY, from 10 to 26.
arXiv Detail & Related papers (2024-05-23T10:35:08Z)
- ATG: Benchmarking Automated Theorem Generation for Generative Language Models [83.93978859348313]
Humans can develop new theorems to explore broader and more complex mathematical results.
Current generative language models (LMs) have achieved significant improvement in automatically proving theorems.
This paper proposes an Automated Theorem Generation benchmark that evaluates whether an agent can automatically generate valuable (and possibly brand new) theorems.
arXiv Detail & Related papers (2024-05-05T02:06:37Z)
- Representer Theorems for Metric and Preference Learning: A Geometric Perspective [0.0]
We obtain a novel representer theorem for the simultaneous task of metric and preference learning.
We show that our framework can be applied to the task of metric learning from triplet comparisons.
In the case of Reproducing Kernel Hilbert Spaces, we demonstrate that the solution to the learning problem can be expressed using kernel terms.
arXiv Detail & Related papers (2023-04-07T16:34:25Z)
- Learning to Guide a Saturation-Based Theorem Prover [9.228237801323042]
TRAIL is a deep learning-based approach to theorem proving that characterizes core elements of saturation-based theorem proving within a neural framework.
To the best of our knowledge, TRAIL is the first reinforcement learning-based approach to exceed the performance of a state-of-the-art traditional theorem prover.
arXiv Detail & Related papers (2021-06-07T18:35:57Z)
- Learning to Prove Theorems by Learning to Generate Theorems [71.46963489866596]
We learn a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover.
Experiments on real-world tasks demonstrate that synthetic data from our approach improves the theorem prover.
arXiv Detail & Related papers (2020-02-17T16:06:02Z)
- An Experimental Study of Formula Embeddings for Automated Theorem Proving in First-Order Logic [13.633012068936894]
We study and experimentally compare pattern-based embeddings that are applied in current systems with popular graph-based encodings.
Our experiments show that the runtime advantages of simpler encoding schemes are outdone by more complex graph-based embeddings.
arXiv Detail & Related papers (2020-02-02T16:07:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.