PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition
- URL: http://arxiv.org/abs/2407.11214v1
- Date: Mon, 15 Jul 2024 19:57:15 GMT
- Title: PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition
- Authors: George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding, Michael Jennings, Amitayush Thakur, Swarat Chaudhuri,
- Abstract summary: PutnamBench consists of 1697 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition.
All the theorems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations.
We use PutnamBench to evaluate several established neural and symbolic theorem-provers.
- Score: 6.971356163199923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present PutnamBench, a new multilingual benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1697 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the theorems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. Proving the theorems requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.
Related papers
- MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark [82.64129627675123]
MathBench is a new benchmark that rigorously assesses the mathematical capabilities of large language models.
MathBench spans a wide range of mathematical disciplines, offering a detailed evaluation of both theoretical understanding and practical problem-solving skills.
arXiv Detail & Related papers (2024-05-20T17:52:29Z) - Rigor with Machine Learning from Field Theory to the Poincar\'e
Conjecture [0.0]
We discuss techniques for obtaining rigor in the natural sciences with machine learning.
Non-rigorous methods may lead to rigorous results via conjecture generation or verification by reinforcement learning.
One can also imagine building direct bridges between machine learning theory and either mathematics or theoretical physics.
arXiv Detail & Related papers (2024-02-20T19:00:59Z) - Connecting classical finite exchangeability to quantum theory [69.62715388742298]
Exchangeability is a fundamental concept in probability theory and statistics.
We show how a de Finetti-like representation theorem for finitely exchangeable sequences requires a mathematical representation which is formally equivalent to quantum theory.
arXiv Detail & Related papers (2023-06-06T17:15:19Z) - TheoremQA: A Theorem-driven Question Answering dataset [100.39878559382694]
GPT-4's capabilities to solve these problems are unparalleled, achieving an accuracy of 51% with Program-of-Thoughts Prompting.
TheoremQA is curated by domain experts containing 800 high-quality questions covering 350 theorems.
arXiv Detail & Related papers (2023-05-21T17:51:35Z) - Representer Theorems for Metric and Preference Learning: A Geometric
Perspective [0.0]
We obtain a novel representer theorem for the simultaneous task of metric and preference learning.
We show that our framework can be applied to the task of metric learning from triplet comparisons.
In the case of Reproducing Kernel Hilbert Spaces, we demonstrate that the solution to the learning problem can be expressed using kernel terms.
arXiv Detail & Related papers (2023-04-07T16:34:25Z) - Connes implies Tsirelson: a simple proof [91.3755431537592]
We show that the Connes embedding problem implies the synchronous Tsirelson conjecture.
We also give a different construction of Connes' algebra $mathcalRomega$ appearing in the Connes embedding problem.
arXiv Detail & Related papers (2022-09-16T13:59:42Z) - Discussion of Foundation of Mathematics and Quantum Theory [0.0]
We discuss different aspects of classical and finite mathematics.
quantum theory based on a finite ring of characteristic $p$ is more general than standard quantum theory.
arXiv Detail & Related papers (2022-03-09T18:46:57Z) - MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics [0.0]
miniF2F is a dataset of formal Olympiad-level mathematics problems statements intended to provide a unified cross-system benchmark for neural theorem proving.
The miniF2F benchmark currently targets Metamath, Lean, and Isabelle and consists of 488 problem statements drawn from the AIME, AMC, and the International Mathematical Olympiad (IMO)
We report baseline results using GPT-f, a neural theorem prover based on GPT-3 and provide an analysis of its performance.
arXiv Detail & Related papers (2021-08-31T23:21:12Z) - Learning to Prove Theorems by Learning to Generate Theorems [71.46963489866596]
We learn a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover.
Experiments on real-world tasks demonstrate that synthetic data from our approach improves the theorem prover.
arXiv Detail & Related papers (2020-02-17T16:06:02Z) - A refinement of Reznick's Positivstellensatz with applications to
quantum information theory [72.8349503901712]
In Hilbert's 17th problem Artin showed that any positive definite in several variables can be written as the quotient of two sums of squares.
Reznick showed that the denominator in Artin's result can always be chosen as an $N$-th power of the squared norm of the variables.
arXiv Detail & Related papers (2019-09-04T11:46:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.