A Combinatorial Identities Benchmark for Theorem Proving via Automated Theorem Generation
- URL: http://arxiv.org/abs/2502.17840v1
- Date: Tue, 25 Feb 2025 04:41:49 GMT
- Title: A Combinatorial Identities Benchmark for Theorem Proving via Automated Theorem Generation
- Authors: Beibei Xiong, Hangyu Lv, Haojia Shan, Jianlin Wang, Zhengfeng Yang, Lihong Zhi,
- Abstract summary: Combinatorics is a cornerstone of mathematics, providing essential tools for analyzing discrete structures and solving problems.<n>To address this, we manually construct LeanComb, an automated identities benchmark in Lean.<n>We develop an Automated Theorem Generator for Combinatorial Identities, ATG4CI, which combines candidate tactics suggested by a self-improving large language model with a Reinforcement Learning Tree Search approach for prediction.
- Score: 3.003569769097376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have significantly advanced formal theorem proving, yet the scarcity of high-quality training data constrains their capabilities in complex mathematical domains. Combinatorics, a cornerstone of mathematics, provides essential tools for analyzing discrete structures and solving optimization problems. However, its inherent complexity makes it particularly challenging for automated theorem proving (ATP) for combinatorial identities. To address this, we manually construct LeanComb, combinatorial identities benchmark in Lean, which is, to our knowledge, the first formalized theorem proving benchmark built for combinatorial identities. We develop an Automated Theorem Generator for Combinatorial Identities, ATG4CI, which combines candidate tactics suggested by a self-improving large language model with a Reinforcement Learning Tree Search approach for tactic prediction. By utilizing ATG4CI, we generate a LeanComb-Enhanced dataset comprising 260K combinatorial identities theorems, each with a complete formal proof in Lean, and experimental evaluations demonstrate that models trained on this dataset can generate more effective tactics, thereby improving success rates in automated theorem proving for combinatorial identities.
Related papers
- Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis [50.020850767257095]
We propose ProofAug, a novel theorem proving method that enjoys superior sample efficiency.<n>Our method is validated on the miniF2F-test benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant.
arXiv Detail & Related papers (2025-01-30T12:37:06Z) - Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation [71.32761934724867]
This work proposes Alchemy, a framework for data synthesis that constructs formal theorems through symbolic mutation.
For each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite or apply to it.
As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M.
arXiv Detail & Related papers (2024-10-21T08:04:21Z) - LeanAgent: Lifelong Learning for Formal Theorem Proving [85.39415834798385]
We present LeanAgent, a novel lifelong learning framework for formal theorem proving.
LeanAgent continuously generalizes to and improves on ever-expanding mathematical knowledge.
It generates formal proofs for 155 theorems across 23 diverse Lean repositories.
arXiv Detail & Related papers (2024-10-08T17:11:24Z) - TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative
Language Models [68.65075559137608]
We propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas.
We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system.
We develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability.
arXiv Detail & Related papers (2023-10-16T08:42:39Z) - Learning to Prove Trigonometric Identities [36.56548303496931]
We construct an automatic proof system for trigonometric identities.
Our goal is not only to complete the proof, but to complete the proof in as few steps as possible.
After further improvement through reinforcement learning, we get AutoTrig, which can give proof steps for identities in almost as short steps as BFS.
arXiv Detail & Related papers (2022-07-14T06:16:17Z) - Training a First-Order Theorem Prover from Synthetic Data [50.23600875138756]
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data.
We propose an approach that relies on training purely with synthetically generated theorems, without any human data aside from axioms.
Our neural prover outperforms the state-of-the-art E-prover on this synthetic data in both time and search steps.
arXiv Detail & Related papers (2021-03-05T17:01:34Z) - Proof Artifact Co-training for Theorem Proving with Language Models [4.934817254755007]
PACT (bf Proof bf Artifact bf Co-bf Training) is a general methodology for extracting self-supervised data from kernel-level proof terms for co-training.
We instrument Lean with a neural theorem prover driven by a Transformer language model and show that PACT improves theorem proving success rate on a held-out suite of test theorems from 32% to 48%.
arXiv Detail & Related papers (2021-02-11T18:59:24Z) - Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models.
We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z) - INT: An Inequality Benchmark for Evaluating Generalization in Theorem
Proving [36.194330645092634]
In this paper, we introduce an INequality Theorem proving benchmark, specifically designed to test agents' generalization ability.
Int is based on a procedure for generating theorems and proofs; this procedure's knobs allow us to measure 6 different types of generalization.
We then evaluate the same agents augmented with Monte Carlo Tree Search (MCTS) at test time, and show that MCTS can help to prove new theorems.
arXiv Detail & Related papers (2020-07-06T17:55:33Z) - Learning to Prove from Synthetic Theorems [41.74768503409581]
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data.
We propose an approach that relies on training with synthetic theorems, generated from a set of axioms.
We show that such theorems can be used to train an automated prover and that the learned prover transfers successfully to human-generated theorems.
arXiv Detail & Related papers (2020-06-19T17:48:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.