Related papers: A Combinatorial Identities Benchmark for Theorem Proving via Automated Theorem Generation

A Combinatorial Identities Benchmark for Theorem Proving via Automated Theorem Generation

URL: http://arxiv.org/abs/2502.17840v1
Date: Tue, 25 Feb 2025 04:41:49 GMT
Title: A Combinatorial Identities Benchmark for Theorem Proving via Automated Theorem Generation
Authors: Beibei Xiong, Hangyu Lv, Haojia Shan, Jianlin Wang, Zhengfeng Yang, Lihong Zhi,
Abstract summary: Combinatorics is a cornerstone of mathematics, providing essential tools for analyzing discrete structures and solving problems.<n>To address this, we manually construct LeanComb, an automated identities benchmark in Lean.<n>We develop an Automated Theorem Generator for Combinatorial Identities, ATG4CI, which combines candidate tactics suggested by a self-improving large language model with a Reinforcement Learning Tree Search approach for prediction.
Score: 3.003569769097376
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have significantly advanced formal theorem proving, yet the scarcity of high-quality training data constrains their capabilities in complex mathematical domains. Combinatorics, a cornerstone of mathematics, provides essential tools for analyzing discrete structures and solving optimization problems. However, its inherent complexity makes it particularly challenging for automated theorem proving (ATP) for combinatorial identities. To address this, we manually construct LeanComb, combinatorial identities benchmark in Lean, which is, to our knowledge, the first formalized theorem proving benchmark built for combinatorial identities. We develop an Automated Theorem Generator for Combinatorial Identities, ATG4CI, which combines candidate tactics suggested by a self-improving large language model with a Reinforcement Learning Tree Search approach for tactic prediction. By utilizing ATG4CI, we generate a LeanComb-Enhanced dataset comprising 260K combinatorial identities theorems, each with a complete formal proof in Lean, and experimental evaluations demonstrate that models trained on this dataset can generate more effective tactics, thereby improving success rates in automated theorem proving for combinatorial identities.

Related papers

Solving Formal Math Problems by Decomposition and Iterative Reflection [30.54275542622631]
textbfDelta Prover orchestrates the interaction between a general-purpose LLM and the Lean 4 proof environment.<n>bftextDelta Prover achieves a state-of-the-art 95.9% success rate on the miniF2F-test benchmark.
arXiv Detail & Related papers (2025-07-21T03:56:35Z)
LeanConjecturer: Automatic Generation of Mathematical Conjectures for Theorem Proving [6.220998637943786]
We introduce LeanConjecturer, a pipeline for automatically generating university-level mathematical conjectures in Lean 4 using Large Language Models (LLMs)<n>Through iterative generation and evaluation, LeanConjecturer produced 12,289 conjectures from 40 Mathlib seed files, with 3,776 identified as syntactically valid and non-trivial.
arXiv Detail & Related papers (2025-06-27T08:17:18Z)
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning [67.93945726549289]
DeepTheorem is a comprehensive informal theorem-proving framework exploiting natural language to enhance mathematical reasoning.<n>DeepTheorem includes a large-scale benchmark dataset consisting of 121K high-quality IMO-level informal theorems and proofs.<n>We devise a novel reinforcement learning strategy (RL-Zero) explicitly tailored to informal theorem proving, leveraging the verified theorem variants to incentivize robust mathematical inference.
arXiv Detail & Related papers (2025-05-29T17:59:39Z)
Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions [37.10426226729792]
We introduce the LLMe-Conjecture-Prove (ECP) framework, a modular neuro-symbolic method integrating pattern-driven conjecturing with formal theorem proving.<n>We present ConstructiveBench, a dataset of 3,431 answer-Thought problems in various math competitions with verified Lean formalizations.
arXiv Detail & Related papers (2025-05-24T03:52:25Z)
Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis [50.020850767257095]
We propose ProofAug, a novel theorem proving method that enjoys superior sample efficiency.<n>Our method is validated on the miniF2F-test benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant.
arXiv Detail & Related papers (2025-01-30T12:37:06Z)
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation [71.32761934724867]
This work proposes Alchemy, a framework for data synthesis that constructs formal theorems through symbolic mutation. For each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite or apply to it. As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M.
arXiv Detail & Related papers (2024-10-21T08:04:21Z)
LeanAgent: Lifelong Learning for Formal Theorem Proving [85.39415834798385]
We present LeanAgent, a novel lifelong learning framework for formal theorem proving. LeanAgent continuously generalizes to and improves on ever-expanding mathematical knowledge. It generates formal proofs for 155 theorems across 23 diverse Lean repositories.
arXiv Detail & Related papers (2024-10-08T17:11:24Z)
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models [68.65075559137608]
We propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas. We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system. We develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability.
arXiv Detail & Related papers (2023-10-16T08:42:39Z)
Learning to Prove Trigonometric Identities [36.56548303496931]
We construct an automatic proof system for trigonometric identities. Our goal is not only to complete the proof, but to complete the proof in as few steps as possible. After further improvement through reinforcement learning, we get AutoTrig, which can give proof steps for identities in almost as short steps as BFS.
arXiv Detail & Related papers (2022-07-14T06:16:17Z)
Training a First-Order Theorem Prover from Synthetic Data [50.23600875138756]
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data. We propose an approach that relies on training purely with synthetically generated theorems, without any human data aside from axioms. Our neural prover outperforms the state-of-the-art E-prover on this synthetic data in both time and search steps.
arXiv Detail & Related papers (2021-03-05T17:01:34Z)
Proof Artifact Co-training for Theorem Proving with Language Models [4.934817254755007]
PACT (bf Proof bf Artifact bf Co-bf Training) is a general methodology for extracting self-supervised data from kernel-level proof terms for co-training. We instrument Lean with a neural theorem prover driven by a Transformer language model and show that PACT improves theorem proving success rate on a held-out suite of test theorems from 32% to 48%.
arXiv Detail & Related papers (2021-02-11T18:59:24Z)
Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models. We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z)
INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving [36.194330645092634]
In this paper, we introduce an INequality Theorem proving benchmark, specifically designed to test agents' generalization ability. Int is based on a procedure for generating theorems and proofs; this procedure's knobs allow us to measure 6 different types of generalization. We then evaluate the same agents augmented with Monte Carlo Tree Search (MCTS) at test time, and show that MCTS can help to prove new theorems.
arXiv Detail & Related papers (2020-07-06T17:55:33Z)
Learning to Prove from Synthetic Theorems [41.74768503409581]
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data. We propose an approach that relies on training with synthetic theorems, generated from a set of axioms. We show that such theorems can be used to train an automated prover and that the learned prover transfers successfully to human-generated theorems.
arXiv Detail & Related papers (2020-06-19T17:48:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.