Related papers: Discovering New Theorems via LLMs with In-Context Proof Learning in Lean

Discovering New Theorems via LLMs with In-Context Proof Learning in Lean

URL: http://arxiv.org/abs/2509.14274v1
Date: Tue, 16 Sep 2025 06:48:11 GMT
Title: Discovering New Theorems via LLMs with In-Context Proof Learning in Lean
Authors: Kazumi Kasaura, Naoto Onda, Yuta Oriike, Masaya Taniguchi, Akiyoshi Sannai, Sho Sonoda,
Abstract summary: We propose Conjecturing-Proving Loop pipeline for automatically generating mathematical conjectures and proving them in Lean 4 format.<n>A feature of our approach is that we generate and prove further conjectures with context including previously generated theorems and their proofs.<n>At least one of these theorems could not be proved by the LLM without in-context learning, even in natural language.
Score: 6.097030695272925
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models have demonstrated significant promise in formal theorem proving. However, previous works mainly focus on solving existing problems. In this paper, we focus on the ability of LLMs to find novel theorems. We propose Conjecturing-Proving Loop pipeline for automatically generating mathematical conjectures and proving them in Lean 4 format. A feature of our approach is that we generate and prove further conjectures with context including previously generated theorems and their proofs, which enables the generation of more difficult proofs by in-context learning of proof strategies without changing parameters of LLMs. We demonstrated that our framework rediscovered theorems with verification, which were published in past mathematical papers and have not yet formalized. Moreover, at least one of these theorems could not be proved by the LLM without in-context learning, even in natural language, which means that in-context learning was effective for neural theorem proving. The source code is available at https://github.com/auto-res/ConjecturingProvingLoop.

Related papers

LeanConjecturer: Automatic Generation of Mathematical Conjectures for Theorem Proving [6.220998637943786]
We introduce LeanConjecturer, a pipeline for automatically generating university-level mathematical conjectures in Lean 4 using Large Language Models (LLMs)<n>Through iterative generation and evaluation, LeanConjecturer produced 12,289 conjectures from 40 Mathlib seed files, with 3,776 identified as syntactically valid and non-trivial.
arXiv Detail & Related papers (2025-06-27T08:17:18Z)
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning [67.93945726549289]
DeepTheorem is a comprehensive informal theorem-proving framework exploiting natural language to enhance mathematical reasoning.<n>DeepTheorem includes a large-scale benchmark dataset consisting of 121K high-quality IMO-level informal theorems and proofs.<n>We devise a novel reinforcement learning strategy (RL-Zero) explicitly tailored to informal theorem proving, leveraging the verified theorem variants to incentivize robust mathematical inference.
arXiv Detail & Related papers (2025-05-29T17:59:39Z)
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs [75.95179490687018]
Leveraging mathematical Large Language Models for proof generation is a fundamental topic in LLMs research.<n>We argue that the ability of current LLMs to prove statements largely depends on whether they have encountered the relevant proof process during training.<n>Inspired by the pedagogical method of "proof by counterexamples" commonly used in human mathematics education, our work aims to enhance LLMs' ability to conduct mathematical reasoning and proof through counterexamples.
arXiv Detail & Related papers (2025-02-12T02:01:10Z)
Proof Automation with Large Language Models [6.587933406842906]
Large Language Models (LLMs) have shown promise in automatically generating informal proofs in natural language. We propose PALM, a novel generate-then-repair approach that first prompts an LLM to generate an initial proof and then leverages targeted symbolic methods to iteratively repair low-level problems. Our results show that PALM significantly outperforms other state-of-the-art approaches, successfully proving 76.6% to 180.4% more theorems.
arXiv Detail & Related papers (2024-09-22T00:19:27Z)
Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.<n>Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z)
ATG: Benchmarking Automated Theorem Generation for Generative Language Models [83.93978859348313]
Humans can develop new theorems to explore broader and more complex mathematical results. Current generative language models (LMs) have achieved significant improvement in automatically proving theorems. This paper proposes an Automated Theorem Generation benchmark that evaluates whether an agent can automatically generate valuable (and possibly brand new) theorems.
arXiv Detail & Related papers (2024-05-05T02:06:37Z)
REFACTOR: Learning to Extract Theorems from Proofs [29.44286369265644]
We show that REFACTOR can extract 19.6% of the theorems that humans would use to write the proofs. With newly extracted theorems, we show that the existing MetaMath database can beed. We also demonstrate that the prover trained on the new-theoremed dataset proves more test theorems and outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-26T21:21:30Z)
LEGO-Prover: Neural Theorem Proving with Growing Libraries [86.1191481712352]
We present LEGO-Prover, which employs a growing skill library containing verified lemmas as skills to augment the capability of LLMs used in theorem proving. By constructing the proof modularly, LEGO-Prover enables LLMs to utilize existing skills retrieved from the library and to create new skills during the proving process. Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47.1% to 50.4%.
arXiv Detail & Related papers (2023-10-01T12:47:59Z)
Learning to Prove Theorems by Learning to Generate Theorems [71.46963489866596]
We learn a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover. Experiments on real-world tasks demonstrate that synthetic data from our approach improves the theorem prover.
arXiv Detail & Related papers (2020-02-17T16:06:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.